* [PATCH v6 0/4] media: raspberrypi: Support RPi5's CFE
@ 2024-10-03 10:31 Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 1/4] media: uapi: Add meta formats for PiSP FE config and stats Tomi Valkeinen
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-03 10:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Tomi Valkeinen, Krzysztof Kozlowski
This series adds support to the CFE hardware block on RaspberryPi 5. The
CFE (Camera Front End) contains a CSI-2 receiver and Front End, a small
ISP.
To run this, you need the basic RPi5 kernel support plus relevant dts
changes to enable the cfe and camera. My work branch with everything
needed to run CFE can be found from:
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux.git rp1-cfe
A few notes about the patches:
- The original work was done by RaspberryPi, mostly by Naushir Patuck.
- The second video node only sets V4L2_CAP_META_CAPTURE instead of both
V4L2_CAP_META_CAPTURE and V4L2_CAP_META_CAPTURE like the other nodes.
This is a temporary workaround for userspace (libcamera), and
hopefully can be removed soon.
I have tested this with:
- A single IMX219 sensor connected to the RPi5's CSI-2 port
- Arducam's UB960 FPD-Link board with four imx219 sensors connected
Tomi
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
---
Changes in v6:
- Rename "isr_handler" functions to just "isr"
- Fix checkpatch warnings, mainly lines over 80 chars
- A diff of the changes be be found from: https://paste.debian.net/1331173/ (note: the tab there seems to be 4 chars)
- Link to v5: https://lore.kernel.org/r/20240910-rp1-cfe-v5-0-9ab4c4c8eace@ideasonboard.com
Changes in v5:
- Add "depends on PM". The platforms that use CFE will always have PM in
practice, and it's not worth supporting both the PM and !PM cases as
it adds complexity to the driver.
- Link to v4: https://lore.kernel.org/r/20240904-rp1-cfe-v4-0-f1b5b3d69c81@ideasonboard.com
Changes in v4:
- Drop unnecessary clock-lanes from the DT bindings
- Drop unnecessary linux-media from MAINTAINERS entry
- Drop unnecessary conversion to bool with !!
- Don't set cap->bus_info in cfe_querycap()
- Make debugfs files not readable by the world
- Check the return value of v4l2_fwnode_endpoint_parse()
- Remove the code dealing with remote_ep_fwnode. Instead use
v4l2_create_fwnode_links_to_pad() and media_pad_remote_pad_unique() to
create the link and get the pad index.
- Add cfe/csi2/fe/dphy argument to the respective dbg/info/err print
macros.
- Drop some debug prints and add a few, clarifying the prints for
enabling and disabling the streams.
- Some cosmetic changes (linefeed, drop unnecessary assignment, move a
define)
- Link to v3: https://lore.kernel.org/r/20240815-rp1-cfe-v3-0-e15a979db327@ideasonboard.com
Changes in v3:
- Based on v6.11-rc3. The PiSP BE series is now in upstream so no extra
dependencies are needed.
- Fixed cfe_remove() return value, as the .remove hook has changed
- Added Krzysztof's Rb.
- Link to v2: https://lore.kernel.org/r/20240620-rp1-cfe-v2-0-b8b48fdba3b3@ideasonboard.com
Changes in v2:
- Change the compatible string back to raspberrypi,rp1-cfe from raspberrypi,rpi5-rp1-cfe
- Drop the references to rp1 headers in the DT binding example. This
allows compiling the example without the rp1 support.
- Fix missing remap lines for mono formats
- Fix csi2_pad_set_fmt() so that the format can be changed back to the
sink's format from 16-bit or compressed format.
- Link to v1: https://lore.kernel.org/r/20240318-rp1-cfe-v1-0-ac6d960ff22d@ideasonboard.com
---
Tomi Valkeinen (4):
media: uapi: Add meta formats for PiSP FE config and stats
dt-bindings: media: Add bindings for raspberrypi,rp1-cfe
media: raspberrypi: Add support for RP1-CFE
media: admin-guide: Document the Raspberry Pi CFE (rp1-cfe)
.../admin-guide/media/raspberrypi-rp1-cfe.dot | 27 +
.../admin-guide/media/raspberrypi-rp1-cfe.rst | 78 +
Documentation/admin-guide/media/v4l-drivers.rst | 1 +
.../bindings/media/raspberrypi,rp1-cfe.yaml | 93 +
.../userspace-api/media/v4l/meta-formats.rst | 1 +
.../userspace-api/media/v4l/metafmt-pisp-fe.rst | 39 +
MAINTAINERS | 7 +
drivers/media/platform/raspberrypi/Kconfig | 1 +
drivers/media/platform/raspberrypi/Makefile | 1 +
drivers/media/platform/raspberrypi/rp1-cfe/Kconfig | 15 +
.../media/platform/raspberrypi/rp1-cfe/Makefile | 6 +
.../media/platform/raspberrypi/rp1-cfe/cfe-fmts.h | 332 +++
.../media/platform/raspberrypi/rp1-cfe/cfe-trace.h | 202 ++
drivers/media/platform/raspberrypi/rp1-cfe/cfe.c | 2504 ++++++++++++++++++++
drivers/media/platform/raspberrypi/rp1-cfe/cfe.h | 43 +
drivers/media/platform/raspberrypi/rp1-cfe/csi2.c | 586 +++++
drivers/media/platform/raspberrypi/rp1-cfe/csi2.h | 89 +
drivers/media/platform/raspberrypi/rp1-cfe/dphy.c | 181 ++
drivers/media/platform/raspberrypi/rp1-cfe/dphy.h | 27 +
.../media/platform/raspberrypi/rp1-cfe/pisp-fe.c | 605 +++++
.../media/platform/raspberrypi/rp1-cfe/pisp-fe.h | 53 +
drivers/media/v4l2-core/v4l2-ioctl.c | 2 +
.../uapi/linux/media/raspberrypi/pisp_fe_config.h | 273 +++
.../linux/media/raspberrypi/pisp_fe_statistics.h | 64 +
include/uapi/linux/videodev2.h | 2 +
25 files changed, 5232 insertions(+)
---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20240314-rp1-cfe-142b628b7214
Best regards,
--
Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v6 1/4] media: uapi: Add meta formats for PiSP FE config and stats
2024-10-03 10:31 [PATCH v6 0/4] media: raspberrypi: Support RPi5's CFE Tomi Valkeinen
@ 2024-10-03 10:31 ` Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 2/4] dt-bindings: media: Add bindings for raspberrypi,rp1-cfe Tomi Valkeinen
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-03 10:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Tomi Valkeinen
Add two meta formats for PiSP FE: V4L2_META_FMT_RPI_FE_CFG and
V4L2_META_FMT_RPI_FE_STATS. The former is used to provide configuration
for the FE and the latter is used to read the statistics from the FE.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
---
.../userspace-api/media/v4l/meta-formats.rst | 1 +
.../userspace-api/media/v4l/metafmt-pisp-fe.rst | 39 ++++++++++++++++++++++
drivers/media/v4l2-core/v4l2-ioctl.c | 2 ++
include/uapi/linux/videodev2.h | 2 ++
4 files changed, 44 insertions(+)
diff --git a/Documentation/userspace-api/media/v4l/meta-formats.rst b/Documentation/userspace-api/media/v4l/meta-formats.rst
index c6e56b5888bc..86ffb3bc8ade 100644
--- a/Documentation/userspace-api/media/v4l/meta-formats.rst
+++ b/Documentation/userspace-api/media/v4l/meta-formats.rst
@@ -16,6 +16,7 @@ These formats are used for the :ref:`metadata` interface only.
metafmt-generic
metafmt-intel-ipu3
metafmt-pisp-be
+ metafmt-pisp-fe
metafmt-rkisp1
metafmt-uvc
metafmt-vivid
diff --git a/Documentation/userspace-api/media/v4l/metafmt-pisp-fe.rst b/Documentation/userspace-api/media/v4l/metafmt-pisp-fe.rst
new file mode 100644
index 000000000000..fddeada83e4a
--- /dev/null
+++ b/Documentation/userspace-api/media/v4l/metafmt-pisp-fe.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _v4l2-meta-fmt-rpi-fe-cfg:
+
+************************
+V4L2_META_FMT_RPI_FE_CFG
+************************
+
+Raspberry Pi PiSP Front End configuration format
+================================================
+
+The Raspberry Pi PiSP Front End image signal processor is configured by
+userspace by providing a buffer of configuration parameters to the
+`rp1-cfe-fe-config` output video device node using the
+:c:type:`v4l2_meta_format` interface.
+
+The `Raspberry Pi PiSP technical specification
+<https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf>`_
+provide detailed description of the Front End configuration and programming
+model.
+
+.. _v4l2-meta-fmt-rpi-fe-stats:
+
+**************************
+V4L2_META_FMT_RPI_FE_STATS
+**************************
+
+Raspberry Pi PiSP Front End statistics format
+=============================================
+
+The Raspberry Pi PiSP Front End image signal processor provides statistics data
+by writing to a buffer provided via the `rp1-cfe-fe-stats` capture video device
+node using the
+:c:type:`v4l2_meta_format` interface.
+
+The `Raspberry Pi PiSP technical specification
+<https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf>`_
+provide detailed description of the Front End configuration and programming
+model.
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 5eb4d797d259..95c2c4a97966 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1466,6 +1466,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_Y212: descr = "12-bit YUYV Packed"; break;
case V4L2_PIX_FMT_Y216: descr = "16-bit YUYV Packed"; break;
case V4L2_META_FMT_RPI_BE_CFG: descr = "RPi PiSP BE Config format"; break;
+ case V4L2_META_FMT_RPI_FE_CFG: descr = "RPi PiSP FE Config format"; break;
+ case V4L2_META_FMT_RPI_FE_STATS: descr = "RPi PiSP FE Statistics format"; break;
case V4L2_META_FMT_GENERIC_8: descr = "8-bit Generic Metadata"; break;
case V4L2_META_FMT_GENERIC_CSI2_10: descr = "8-bit Generic Meta, 10b CSI-2"; break;
case V4L2_META_FMT_GENERIC_CSI2_12: descr = "8-bit Generic Meta, 12b CSI-2"; break;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 4e91362da6da..9f68cd55248f 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -857,6 +857,8 @@ struct v4l2_pix_format {
/* Vendor specific - used for RaspberryPi PiSP */
#define V4L2_META_FMT_RPI_BE_CFG v4l2_fourcc('R', 'P', 'B', 'C') /* PiSP BE configuration */
+#define V4L2_META_FMT_RPI_FE_CFG v4l2_fourcc('R', 'P', 'F', 'C') /* PiSP FE configuration */
+#define V4L2_META_FMT_RPI_FE_STATS v4l2_fourcc('R', 'P', 'F', 'S') /* PiSP FE stats */
#ifdef __KERNEL__
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 2/4] dt-bindings: media: Add bindings for raspberrypi,rp1-cfe
2024-10-03 10:31 [PATCH v6 0/4] media: raspberrypi: Support RPi5's CFE Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 1/4] media: uapi: Add meta formats for PiSP FE config and stats Tomi Valkeinen
@ 2024-10-03 10:31 ` Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 4/4] media: admin-guide: Document the Raspberry Pi CFE (rp1-cfe) Tomi Valkeinen
[not found] ` <20241003-rp1-cfe-v6-3-d6762edd98a8@ideasonboard.com>
3 siblings, 0 replies; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-03 10:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Tomi Valkeinen, Krzysztof Kozlowski
Add DT bindings for raspberrypi,rp1-cfe.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
---
.../bindings/media/raspberrypi,rp1-cfe.yaml | 93 ++++++++++++++++++++++
1 file changed, 93 insertions(+)
diff --git a/Documentation/devicetree/bindings/media/raspberrypi,rp1-cfe.yaml b/Documentation/devicetree/bindings/media/raspberrypi,rp1-cfe.yaml
new file mode 100644
index 000000000000..eba5394719b9
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/raspberrypi,rp1-cfe.yaml
@@ -0,0 +1,93 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/media/raspberrypi,rp1-cfe.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Raspberry Pi PiSP Camera Front End
+
+maintainers:
+ - Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
+ - Raspberry Pi Kernel Maintenance <kernel-list@raspberrypi.com>
+
+description: |
+ The Raspberry Pi PiSP Camera Front End is a module in Raspberrypi 5's RP1 I/O
+ controller, that contains:
+ - MIPI D-PHY
+ - MIPI CSI-2 receiver
+ - Simple image processor (called PiSP Front End, or FE)
+
+ The FE documentation is available at:
+ https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf
+
+ The PHY and CSI-2 receiver part have no public documentation.
+
+properties:
+ compatible:
+ items:
+ - const: raspberrypi,rp1-cfe
+
+ reg:
+ items:
+ - description: CSI-2 registers
+ - description: D-PHY registers
+ - description: MIPI CFG (a simple top-level mux) registers
+ - description: FE registers
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ port:
+ $ref: /schemas/graph.yaml#/$defs/port-base
+ additionalProperties: false
+ description: CSI-2 RX Port
+
+ properties:
+ endpoint:
+ $ref: video-interfaces.yaml#
+ unevaluatedProperties: false
+
+ properties:
+ data-lanes:
+ minItems: 1
+ maxItems: 4
+
+ required:
+ - data-lanes
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+
+additionalProperties: false
+
+examples:
+ - |
+ rp1 {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ csi@110000 {
+ compatible = "raspberrypi,rp1-cfe";
+ reg = <0xc0 0x40110000 0x0 0x100>,
+ <0xc0 0x40114000 0x0 0x100>,
+ <0xc0 0x40120000 0x0 0x100>,
+ <0xc0 0x40124000 0x0 0x1000>;
+
+ interrupts = <42>;
+
+ clocks = <&rp1_clocks>;
+
+ port {
+ csi_ep: endpoint {
+ remote-endpoint = <&cam_endpoint>;
+ data-lanes = <1 2>;
+ };
+ };
+ };
+ };
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 4/4] media: admin-guide: Document the Raspberry Pi CFE (rp1-cfe)
2024-10-03 10:31 [PATCH v6 0/4] media: raspberrypi: Support RPi5's CFE Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 1/4] media: uapi: Add meta formats for PiSP FE config and stats Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 2/4] dt-bindings: media: Add bindings for raspberrypi,rp1-cfe Tomi Valkeinen
@ 2024-10-03 10:31 ` Tomi Valkeinen
[not found] ` <20241003-rp1-cfe-v6-3-d6762edd98a8@ideasonboard.com>
3 siblings, 0 replies; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-03 10:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Tomi Valkeinen
Add documentation for rp1-cfe driver.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
---
.../admin-guide/media/raspberrypi-rp1-cfe.dot | 27 ++++++++
.../admin-guide/media/raspberrypi-rp1-cfe.rst | 78 ++++++++++++++++++++++
Documentation/admin-guide/media/v4l-drivers.rst | 1 +
3 files changed, 106 insertions(+)
diff --git a/Documentation/admin-guide/media/raspberrypi-rp1-cfe.dot b/Documentation/admin-guide/media/raspberrypi-rp1-cfe.dot
new file mode 100644
index 000000000000..7717f2291049
--- /dev/null
+++ b/Documentation/admin-guide/media/raspberrypi-rp1-cfe.dot
@@ -0,0 +1,27 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | csi2\n/dev/v4l-subdev0 | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port1 -> n00000011 [style=dashed]
+ n00000001:port1 -> n00000007:port0
+ n00000001:port2 -> n00000015
+ n00000001:port2 -> n00000007:port0 [style=dashed]
+ n00000001:port3 -> n00000019 [style=dashed]
+ n00000001:port3 -> n00000007:port0 [style=dashed]
+ n00000001:port4 -> n0000001d [style=dashed]
+ n00000001:port4 -> n00000007:port0 [style=dashed]
+ n00000007 [label="{{<port0> 0 | <port1> 1} | pisp-fe\n/dev/v4l-subdev1 | {<port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000007:port2 -> n00000021
+ n00000007:port3 -> n00000025 [style=dashed]
+ n00000007:port4 -> n00000029
+ n0000000d [label="{imx219 6-0010\n/dev/v4l-subdev2 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000d:port0 -> n00000001:port0 [style=bold]
+ n00000011 [label="rp1-cfe-csi2-ch0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n00000015 [label="rp1-cfe-csi2-ch1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n00000019 [label="rp1-cfe-csi2-ch2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n0000001d [label="rp1-cfe-csi2-ch3\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n00000021 [label="rp1-cfe-fe-image0\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
+ n00000025 [label="rp1-cfe-fe-image1\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
+ n00000029 [label="rp1-cfe-fe-stats\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
+ n0000002d [label="rp1-cfe-fe-config\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
+ n0000002d -> n00000007:port1
+}
diff --git a/Documentation/admin-guide/media/raspberrypi-rp1-cfe.rst b/Documentation/admin-guide/media/raspberrypi-rp1-cfe.rst
new file mode 100644
index 000000000000..668d978a9875
--- /dev/null
+++ b/Documentation/admin-guide/media/raspberrypi-rp1-cfe.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+Raspberry Pi PiSP Camera Front End (rp1-cfe)
+============================================
+
+The PiSP Camera Front End
+=========================
+
+The PiSP Camera Front End (CFE) is a module which combines a CSI-2 receiver with
+a simple ISP, called the Front End (FE).
+
+The CFE has four DMA engines and can write frames from four separate streams
+received from the CSI-2 to the memory. One of those streams can also be routed
+directly to the FE, which can do minimal image processing, write two versions
+(e.g. non-scaled and downscaled versions) of the received frames to memory and
+provide statistics of the received frames.
+
+The FE registers are documented in the `Raspberry Pi Image Signal Processor
+(ISP) Specification document
+<https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf>`_,
+and example code for FE can be found in `libpisp
+<https://github.com/raspberrypi/libpisp>`_.
+
+The rp1-cfe driver
+==================
+
+The Raspberry Pi PiSP Camera Front End (rp1-cfe) driver is located under
+drivers/media/platform/raspberrypi/rp1-cfe. It uses the `V4L2 API` to register
+a number of video capture and output devices, the `V4L2 subdev API` to register
+subdevices for the CSI-2 received and the FE that connects the video devices in
+a single media graph realized using the `Media Controller (MC) API`.
+
+The media topology registered by the `rp1-cfe` driver, in this particular
+example connected to an imx219 sensor, is the following one:
+
+.. _rp1-cfe-topology:
+
+.. kernel-figure:: raspberrypi-rp1-cfe.dot
+ :alt: Diagram of an example media pipeline topology
+ :align: center
+
+The media graph contains the following video device nodes:
+
+- rp1-cfe-csi2-ch0: capture device for the first CSI-2 stream
+- rp1-cfe-csi2-ch1: capture device for the second CSI-2 stream
+- rp1-cfe-csi2-ch2: capture device for the third CSI-2 stream
+- rp1-cfe-csi2-ch3: capture device for the fourth CSI-2 stream
+- rp1-cfe-fe-image0: capture device for the first FE output
+- rp1-cfe-fe-image1: capture device for the second FE output
+- rp1-cfe-fe-stats: capture device for the FE statistics
+- rp1-cfe-fe-config: output device for FE configuration
+
+rp1-cfe-csi2-chX
+----------------
+
+The rp1-cfe-csi2-chX capture devices are normal V4L2 capture devices which
+can be used to capture video frames or metadata received from the CSI-2.
+
+rp1-cfe-fe-image0, rp1-cfe-fe-image1
+------------------------------------
+
+The rp1-cfe-fe-image0 and rp1-cfe-fe-image1 capture devices are used to write
+the processed frames to memory.
+
+rp1-cfe-fe-stats
+----------------
+
+The format of the FE statistics buffer is defined by
+:c:type:`pisp_statistics` C structure and the meaning of each parameter is
+described in the `PiSP specification` document.
+
+rp1-cfe-fe-config
+-----------------
+
+The format of the FE configuration buffer is defined by
+:c:type:`pisp_fe_config` C structure and the meaning of each parameter is
+described in the `PiSP specification` document.
diff --git a/Documentation/admin-guide/media/v4l-drivers.rst b/Documentation/admin-guide/media/v4l-drivers.rst
index b6af448b9fe9..61da154e079a 100644
--- a/Documentation/admin-guide/media/v4l-drivers.rst
+++ b/Documentation/admin-guide/media/v4l-drivers.rst
@@ -26,6 +26,7 @@ Video4Linux (V4L) driver-specific documentation
raspberrypi-pisp-be
rcar-fdp1
rkisp1
+ raspberrypi-rp1-cfe
saa7134
si470x
si4713
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
[not found] ` <20241003-rp1-cfe-v6-3-d6762edd98a8@ideasonboard.com>
@ 2024-10-24 8:20 ` Hans Verkuil
2024-10-24 11:08 ` Tomi Valkeinen
2024-10-28 9:21 ` Tomi Valkeinen
0 siblings, 2 replies; 21+ messages in thread
From: Hans Verkuil @ 2024-10-24 8:20 UTC (permalink / raw)
To: Tomi Valkeinen, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham
Hi Tomi,
I know this driver is already merged, but while checking for drivers that use
q->max_num_buffers I stumbled on this cfe code:
<snip>
> +/*
> + * vb2 ops
> + */
> +
> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> + unsigned int *nplanes, unsigned int sizes[],
> + struct device *alloc_devs[])
> +{
> + struct cfe_node *node = vb2_get_drv_priv(vq);
> + struct cfe_device *cfe = node->cfe;
> + unsigned int size = is_image_node(node) ?
> + node->vid_fmt.fmt.pix.sizeimage :
> + node->meta_fmt.fmt.meta.buffersize;
> +
> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> + node->buffer_queue.type);
> +
> + if (vq->max_num_buffers + *nbuffers < 3)
> + *nbuffers = 3 - vq->max_num_buffers;
This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
is called. So 32 + *nbuffers is never < 3.
If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
for you.
Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
since the code is almost always wrong.
Regards,
Hans
> +
> + if (*nplanes) {
> + if (sizes[0] < size) {
> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
> + return -EINVAL;
> + }
> + size = sizes[0];
> + }
> +
> + *nplanes = 1;
> + sizes[0] = size;
> +
> + return 0;
> +}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 8:20 ` [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE Hans Verkuil
@ 2024-10-24 11:08 ` Tomi Valkeinen
2024-10-24 11:21 ` Hans Verkuil
2024-10-28 9:21 ` Tomi Valkeinen
1 sibling, 1 reply; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-24 11:08 UTC (permalink / raw)
To: Hans Verkuil
Cc: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck,
Laurent Pinchart, Sakari Ailus, Jacopo Mondi, Kieran Bingham
Hi Hans,
On 24/10/2024 11:20, Hans Verkuil wrote:
> Hi Tomi,
>
> I know this driver is already merged, but while checking for drivers that use
> q->max_num_buffers I stumbled on this cfe code:
>
> <snip>
>
>> +/*
>> + * vb2 ops
>> + */
>> +
>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>> + unsigned int *nplanes, unsigned int sizes[],
>> + struct device *alloc_devs[])
>> +{
>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>> + struct cfe_device *cfe = node->cfe;
>> + unsigned int size = is_image_node(node) ?
>> + node->vid_fmt.fmt.pix.sizeimage :
>> + node->meta_fmt.fmt.meta.buffersize;
>> +
>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>> + node->buffer_queue.type);
>> +
>> + if (vq->max_num_buffers + *nbuffers < 3)
>> + *nbuffers = 3 - vq->max_num_buffers;
>
> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> is called. So 32 + *nbuffers is never < 3.
>
> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> for you.
>
> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> since the code is almost always wrong.
Indeed, the code doesn't make sense. I have to say I don't know what was
the intent here, but I think "at least 3 buffers should be allocated by
REQBUFS" is the likely explanation.
I think the hardware should work with even just a single buffer, so is
it then fine to not set either q->min_queued_buffers nor
q->min_reqbufs_allocation before calling vb2_queue_init()? This seems to
result in REQBUFS giving at least two buffers.
Tomi
>
> Regards,
>
> Hans
>
>> +
>> + if (*nplanes) {
>> + if (sizes[0] < size) {
>> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
>> + return -EINVAL;
>> + }
>> + size = sizes[0];
>> + }
>> +
>> + *nplanes = 1;
>> + sizes[0] = size;
>> +
>> + return 0;
>> +}
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 11:08 ` Tomi Valkeinen
@ 2024-10-24 11:21 ` Hans Verkuil
2024-10-24 12:05 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-24 11:21 UTC (permalink / raw)
To: Tomi Valkeinen
Cc: Mauro Carvalho Chehab, Raspberry Pi Kernel Maintenance,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck,
Laurent Pinchart, Sakari Ailus, Jacopo Mondi, Kieran Bingham
On 24/10/2024 13:08, Tomi Valkeinen wrote:
> Hi Hans,
>
> On 24/10/2024 11:20, Hans Verkuil wrote:
>> Hi Tomi,
>>
>> I know this driver is already merged, but while checking for drivers that use
>> q->max_num_buffers I stumbled on this cfe code:
>>
>> <snip>
>>
>>> +/*
>>> + * vb2 ops
>>> + */
>>> +
>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>> + unsigned int *nplanes, unsigned int sizes[],
>>> + struct device *alloc_devs[])
>>> +{
>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>> + struct cfe_device *cfe = node->cfe;
>>> + unsigned int size = is_image_node(node) ?
>>> + node->vid_fmt.fmt.pix.sizeimage :
>>> + node->meta_fmt.fmt.meta.buffersize;
>>> +
>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>> + node->buffer_queue.type);
>>> +
>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>> + *nbuffers = 3 - vq->max_num_buffers;
>>
>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>> is called. So 32 + *nbuffers is never < 3.
>>
>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>> for you.
>>
>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>> since the code is almost always wrong.
>
> Indeed, the code doesn't make sense. I have to say I don't know what was the intent here, but I think "at least 3 buffers should be allocated by REQBUFS" is the likely explanation.
>
> I think the hardware should work with even just a single buffer, so is it then fine to not set either q->min_queued_buffers nor q->min_reqbufs_allocation before calling vb2_queue_init()? This seems to
> result in REQBUFS giving at least two buffers.
min_queued_buffers is really HW dependent. If not set, then start_streaming can be called even if there are no buffers queued.
If your hardware can handle that, then it's fine to not set it.
Regards,
Hans
>
> Tomi
>
>>
>> Regards,
>>
>> Hans
>>
>>> +
>>> + if (*nplanes) {
>>> + if (sizes[0] < size) {
>>> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
>>> + return -EINVAL;
>>> + }
>>> + size = sizes[0];
>>> + }
>>> +
>>> + *nplanes = 1;
>>> + sizes[0] = size;
>>> +
>>> + return 0;
>>> +}
>>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 11:21 ` Hans Verkuil
@ 2024-10-24 12:05 ` Laurent Pinchart
2024-10-24 12:13 ` Hans Verkuil
0 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2024-10-24 12:05 UTC (permalink / raw)
To: Hans Verkuil
Cc: Tomi Valkeinen, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham
On Thu, Oct 24, 2024 at 01:21:43PM +0200, Hans Verkuil wrote:
> On 24/10/2024 13:08, Tomi Valkeinen wrote:
> > On 24/10/2024 11:20, Hans Verkuil wrote:
> >> Hi Tomi,
> >>
> >> I know this driver is already merged, but while checking for drivers that use
> >> q->max_num_buffers I stumbled on this cfe code:
> >>
> >> <snip>
> >>
> >>> +/*
> >>> + * vb2 ops
> >>> + */
> >>> +
> >>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> >>> + unsigned int *nplanes, unsigned int sizes[],
> >>> + struct device *alloc_devs[])
> >>> +{
> >>> + struct cfe_node *node = vb2_get_drv_priv(vq);
> >>> + struct cfe_device *cfe = node->cfe;
> >>> + unsigned int size = is_image_node(node) ?
> >>> + node->vid_fmt.fmt.pix.sizeimage :
> >>> + node->meta_fmt.fmt.meta.buffersize;
> >>> +
> >>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> >>> + node->buffer_queue.type);
> >>> +
> >>> + if (vq->max_num_buffers + *nbuffers < 3)
> >>> + *nbuffers = 3 - vq->max_num_buffers;
> >>
> >> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> >> is called. So 32 + *nbuffers is never < 3.
> >>
> >> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> >> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> >> for you.
> >>
> >> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> >> since the code is almost always wrong.
> >
> > Indeed, the code doesn't make sense. I have to say I don't know what
> > was the intent here, but I think "at least 3 buffers should be
> > allocated by REQBUFS" is the likely explanation.
> >
> > I think the hardware should work with even just a single buffer, so
> > is it then fine to not set either q->min_queued_buffers nor
> > q->min_reqbufs_allocation before calling vb2_queue_init()? This
> > seems to result in REQBUFS giving at least two buffers.
>
> min_queued_buffers is really HW dependent. If not set, then
> start_streaming can be called even if there are no buffers queued.
Having min_queued_buffers > 1 is bad for userspace, and it's much nicer
to have it set to 0. The main issue with a value of 1 is that validation
of the pipeline ends up being deferred to the first QBUF if it occurs
after STREAMON, and error handling is then complicated.
It's not just a property of the hardware, kernel drivers can decide to
work with scratch buffers if needed. In many cases, a scratch buffer
allocated by the kernel could be very small, either relying on the same
physical page being mapped through the IOMMU to a larger DMA space, or
using a 0 stride value to write all lines to the same location.
For drivers supported by libcamera, we will require min_queued_buffers
<= 1 and may tighten that to == 0. Tomi, if you submit a patch, please
try to target 0, and if that's too much work for now, set it to 1 at
most.
> If your hardware can handle that, then it's fine to not set it.
>
> >>> +
> >>> + if (*nplanes) {
> >>> + if (sizes[0] < size) {
> >>> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
> >>> + return -EINVAL;
> >>> + }
> >>> + size = sizes[0];
> >>> + }
> >>> +
> >>> + *nplanes = 1;
> >>> + sizes[0] = size;
> >>> +
> >>> + return 0;
> >>> +}
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 12:05 ` Laurent Pinchart
@ 2024-10-24 12:13 ` Hans Verkuil
2024-10-24 12:26 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-24 12:13 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Tomi Valkeinen, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham
On 24/10/2024 14:05, Laurent Pinchart wrote:
> On Thu, Oct 24, 2024 at 01:21:43PM +0200, Hans Verkuil wrote:
>> On 24/10/2024 13:08, Tomi Valkeinen wrote:
>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>> Hi Tomi,
>>>>
>>>> I know this driver is already merged, but while checking for drivers that use
>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>
>>>> <snip>
>>>>
>>>>> +/*
>>>>> + * vb2 ops
>>>>> + */
>>>>> +
>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>> + struct device *alloc_devs[])
>>>>> +{
>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>> + struct cfe_device *cfe = node->cfe;
>>>>> + unsigned int size = is_image_node(node) ?
>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>> +
>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>> + node->buffer_queue.type);
>>>>> +
>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>
>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>> is called. So 32 + *nbuffers is never < 3.
>>>>
>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>> for you.
>>>>
>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>> since the code is almost always wrong.
>>>
>>> Indeed, the code doesn't make sense. I have to say I don't know what
>>> was the intent here, but I think "at least 3 buffers should be
>>> allocated by REQBUFS" is the likely explanation.
>>>
>>> I think the hardware should work with even just a single buffer, so
>>> is it then fine to not set either q->min_queued_buffers nor
>>> q->min_reqbufs_allocation before calling vb2_queue_init()? This
>>> seems to result in REQBUFS giving at least two buffers.
>>
>> min_queued_buffers is really HW dependent. If not set, then
>> start_streaming can be called even if there are no buffers queued.
>
> Having min_queued_buffers > 1 is bad for userspace, and it's much nicer
> to have it set to 0. The main issue with a value of 1 is that validation
> of the pipeline ends up being deferred to the first QBUF if it occurs
> after STREAMON, and error handling is then complicated.
The validation can be done in the prepare_streaming callback instead of
in start_streaming. prepare_streaming is called at STREAMON time.
>
> It's not just a property of the hardware, kernel drivers can decide to
> work with scratch buffers if needed. In many cases, a scratch buffer
> allocated by the kernel could be very small, either relying on the same
> physical page being mapped through the IOMMU to a larger DMA space, or
> using a 0 stride value to write all lines to the same location.
None of which is possible for some of the older drivers (e.g. TI davinci)
that do not have an IOMMU and require two contiguous buffers before they can
start streaming. But for modern devices you can solve it through a scratch
buffer, that's true.
>
> For drivers supported by libcamera, we will require min_queued_buffers
> <= 1 and may tighten that to == 0. Tomi, if you submit a patch, please
> try to target 0, and if that's too much work for now, set it to 1 at
> most.
Regards,
Hans
>
>> If your hardware can handle that, then it's fine to not set it.
>>
>>>>> +
>>>>> + if (*nplanes) {
>>>>> + if (sizes[0] < size) {
>>>>> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> + size = sizes[0];
>>>>> + }
>>>>> +
>>>>> + *nplanes = 1;
>>>>> + sizes[0] = size;
>>>>> +
>>>>> + return 0;
>>>>> +}
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 12:13 ` Hans Verkuil
@ 2024-10-24 12:26 ` Laurent Pinchart
0 siblings, 0 replies; 21+ messages in thread
From: Laurent Pinchart @ 2024-10-24 12:26 UTC (permalink / raw)
To: Hans Verkuil
Cc: Tomi Valkeinen, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham
On Thu, Oct 24, 2024 at 02:13:07PM +0200, Hans Verkuil wrote:
> On 24/10/2024 14:05, Laurent Pinchart wrote:
> > On Thu, Oct 24, 2024 at 01:21:43PM +0200, Hans Verkuil wrote:
> >> On 24/10/2024 13:08, Tomi Valkeinen wrote:
> >>> On 24/10/2024 11:20, Hans Verkuil wrote:
> >>>> Hi Tomi,
> >>>>
> >>>> I know this driver is already merged, but while checking for drivers that use
> >>>> q->max_num_buffers I stumbled on this cfe code:
> >>>>
> >>>> <snip>
> >>>>
> >>>>> +/*
> >>>>> + * vb2 ops
> >>>>> + */
> >>>>> +
> >>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> >>>>> + unsigned int *nplanes, unsigned int sizes[],
> >>>>> + struct device *alloc_devs[])
> >>>>> +{
> >>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
> >>>>> + struct cfe_device *cfe = node->cfe;
> >>>>> + unsigned int size = is_image_node(node) ?
> >>>>> + node->vid_fmt.fmt.pix.sizeimage :
> >>>>> + node->meta_fmt.fmt.meta.buffersize;
> >>>>> +
> >>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> >>>>> + node->buffer_queue.type);
> >>>>> +
> >>>>> + if (vq->max_num_buffers + *nbuffers < 3)
> >>>>> + *nbuffers = 3 - vq->max_num_buffers;
> >>>>
> >>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> >>>> is called. So 32 + *nbuffers is never < 3.
> >>>>
> >>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> >>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> >>>> for you.
> >>>>
> >>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> >>>> since the code is almost always wrong.
> >>>
> >>> Indeed, the code doesn't make sense. I have to say I don't know what
> >>> was the intent here, but I think "at least 3 buffers should be
> >>> allocated by REQBUFS" is the likely explanation.
> >>>
> >>> I think the hardware should work with even just a single buffer, so
> >>> is it then fine to not set either q->min_queued_buffers nor
> >>> q->min_reqbufs_allocation before calling vb2_queue_init()? This
> >>> seems to result in REQBUFS giving at least two buffers.
> >>
> >> min_queued_buffers is really HW dependent. If not set, then
> >> start_streaming can be called even if there are no buffers queued.
> >
> > Having min_queued_buffers > 1 is bad for userspace, and it's much nicer
> > to have it set to 0. The main issue with a value of 1 is that validation
> > of the pipeline ends up being deferred to the first QBUF if it occurs
> > after STREAMON, and error handling is then complicated.
>
> The validation can be done in the prepare_streaming callback instead of
> in start_streaming. prepare_streaming is called at STREAMON time.
True, we have that now, I keep forgetting about it. I've noticed the
annoying behaviour with drivers that predate .prepare_streaming(). It
should be a requirement to validate the pipeline in .prepare_streaming()
(or possibly earlier in some cases, not sure).
> > It's not just a property of the hardware, kernel drivers can decide to
> > work with scratch buffers if needed. In many cases, a scratch buffer
> > allocated by the kernel could be very small, either relying on the same
> > physical page being mapped through the IOMMU to a larger DMA space, or
> > using a 0 stride value to write all lines to the same location.
>
> None of which is possible for some of the older drivers (e.g. TI davinci)
> that do not have an IOMMU and require two contiguous buffers before they can
> start streaming. But for modern devices you can solve it through a scratch
> buffer, that's true.
In the worst case kernel drivers could allocate a full scratch buffer.
That's not nice from a memory consumption point of view of course. The
stride hack may have worked with TI davinci, I haven't checked. It's
something that driver authors don't usually think about even if the
hardware can do it. Even if the device supports a 0 stride, it also
needs to support programming the stride and the buffer addresses in a
race-free way.
> > For drivers supported by libcamera, we will require min_queued_buffers
> > <= 1 and may tighten that to == 0. Tomi, if you submit a patch, please
> > try to target 0, and if that's too much work for now, set it to 1 at
> > most.
>
> Regards,
>
> Hans
>
> >> If your hardware can handle that, then it's fine to not set it.
> >>
> >>>>> +
> >>>>> + if (*nplanes) {
> >>>>> + if (sizes[0] < size) {
> >>>>> + cfe_err(cfe, "sizes[0] %i < size %u\n", sizes[0], size);
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>>> + size = sizes[0];
> >>>>> + }
> >>>>> +
> >>>>> + *nplanes = 1;
> >>>>> + sizes[0] = size;
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-24 8:20 ` [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE Hans Verkuil
2024-10-24 11:08 ` Tomi Valkeinen
@ 2024-10-28 9:21 ` Tomi Valkeinen
2024-10-28 10:11 ` Hans Verkuil
1 sibling, 1 reply; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-28 9:21 UTC (permalink / raw)
To: Hans Verkuil
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Hi,
On 24/10/2024 11:20, Hans Verkuil wrote:
> Hi Tomi,
>
> I know this driver is already merged, but while checking for drivers that use
> q->max_num_buffers I stumbled on this cfe code:
>
> <snip>
>
>> +/*
>> + * vb2 ops
>> + */
>> +
>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>> + unsigned int *nplanes, unsigned int sizes[],
>> + struct device *alloc_devs[])
>> +{
>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>> + struct cfe_device *cfe = node->cfe;
>> + unsigned int size = is_image_node(node) ?
>> + node->vid_fmt.fmt.pix.sizeimage :
>> + node->meta_fmt.fmt.meta.buffersize;
>> +
>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>> + node->buffer_queue.type);
>> +
>> + if (vq->max_num_buffers + *nbuffers < 3)
>> + *nbuffers = 3 - vq->max_num_buffers;
>
> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> is called. So 32 + *nbuffers is never < 3.
>
> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> for you.
>
> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> since the code is almost always wrong.
Looking at this, the original code in the old BSP tree was, which
somehow, along the long way, got turned into the above:
if (vq->num_buffers + *nbuffers < 3)
*nbuffers = 3 - vq->num_buffers;
So... I think that is the same as "q->min_reqbufs_allocation = 3"?
The distinction between min_queued_buffers and min_reqbufs_allocation,
or rather the need for the latter, still escapes me. If the HW/SW
requires N buffers to be queued, why would we require allocating more
than N buffers?
Tomi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 9:21 ` Tomi Valkeinen
@ 2024-10-28 10:11 ` Hans Verkuil
2024-10-28 11:05 ` Tomi Valkeinen
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-28 10:11 UTC (permalink / raw)
To: Tomi Valkeinen
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On 28/10/2024 10:21, Tomi Valkeinen wrote:
> Hi,
>
> On 24/10/2024 11:20, Hans Verkuil wrote:
>> Hi Tomi,
>>
>> I know this driver is already merged, but while checking for drivers that use
>> q->max_num_buffers I stumbled on this cfe code:
>>
>> <snip>
>>
>>> +/*
>>> + * vb2 ops
>>> + */
>>> +
>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>> + unsigned int *nplanes, unsigned int sizes[],
>>> + struct device *alloc_devs[])
>>> +{
>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>> + struct cfe_device *cfe = node->cfe;
>>> + unsigned int size = is_image_node(node) ?
>>> + node->vid_fmt.fmt.pix.sizeimage :
>>> + node->meta_fmt.fmt.meta.buffersize;
>>> +
>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>> + node->buffer_queue.type);
>>> +
>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>> + *nbuffers = 3 - vq->max_num_buffers;
>>
>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>> is called. So 32 + *nbuffers is never < 3.
>>
>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>> for you.
>>
>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>> since the code is almost always wrong.
>
> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>
> if (vq->num_buffers + *nbuffers < 3)
> *nbuffers = 3 - vq->num_buffers;
>
> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>
> The distinction between min_queued_buffers and min_reqbufs_allocation, or rather the need for the latter, still escapes me. If the HW/SW requires N buffers to be queued, why would we require
> allocating more than N buffers?
min_queued_buffers is easiest to explain: that represents the requirements of the DMA
engine, i.e. how many buffers much be queued before the DMA engine can be started.
Typically it is 0, 1 or 2.
min_reqbufs_allocation is the minimum number of buffers that will be allocated when
calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
or dropping frames.
Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
and the third is being processed by userspace. But sometimes drivers have other
requirements.
The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
"It can be smaller than the number requested, even zero, when the driver runs out of
free memory. A larger number is also possible when the driver requires more buffers
to function correctly."
How drivers implement this is a mess, and usually the code in the driver is wrong as
well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
with this ioctl the number of buffers will never be more than requested, although it
may be less if you run out of memory.
I really should go through all existing drivers and fix them up if they try to
handle this in the queue_setup function, I suspect a lot of them are quite messy.
One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
to use CREATE_BUFS you need that information so you know that you have to create
at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
but it is effectively codec specific. This probably should be clarified.
I wonder if it wouldn't be better to add a min_num_buffers field to
struct v4l2_create_buffers and set it to min_queued_buffers + 1.
Regards,
Hans
>
> Tomi
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 10:11 ` Hans Verkuil
@ 2024-10-28 11:05 ` Tomi Valkeinen
2024-10-28 11:13 ` Hans Verkuil
0 siblings, 1 reply; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-28 11:05 UTC (permalink / raw)
To: Hans Verkuil
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Hi Hans,
On 28/10/2024 12:11, Hans Verkuil wrote:
> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>> Hi,
>>
>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>> Hi Tomi,
>>>
>>> I know this driver is already merged, but while checking for drivers that use
>>> q->max_num_buffers I stumbled on this cfe code:
>>>
>>> <snip>
>>>
>>>> +/*
>>>> + * vb2 ops
>>>> + */
>>>> +
>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>> + struct device *alloc_devs[])
>>>> +{
>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>> + struct cfe_device *cfe = node->cfe;
>>>> + unsigned int size = is_image_node(node) ?
>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>> +
>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>> + node->buffer_queue.type);
>>>> +
>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>
>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>> is called. So 32 + *nbuffers is never < 3.
>>>
>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>> for you.
>>>
>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>> since the code is almost always wrong.
>>
>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>
>> if (vq->num_buffers + *nbuffers < 3)
>> *nbuffers = 3 - vq->num_buffers;
>>
>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>
>> The distinction between min_queued_buffers and min_reqbufs_allocation, or rather the need for the latter, still escapes me. If the HW/SW requires N buffers to be queued, why would we require
>> allocating more than N buffers?
>
> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
> engine, i.e. how many buffers much be queued before the DMA engine can be started.
> Typically it is 0, 1 or 2.
>
> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
> or dropping frames.
>
> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
> and the third is being processed by userspace. But sometimes drivers have other
> requirements.
>
> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>
> "It can be smaller than the number requested, even zero, when the driver runs out of
> free memory. A larger number is also possible when the driver requires more buffers
> to function correctly."
>
> How drivers implement this is a mess, and usually the code in the driver is wrong as
> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
Thanks, this was educational!
So. If I have a driver that has min_queued_buffers = 1, I can use
VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture just
one buffer, right? Whereas VIDIOC_REQBUFS would give me (probably) three
(or two, if the driver does not set min_reqbufs_allocation). Three
buffers makes sense for full streaming, of course.
> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
> with this ioctl the number of buffers will never be more than requested, although it
> may be less if you run out of memory.
>
> I really should go through all existing drivers and fix them up if they try to
> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>
> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
Hmm, so what I wrote above is not correct? One needs min_queued_buffers
+ 1? Why is that?
> to use CREATE_BUFS you need that information so you know that you have to create
> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
> but it is effectively codec specific. This probably should be clarified.
>
> I wonder if it wouldn't be better to add a min_num_buffers field to
> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
I think this makes sense (although I still don't get the +1).
However, based on the experiences from adding the streams features to
various ioctls, let's be very careful =). The new 'min_num_buffers' can
be filled with garbage by the userspace. If we define the
'min_num_buffers' field to be always filled by the kernel, and any value
provided from the userspace to be ignored, I think it should work.
Tomi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 11:05 ` Tomi Valkeinen
@ 2024-10-28 11:13 ` Hans Verkuil
2024-10-28 11:25 ` Tomi Valkeinen
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-28 11:13 UTC (permalink / raw)
To: Tomi Valkeinen
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On 28/10/2024 12:05, Tomi Valkeinen wrote:
> Hi Hans,
>
> On 28/10/2024 12:11, Hans Verkuil wrote:
>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>>> Hi,
>>>
>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>> Hi Tomi,
>>>>
>>>> I know this driver is already merged, but while checking for drivers that use
>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>
>>>> <snip>
>>>>
>>>>> +/*
>>>>> + * vb2 ops
>>>>> + */
>>>>> +
>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>> + struct device *alloc_devs[])
>>>>> +{
>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>> + struct cfe_device *cfe = node->cfe;
>>>>> + unsigned int size = is_image_node(node) ?
>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>> +
>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>> + node->buffer_queue.type);
>>>>> +
>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>
>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>> is called. So 32 + *nbuffers is never < 3.
>>>>
>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>> for you.
>>>>
>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>> since the code is almost always wrong.
>>>
>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>>
>>> if (vq->num_buffers + *nbuffers < 3)
>>> *nbuffers = 3 - vq->num_buffers;
>>>
>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>>
>>> The distinction between min_queued_buffers and min_reqbufs_allocation, or rather the need for the latter, still escapes me. If the HW/SW requires N buffers to be queued, why would we require
>>> allocating more than N buffers?
>>
>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
>> Typically it is 0, 1 or 2.
>>
>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
>> or dropping frames.
>>
>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
>> and the third is being processed by userspace. But sometimes drivers have other
>> requirements.
>>
>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>>
>> "It can be smaller than the number requested, even zero, when the driver runs out of
>> free memory. A larger number is also possible when the driver requires more buffers
>> to function correctly."
>>
>> How drivers implement this is a mess, and usually the code in the driver is wrong as
>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
>
> Thanks, this was educational!
>
> So. If I have a driver that has min_queued_buffers = 1, I can use VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture just one buffer, right? Whereas VIDIOC_REQBUFS would give me
> (probably) three (or two, if the driver does not set min_reqbufs_allocation). Three buffers makes sense for full streaming, of course.
>
>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
>> with this ioctl the number of buffers will never be more than requested, although it
>> may be less if you run out of memory.
>>
>> I really should go through all existing drivers and fix them up if they try to
>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>>
>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
>
> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
then it can never return a buffer to userspace! So you need one more. That's the absolute
minimum. For smooth capture you need two more to allow time for userspace to process the
buffer.
>
>> to use CREATE_BUFS you need that information so you know that you have to create
>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
>> but it is effectively codec specific. This probably should be clarified.
>>
>> I wonder if it wouldn't be better to add a min_num_buffers field to
>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
>
> I think this makes sense (although I still don't get the +1).
>
> However, based on the experiences from adding the streams features to various ioctls, let's be very careful =). The new 'min_num_buffers' can be filled with garbage by the userspace. If we define the
> 'min_num_buffers' field to be always filled by the kernel, and any value provided from the userspace to be ignored, I think it should work.
I've posted an RFC for this.
Regards,
Hans
>
> Tomi
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 11:13 ` Hans Verkuil
@ 2024-10-28 11:25 ` Tomi Valkeinen
2024-10-28 11:30 ` Hans Verkuil
0 siblings, 1 reply; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-28 11:25 UTC (permalink / raw)
To: Hans Verkuil
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Hi,
On 28/10/2024 13:13, Hans Verkuil wrote:
> On 28/10/2024 12:05, Tomi Valkeinen wrote:
>> Hi Hans,
>>
>> On 28/10/2024 12:11, Hans Verkuil wrote:
>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>>>> Hi,
>>>>
>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>>> Hi Tomi,
>>>>>
>>>>> I know this driver is already merged, but while checking for drivers that use
>>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>>
>>>>> <snip>
>>>>>
>>>>>> +/*
>>>>>> + * vb2 ops
>>>>>> + */
>>>>>> +
>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>>> + struct device *alloc_devs[])
>>>>>> +{
>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>>> + struct cfe_device *cfe = node->cfe;
>>>>>> + unsigned int size = is_image_node(node) ?
>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>>> +
>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>>> + node->buffer_queue.type);
>>>>>> +
>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>>
>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>>> is called. So 32 + *nbuffers is never < 3.
>>>>>
>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>>> for you.
>>>>>
>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>>> since the code is almost always wrong.
>>>>
>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>>>
>>>> if (vq->num_buffers + *nbuffers < 3)
>>>> *nbuffers = 3 - vq->num_buffers;
>>>>
>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>>>
>>>> The distinction between min_queued_buffers and min_reqbufs_allocation, or rather the need for the latter, still escapes me. If the HW/SW requires N buffers to be queued, why would we require
>>>> allocating more than N buffers?
>>>
>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
>>> Typically it is 0, 1 or 2.
>>>
>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
>>> or dropping frames.
>>>
>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
>>> and the third is being processed by userspace. But sometimes drivers have other
>>> requirements.
>>>
>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>>>
>>> "It can be smaller than the number requested, even zero, when the driver runs out of
>>> free memory. A larger number is also possible when the driver requires more buffers
>>> to function correctly."
>>>
>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
>>
>> Thanks, this was educational!
>>
>> So. If I have a driver that has min_queued_buffers = 1, I can use VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture just one buffer, right? Whereas VIDIOC_REQBUFS would give me
>> (probably) three (or two, if the driver does not set min_reqbufs_allocation). Three buffers makes sense for full streaming, of course.
>>
>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
>>> with this ioctl the number of buffers will never be more than requested, although it
>>> may be less if you run out of memory.
>>>
>>> I really should go through all existing drivers and fix them up if they try to
>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>>>
>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
>>
>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
>
> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
> then it can never return a buffer to userspace! So you need one more. That's the absolute
> minimum. For smooth capture you need two more to allow time for userspace to process the
> buffer.
Hmm, ok, I see. Well, I guess my "I want to capture just a single frame"
is not a very common case.
Can I queue one buffer, start streaming, stop streaming, and get the
filled buffer? But then I guess I don't when the buffer has been filled,
i.e. when to call stop streaming.
So, never mind, I don't actually have any use case for this, just wondering.
>>
>>> to use CREATE_BUFS you need that information so you know that you have to create
>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
>>> but it is effectively codec specific. This probably should be clarified.
>>>
>>> I wonder if it wouldn't be better to add a min_num_buffers field to
>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
>>
>> I think this makes sense (although I still don't get the +1).
>>
>> However, based on the experiences from adding the streams features to various ioctls, let's be very careful =). The new 'min_num_buffers' can be filled with garbage by the userspace. If we define the
>> 'min_num_buffers' field to be always filled by the kernel, and any value provided from the userspace to be ignored, I think it should work.
>
> I've posted an RFC for this.
Thanks, I'll check it out.
For the original issue in this thread, I think the correct fix is to
remove the lines from cfe_queue_setup(), and add
"q->min_reqbufs_allocation = 3".
I'll send a patch for that.
Tomi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 11:25 ` Tomi Valkeinen
@ 2024-10-28 11:30 ` Hans Verkuil
2024-10-28 15:17 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-28 11:30 UTC (permalink / raw)
To: Tomi Valkeinen
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Laurent Pinchart, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On 28/10/2024 12:25, Tomi Valkeinen wrote:
> Hi,
>
> On 28/10/2024 13:13, Hans Verkuil wrote:
>> On 28/10/2024 12:05, Tomi Valkeinen wrote:
>>> Hi Hans,
>>>
>>> On 28/10/2024 12:11, Hans Verkuil wrote:
>>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>>>>> Hi,
>>>>>
>>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>>>> Hi Tomi,
>>>>>>
>>>>>> I know this driver is already merged, but while checking for drivers that use
>>>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> +/*
>>>>>>> + * vb2 ops
>>>>>>> + */
>>>>>>> +
>>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>>>> + struct device *alloc_devs[])
>>>>>>> +{
>>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>>>> + struct cfe_device *cfe = node->cfe;
>>>>>>> + unsigned int size = is_image_node(node) ?
>>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>>>> +
>>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>>>> + node->buffer_queue.type);
>>>>>>> +
>>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>>>
>>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>>>> is called. So 32 + *nbuffers is never < 3.
>>>>>>
>>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>>>> for you.
>>>>>>
>>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>>>> since the code is almost always wrong.
>>>>>
>>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>>>>
>>>>> if (vq->num_buffers + *nbuffers < 3)
>>>>> *nbuffers = 3 - vq->num_buffers;
>>>>>
>>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>>>>
>>>>> The distinction between min_queued_buffers and min_reqbufs_allocation, or rather the need for the latter, still escapes me. If the HW/SW requires N buffers to be queued, why would we require
>>>>> allocating more than N buffers?
>>>>
>>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
>>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
>>>> Typically it is 0, 1 or 2.
>>>>
>>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
>>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
>>>> or dropping frames.
>>>>
>>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
>>>> and the third is being processed by userspace. But sometimes drivers have other
>>>> requirements.
>>>>
>>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
>>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
>>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>>>>
>>>> "It can be smaller than the number requested, even zero, when the driver runs out of
>>>> free memory. A larger number is also possible when the driver requires more buffers
>>>> to function correctly."
>>>>
>>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
>>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
>>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
>>>
>>> Thanks, this was educational!
>>>
>>> So. If I have a driver that has min_queued_buffers = 1, I can use VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture just one buffer, right? Whereas VIDIOC_REQBUFS would give me
>>> (probably) three (or two, if the driver does not set min_reqbufs_allocation). Three buffers makes sense for full streaming, of course.
>>>
>>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
>>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
>>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
>>>> with this ioctl the number of buffers will never be more than requested, although it
>>>> may be less if you run out of memory.
>>>>
>>>> I really should go through all existing drivers and fix them up if they try to
>>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>>>>
>>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
>>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
>>>
>>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
>>
>> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
>> then it can never return a buffer to userspace! So you need one more. That's the absolute
>> minimum. For smooth capture you need two more to allow time for userspace to process the
>> buffer.
>
> Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
>
> Can I queue one buffer, start streaming, stop streaming, and get the filled buffer? But then I guess I don't when the buffer has been filled, i.e. when to call stop streaming.
Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
older hardware without an IOMMU.
Regards,
Hans
>
> So, never mind, I don't actually have any use case for this, just wondering.
>
>>>
>>>> to use CREATE_BUFS you need that information so you know that you have to create
>>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
>>>> but it is effectively codec specific. This probably should be clarified.
>>>>
>>>> I wonder if it wouldn't be better to add a min_num_buffers field to
>>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
>>>
>>> I think this makes sense (although I still don't get the +1).
>>>
>>> However, based on the experiences from adding the streams features to various ioctls, let's be very careful =). The new 'min_num_buffers' can be filled with garbage by the userspace. If we define the
>>> 'min_num_buffers' field to be always filled by the kernel, and any value provided from the userspace to be ignored, I think it should work.
>>
>> I've posted an RFC for this.
>
> Thanks, I'll check it out.
>
> For the original issue in this thread, I think the correct fix is to remove the lines from cfe_queue_setup(), and add "q->min_reqbufs_allocation = 3".
>
> I'll send a patch for that.
>
> Tomi
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 11:30 ` Hans Verkuil
@ 2024-10-28 15:17 ` Laurent Pinchart
2024-10-28 15:32 ` Tomi Valkeinen
0 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2024-10-28 15:17 UTC (permalink / raw)
To: Hans Verkuil
Cc: Tomi Valkeinen, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
> On 28/10/2024 12:25, Tomi Valkeinen wrote:
> > On 28/10/2024 13:13, Hans Verkuil wrote:
> >> On 28/10/2024 12:05, Tomi Valkeinen wrote:
> >>> On 28/10/2024 12:11, Hans Verkuil wrote:
> >>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
> >>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
> >>>>>> Hi Tomi,
> >>>>>>
> >>>>>> I know this driver is already merged, but while checking for drivers that use
> >>>>>> q->max_num_buffers I stumbled on this cfe code:
> >>>>>>
> >>>>>> <snip>
> >>>>>>
> >>>>>>> +/*
> >>>>>>> + * vb2 ops
> >>>>>>> + */
> >>>>>>> +
> >>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> >>>>>>> + unsigned int *nplanes, unsigned int sizes[],
> >>>>>>> + struct device *alloc_devs[])
> >>>>>>> +{
> >>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
> >>>>>>> + struct cfe_device *cfe = node->cfe;
> >>>>>>> + unsigned int size = is_image_node(node) ?
> >>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
> >>>>>>> + node->meta_fmt.fmt.meta.buffersize;
> >>>>>>> +
> >>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> >>>>>>> + node->buffer_queue.type);
> >>>>>>> +
> >>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
> >>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
> >>>>>>
> >>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> >>>>>> is called. So 32 + *nbuffers is never < 3.
> >>>>>>
> >>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> >>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> >>>>>> for you.
> >>>>>>
> >>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> >>>>>> since the code is almost always wrong.
> >>>>>
> >>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
> >>>>>
> >>>>> if (vq->num_buffers + *nbuffers < 3)
> >>>>> *nbuffers = 3 - vq->num_buffers;
> >>>>>
> >>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
> >>>>>
> >>>>> The distinction between min_queued_buffers and
> >>>>> min_reqbufs_allocation, or rather the need for the latter, still
> >>>>> escapes me. If the HW/SW requires N buffers to be queued, why
> >>>>> would we require allocating more than N buffers?
> >>>>
> >>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
> >>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
> >>>> Typically it is 0, 1 or 2.
That's partly true only. Even if the hardware requires 2 buffers, a
driver can allocate scratch buffers to lower the requirement for
userspace. Setting min_queued_buffers to 1 is usually fine, as there are
few use cases for userspace to start the hardware before a buffer is
available to capture a frame to. A value of 2 is much more problematic,
as it prevents operating with a single buffer. I know using a single
buffer results in frame drops, but there are resource-constrained
systems where application don't always need all the frames (such as the
Raspberry Pi Zero for instance). I very strongly encourage drivers to
never set a min_queued_buffers value higher than 1.
> >>>>
> >>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
> >>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
> >>>> or dropping frames.
> >>>>
> >>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
> >>>> and the third is being processed by userspace. But sometimes drivers have other
> >>>> requirements.
This is exactly why I dislike min_reqbufs_allocation when set based on
this logic, it encodes assumption on userspace use cases that a capture
driver really shouldn't make.
> >>>>
> >>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
> >>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
> >>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
> >>>>
> >>>> "It can be smaller than the number requested, even zero, when the driver runs out of
> >>>> free memory. A larger number is also possible when the driver requires more buffers
> >>>> to function correctly."
> >>>>
> >>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
> >>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
> >>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
> >>>
> >>> Thanks, this was educational!
> >>>
> >>> So. If I have a driver that has min_queued_buffers = 1, I can use
> >>> VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
> >>> just one buffer, right? Whereas VIDIOC_REQBUFS would give me
> >>> (probably) three (or two, if the driver does not set
> >>> min_reqbufs_allocation). Three buffers makes sense for full
> >>> streaming, of course.
> >>>
> >>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
> >>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
I agree it's better to handle it in the core than in drivers, even if I
dislike the feature in the first place.
> >>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
> >>>> with this ioctl the number of buffers will never be more than requested, although it
> >>>> may be less if you run out of memory.
On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
unconditionally.
> >>>>
> >>>> I really should go through all existing drivers and fix them up if they try to
> >>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
> >>>>
> >>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
> >>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
> >>>
> >>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
> >>
> >> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
> >> then it can never return a buffer to userspace! So you need one more. That's the absolute
> >> minimum. For smooth capture you need two more to allow time for userspace to process the
> >> buffer.
> >
> > Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
It's not that uncommon, see above.
> >
> > Can I queue one buffer, start streaming, stop streaming, and get the
> > filled buffer? But then I guess I don't when the buffer has been
> > filled, i.e. when to call stop streaming.
>
> Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
> suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
> older hardware without an IOMMU.
Drivers can always allocate a full-frame scratch buffer in the worst
case. That can waste memory though, which is less than ideal.
> > So, never mind, I don't actually have any use case for this, just wondering.
> >
> >>>
> >>>> to use CREATE_BUFS you need that information so you know that you have to create
> >>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
> >>>> but it is effectively codec specific. This probably should be clarified.
> >>>>
> >>>> I wonder if it wouldn't be better to add a min_num_buffers field to
> >>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
Don't add the +1. We should give userspace the information it needs to
make informed decisions, not make decisions on its behalf.
> >>>
> >>> I think this makes sense (although I still don't get the +1).
> >>>
> >>> However, based on the experiences from adding the streams features
> >>> to various ioctls, let's be very careful =). The new
> >>> 'min_num_buffers' can be filled with garbage by the userspace. If
> >>> we define the 'min_num_buffers' field to be always filled by the
> >>> kernel, and any value provided from the userspace to be ignored, I
> >>> think it should work.
> >>
> >> I've posted an RFC for this.
> >
> > Thanks, I'll check it out.
> >
> > For the original issue in this thread, I think the correct fix is to
> > remove the lines from cfe_queue_setup(), and add
> > "q->min_reqbufs_allocation = 3".
Or just don't set min_reqbufs_allocation ? This is a new driver, and it
requires a device-specific userspace to operate the ISP. I don't think
we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
expecting to get more buffers.
> >
> > I'll send a patch for that.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 15:17 ` Laurent Pinchart
@ 2024-10-28 15:32 ` Tomi Valkeinen
2024-10-28 16:32 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Tomi Valkeinen @ 2024-10-28 15:32 UTC (permalink / raw)
To: Laurent Pinchart, Hans Verkuil
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Sakari Ailus, Jacopo Mondi,
Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
Hi Laurent,
On 28/10/2024 17:17, Laurent Pinchart wrote:
> On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
>> On 28/10/2024 12:25, Tomi Valkeinen wrote:
>>> On 28/10/2024 13:13, Hans Verkuil wrote:
>>>> On 28/10/2024 12:05, Tomi Valkeinen wrote:
>>>>> On 28/10/2024 12:11, Hans Verkuil wrote:
>>>>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>>>>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>>>>>> Hi Tomi,
>>>>>>>>
>>>>>>>> I know this driver is already merged, but while checking for drivers that use
>>>>>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * vb2 ops
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>>>>>> + struct device *alloc_devs[])
>>>>>>>>> +{
>>>>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>>>>>> + struct cfe_device *cfe = node->cfe;
>>>>>>>>> + unsigned int size = is_image_node(node) ?
>>>>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>>>>>> +
>>>>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>>>>>> + node->buffer_queue.type);
>>>>>>>>> +
>>>>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>>>>>
>>>>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>>>>>> is called. So 32 + *nbuffers is never < 3.
>>>>>>>>
>>>>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>>>>>> for you.
>>>>>>>>
>>>>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>>>>>> since the code is almost always wrong.
>>>>>>>
>>>>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>>>>>>
>>>>>>> if (vq->num_buffers + *nbuffers < 3)
>>>>>>> *nbuffers = 3 - vq->num_buffers;
>>>>>>>
>>>>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>>>>>>
>>>>>>> The distinction between min_queued_buffers and
>>>>>>> min_reqbufs_allocation, or rather the need for the latter, still
>>>>>>> escapes me. If the HW/SW requires N buffers to be queued, why
>>>>>>> would we require allocating more than N buffers?
>>>>>>
>>>>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
>>>>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
>>>>>> Typically it is 0, 1 or 2.
>
> That's partly true only. Even if the hardware requires 2 buffers, a
> driver can allocate scratch buffers to lower the requirement for
> userspace. Setting min_queued_buffers to 1 is usually fine, as there are
> few use cases for userspace to start the hardware before a buffer is
> available to capture a frame to. A value of 2 is much more problematic,
> as it prevents operating with a single buffer. I know using a single
> buffer results in frame drops, but there are resource-constrained
> systems where application don't always need all the frames (such as the
> Raspberry Pi Zero for instance). I very strongly encourage drivers to
> never set a min_queued_buffers value higher than 1.
>
>>>>>>
>>>>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
>>>>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
>>>>>> or dropping frames.
>>>>>>
>>>>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
>>>>>> and the third is being processed by userspace. But sometimes drivers have other
>>>>>> requirements.
>
> This is exactly why I dislike min_reqbufs_allocation when set based on
> this logic, it encodes assumption on userspace use cases that a capture
> driver really shouldn't make.
>
>>>>>>
>>>>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
>>>>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
>>>>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>>>>>>
>>>>>> "It can be smaller than the number requested, even zero, when the driver runs out of
>>>>>> free memory. A larger number is also possible when the driver requires more buffers
>>>>>> to function correctly."
>>>>>>
>>>>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
>>>>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
>>>>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
>>>>>
>>>>> Thanks, this was educational!
>>>>>
>>>>> So. If I have a driver that has min_queued_buffers = 1, I can use
>>>>> VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
>>>>> just one buffer, right? Whereas VIDIOC_REQBUFS would give me
>>>>> (probably) three (or two, if the driver does not set
>>>>> min_reqbufs_allocation). Three buffers makes sense for full
>>>>> streaming, of course.
>>>>>
>>>>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
>>>>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
>
> I agree it's better to handle it in the core than in drivers, even if I
> dislike the feature in the first place.
>
>>>>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
>>>>>> with this ioctl the number of buffers will never be more than requested, although it
>>>>>> may be less if you run out of memory.
>
> On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
> unconditionally.
>
>>>>>>
>>>>>> I really should go through all existing drivers and fix them up if they try to
>>>>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>>>>>>
>>>>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
>>>>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
>>>>>
>>>>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
>>>>
>>>> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
>>>> then it can never return a buffer to userspace! So you need one more. That's the absolute
>>>> minimum. For smooth capture you need two more to allow time for userspace to process the
>>>> buffer.
>>>
>>> Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
>
> It's not that uncommon, see above.
>
>>>
>>> Can I queue one buffer, start streaming, stop streaming, and get the
>>> filled buffer? But then I guess I don't when the buffer has been
>>> filled, i.e. when to call stop streaming.
>>
>> Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
>> suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
>> older hardware without an IOMMU.
>
> Drivers can always allocate a full-frame scratch buffer in the worst
> case. That can waste memory though, which is less than ideal.
>
>>> So, never mind, I don't actually have any use case for this, just wondering.
>>>
>>>>>
>>>>>> to use CREATE_BUFS you need that information so you know that you have to create
>>>>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
>>>>>> but it is effectively codec specific. This probably should be clarified.
>>>>>>
>>>>>> I wonder if it wouldn't be better to add a min_num_buffers field to
>>>>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
>
> Don't add the +1. We should give userspace the information it needs to
> make informed decisions, not make decisions on its behalf.
>
>>>>>
>>>>> I think this makes sense (although I still don't get the +1).
>>>>>
>>>>> However, based on the experiences from adding the streams features
>>>>> to various ioctls, let's be very careful =). The new
>>>>> 'min_num_buffers' can be filled with garbage by the userspace. If
>>>>> we define the 'min_num_buffers' field to be always filled by the
>>>>> kernel, and any value provided from the userspace to be ignored, I
>>>>> think it should work.
>>>>
>>>> I've posted an RFC for this.
>>>
>>> Thanks, I'll check it out.
>>>
>>> For the original issue in this thread, I think the correct fix is to
>>> remove the lines from cfe_queue_setup(), and add
>>> "q->min_reqbufs_allocation = 3".
>
> Or just don't set min_reqbufs_allocation ? This is a new driver, and it
> requires a device-specific userspace to operate the ISP. I don't think
> we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
> expecting to get more buffers.
It doesn't require a device-specific userspace for plain CSI-2 capture.
If I understood right, the expected behavior for VIDIOC_REQBUFS is to
return enough buffers for "smooth streaming". So even if device-specific
userspace would be required, doesn't it still make sense to have
min_reqbufs_allocation = 3?
Or is your point that even a device-specific userspace, which knows
exactly what it's doing, would use VIDIOC_REQBUFS, instead of
VIDIOC_CREATE_BUFS?
Also, if I don't set min_reqbufs_allocation, VIDIOC_REQBUFS(1) would
still allocate two buffers, not one.
Tomi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 15:32 ` Tomi Valkeinen
@ 2024-10-28 16:32 ` Laurent Pinchart
2024-10-29 8:23 ` Hans Verkuil
0 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2024-10-28 16:32 UTC (permalink / raw)
To: Tomi Valkeinen
Cc: Hans Verkuil, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On Mon, Oct 28, 2024 at 05:32:27PM +0200, Tomi Valkeinen wrote:
> On 28/10/2024 17:17, Laurent Pinchart wrote:
> > On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
> >> On 28/10/2024 12:25, Tomi Valkeinen wrote:
> >>> On 28/10/2024 13:13, Hans Verkuil wrote:
> >>>> On 28/10/2024 12:05, Tomi Valkeinen wrote:
> >>>>> On 28/10/2024 12:11, Hans Verkuil wrote:
> >>>>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
> >>>>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
> >>>>>>>> Hi Tomi,
> >>>>>>>>
> >>>>>>>> I know this driver is already merged, but while checking for drivers that use
> >>>>>>>> q->max_num_buffers I stumbled on this cfe code:
> >>>>>>>>
> >>>>>>>> <snip>
> >>>>>>>>
> >>>>>>>>> +/*
> >>>>>>>>> + * vb2 ops
> >>>>>>>>> + */
> >>>>>>>>> +
> >>>>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> >>>>>>>>> + unsigned int *nplanes, unsigned int sizes[],
> >>>>>>>>> + struct device *alloc_devs[])
> >>>>>>>>> +{
> >>>>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
> >>>>>>>>> + struct cfe_device *cfe = node->cfe;
> >>>>>>>>> + unsigned int size = is_image_node(node) ?
> >>>>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
> >>>>>>>>> + node->meta_fmt.fmt.meta.buffersize;
> >>>>>>>>> +
> >>>>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> >>>>>>>>> + node->buffer_queue.type);
> >>>>>>>>> +
> >>>>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
> >>>>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
> >>>>>>>>
> >>>>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> >>>>>>>> is called. So 32 + *nbuffers is never < 3.
> >>>>>>>>
> >>>>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> >>>>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> >>>>>>>> for you.
> >>>>>>>>
> >>>>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> >>>>>>>> since the code is almost always wrong.
> >>>>>>>
> >>>>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
> >>>>>>>
> >>>>>>> if (vq->num_buffers + *nbuffers < 3)
> >>>>>>> *nbuffers = 3 - vq->num_buffers;
> >>>>>>>
> >>>>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
> >>>>>>>
> >>>>>>> The distinction between min_queued_buffers and
> >>>>>>> min_reqbufs_allocation, or rather the need for the latter, still
> >>>>>>> escapes me. If the HW/SW requires N buffers to be queued, why
> >>>>>>> would we require allocating more than N buffers?
> >>>>>>
> >>>>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
> >>>>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
> >>>>>> Typically it is 0, 1 or 2.
> >
> > That's partly true only. Even if the hardware requires 2 buffers, a
> > driver can allocate scratch buffers to lower the requirement for
> > userspace. Setting min_queued_buffers to 1 is usually fine, as there are
> > few use cases for userspace to start the hardware before a buffer is
> > available to capture a frame to. A value of 2 is much more problematic,
> > as it prevents operating with a single buffer. I know using a single
> > buffer results in frame drops, but there are resource-constrained
> > systems where application don't always need all the frames (such as the
> > Raspberry Pi Zero for instance). I very strongly encourage drivers to
> > never set a min_queued_buffers value higher than 1.
> >
> >>>>>>
> >>>>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
> >>>>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
> >>>>>> or dropping frames.
> >>>>>>
> >>>>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
> >>>>>> and the third is being processed by userspace. But sometimes drivers have other
> >>>>>> requirements.
> >
> > This is exactly why I dislike min_reqbufs_allocation when set based on
> > this logic, it encodes assumption on userspace use cases that a capture
> > driver really shouldn't make.
> >
> >>>>>>
> >>>>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
> >>>>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
> >>>>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
> >>>>>>
> >>>>>> "It can be smaller than the number requested, even zero, when the driver runs out of
> >>>>>> free memory. A larger number is also possible when the driver requires more buffers
> >>>>>> to function correctly."
> >>>>>>
> >>>>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
> >>>>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
> >>>>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
> >>>>>
> >>>>> Thanks, this was educational!
> >>>>>
> >>>>> So. If I have a driver that has min_queued_buffers = 1, I can use
> >>>>> VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
> >>>>> just one buffer, right? Whereas VIDIOC_REQBUFS would give me
> >>>>> (probably) three (or two, if the driver does not set
> >>>>> min_reqbufs_allocation). Three buffers makes sense for full
> >>>>> streaming, of course.
> >>>>>
> >>>>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
> >>>>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
> >
> > I agree it's better to handle it in the core than in drivers, even if I
> > dislike the feature in the first place.
> >
> >>>>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
> >>>>>> with this ioctl the number of buffers will never be more than requested, although it
> >>>>>> may be less if you run out of memory.
> >
> > On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
> > unconditionally.
> >
> >>>>>>
> >>>>>> I really should go through all existing drivers and fix them up if they try to
> >>>>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
> >>>>>>
> >>>>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
> >>>>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
> >>>>>
> >>>>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
> >>>>
> >>>> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
> >>>> then it can never return a buffer to userspace! So you need one more. That's the absolute
> >>>> minimum. For smooth capture you need two more to allow time for userspace to process the
> >>>> buffer.
> >>>
> >>> Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
> >
> > It's not that uncommon, see above.
> >
> >>>
> >>> Can I queue one buffer, start streaming, stop streaming, and get the
> >>> filled buffer? But then I guess I don't when the buffer has been
> >>> filled, i.e. when to call stop streaming.
> >>
> >> Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
> >> suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
> >> older hardware without an IOMMU.
> >
> > Drivers can always allocate a full-frame scratch buffer in the worst
> > case. That can waste memory though, which is less than ideal.
> >
> >>> So, never mind, I don't actually have any use case for this, just wondering.
> >>>
> >>>>>
> >>>>>> to use CREATE_BUFS you need that information so you know that you have to create
> >>>>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
> >>>>>> but it is effectively codec specific. This probably should be clarified.
> >>>>>>
> >>>>>> I wonder if it wouldn't be better to add a min_num_buffers field to
> >>>>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
> >
> > Don't add the +1. We should give userspace the information it needs to
> > make informed decisions, not make decisions on its behalf.
> >
> >>>>>
> >>>>> I think this makes sense (although I still don't get the +1).
> >>>>>
> >>>>> However, based on the experiences from adding the streams features
> >>>>> to various ioctls, let's be very careful =). The new
> >>>>> 'min_num_buffers' can be filled with garbage by the userspace. If
> >>>>> we define the 'min_num_buffers' field to be always filled by the
> >>>>> kernel, and any value provided from the userspace to be ignored, I
> >>>>> think it should work.
> >>>>
> >>>> I've posted an RFC for this.
> >>>
> >>> Thanks, I'll check it out.
> >>>
> >>> For the original issue in this thread, I think the correct fix is to
> >>> remove the lines from cfe_queue_setup(), and add
> >>> "q->min_reqbufs_allocation = 3".
> >
> > Or just don't set min_reqbufs_allocation ? This is a new driver, and it
> > requires a device-specific userspace to operate the ISP. I don't think
> > we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
> > expecting to get more buffers.
>
> It doesn't require a device-specific userspace for plain CSI-2 capture.
>
> If I understood right, the expected behavior for VIDIOC_REQBUFS is to
> return enough buffers for "smooth streaming". So even if device-specific
> userspace would be required, doesn't it still make sense to have
> min_reqbufs_allocation = 3?
"Smooth streaming" is use case-dependent, you will need different number
of buffers for different use cases. That's why I don't like hardcoding
this in a video capture driver. I'd rather expose information about the
driver behaviour (in particular, how many buffers it will hold on
without returning anything to userspace until a new buffer gets queued)
and let applications make a decision. I don't expect applications
relying on VIDIOC_REQBUFS(1) to work out-of-the-box on Pi 5 anyway, as
the media graph needs to be configured.
> Or is your point that even a device-specific userspace, which knows
> exactly what it's doing, would use VIDIOC_REQBUFS, instead of
> VIDIOC_CREATE_BUFS?
I expect a device-specific userspace not to require drivers to make
policy decisions on its behalf.
> Also, if I don't set min_reqbufs_allocation, VIDIOC_REQBUFS(1) would
> still allocate two buffers, not one.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-28 16:32 ` Laurent Pinchart
@ 2024-10-29 8:23 ` Hans Verkuil
2024-10-29 9:53 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Hans Verkuil @ 2024-10-29 8:23 UTC (permalink / raw)
To: Laurent Pinchart, Tomi Valkeinen
Cc: linux-media, linux-kernel, devicetree, linux-rpi-kernel,
linux-arm-kernel, Naushir Patuck, Sakari Ailus, Jacopo Mondi,
Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On 28/10/2024 17:32, Laurent Pinchart wrote:
> On Mon, Oct 28, 2024 at 05:32:27PM +0200, Tomi Valkeinen wrote:
>> On 28/10/2024 17:17, Laurent Pinchart wrote:
>>> On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
>>>> On 28/10/2024 12:25, Tomi Valkeinen wrote:
>>>>> On 28/10/2024 13:13, Hans Verkuil wrote:
>>>>>> On 28/10/2024 12:05, Tomi Valkeinen wrote:
>>>>>>> On 28/10/2024 12:11, Hans Verkuil wrote:
>>>>>>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
>>>>>>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
>>>>>>>>>> Hi Tomi,
>>>>>>>>>>
>>>>>>>>>> I know this driver is already merged, but while checking for drivers that use
>>>>>>>>>> q->max_num_buffers I stumbled on this cfe code:
>>>>>>>>>>
>>>>>>>>>> <snip>
>>>>>>>>>>
>>>>>>>>>>> +/*
>>>>>>>>>>> + * vb2 ops
>>>>>>>>>>> + */
>>>>>>>>>>> +
>>>>>>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
>>>>>>>>>>> + unsigned int *nplanes, unsigned int sizes[],
>>>>>>>>>>> + struct device *alloc_devs[])
>>>>>>>>>>> +{
>>>>>>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
>>>>>>>>>>> + struct cfe_device *cfe = node->cfe;
>>>>>>>>>>> + unsigned int size = is_image_node(node) ?
>>>>>>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
>>>>>>>>>>> + node->meta_fmt.fmt.meta.buffersize;
>>>>>>>>>>> +
>>>>>>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
>>>>>>>>>>> + node->buffer_queue.type);
>>>>>>>>>>> +
>>>>>>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
>>>>>>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
>>>>>>>>>>
>>>>>>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
>>>>>>>>>> is called. So 32 + *nbuffers is never < 3.
>>>>>>>>>>
>>>>>>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
>>>>>>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
>>>>>>>>>> for you.
>>>>>>>>>>
>>>>>>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
>>>>>>>>>> since the code is almost always wrong.
>>>>>>>>>
>>>>>>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
>>>>>>>>>
>>>>>>>>> if (vq->num_buffers + *nbuffers < 3)
>>>>>>>>> *nbuffers = 3 - vq->num_buffers;
>>>>>>>>>
>>>>>>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
>>>>>>>>>
>>>>>>>>> The distinction between min_queued_buffers and
>>>>>>>>> min_reqbufs_allocation, or rather the need for the latter, still
>>>>>>>>> escapes me. If the HW/SW requires N buffers to be queued, why
>>>>>>>>> would we require allocating more than N buffers?
>>>>>>>>
>>>>>>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
>>>>>>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
>>>>>>>> Typically it is 0, 1 or 2.
>>>
>>> That's partly true only. Even if the hardware requires 2 buffers, a
>>> driver can allocate scratch buffers to lower the requirement for
>>> userspace. Setting min_queued_buffers to 1 is usually fine, as there are
>>> few use cases for userspace to start the hardware before a buffer is
>>> available to capture a frame to. A value of 2 is much more problematic,
>>> as it prevents operating with a single buffer. I know using a single
>>> buffer results in frame drops, but there are resource-constrained
>>> systems where application don't always need all the frames (such as the
>>> Raspberry Pi Zero for instance). I very strongly encourage drivers to
>>> never set a min_queued_buffers value higher than 1.
>>>
>>>>>>>>
>>>>>>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
>>>>>>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
>>>>>>>> or dropping frames.
>>>>>>>>
>>>>>>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
>>>>>>>> and the third is being processed by userspace. But sometimes drivers have other
>>>>>>>> requirements.
>>>
>>> This is exactly why I dislike min_reqbufs_allocation when set based on
>>> this logic, it encodes assumption on userspace use cases that a capture
>>> driver really shouldn't make.
>>>
>>>>>>>>
>>>>>>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
>>>>>>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
>>>>>>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
>>>>>>>>
>>>>>>>> "It can be smaller than the number requested, even zero, when the driver runs out of
>>>>>>>> free memory. A larger number is also possible when the driver requires more buffers
>>>>>>>> to function correctly."
>>>>>>>>
>>>>>>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
>>>>>>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
>>>>>>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
>>>>>>>
>>>>>>> Thanks, this was educational!
>>>>>>>
>>>>>>> So. If I have a driver that has min_queued_buffers = 1, I can use
>>>>>>> VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
>>>>>>> just one buffer, right? Whereas VIDIOC_REQBUFS would give me
>>>>>>> (probably) three (or two, if the driver does not set
>>>>>>> min_reqbufs_allocation). Three buffers makes sense for full
>>>>>>> streaming, of course.
>>>>>>>
>>>>>>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
>>>>>>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
>>>
>>> I agree it's better to handle it in the core than in drivers, even if I
>>> dislike the feature in the first place.
>>>
>>>>>>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
>>>>>>>> with this ioctl the number of buffers will never be more than requested, although it
>>>>>>>> may be less if you run out of memory.
>>>
>>> On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
>>> unconditionally.
>>>
>>>>>>>>
>>>>>>>> I really should go through all existing drivers and fix them up if they try to
>>>>>>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
>>>>>>>>
>>>>>>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
>>>>>>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
>>>>>>>
>>>>>>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
>>>>>>
>>>>>> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
>>>>>> then it can never return a buffer to userspace! So you need one more. That's the absolute
>>>>>> minimum. For smooth capture you need two more to allow time for userspace to process the
>>>>>> buffer.
>>>>>
>>>>> Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
>>>
>>> It's not that uncommon, see above.
>>>
>>>>>
>>>>> Can I queue one buffer, start streaming, stop streaming, and get the
>>>>> filled buffer? But then I guess I don't when the buffer has been
>>>>> filled, i.e. when to call stop streaming.
>>>>
>>>> Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
>>>> suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
>>>> older hardware without an IOMMU.
>>>
>>> Drivers can always allocate a full-frame scratch buffer in the worst
>>> case. That can waste memory though, which is less than ideal.
>>>
>>>>> So, never mind, I don't actually have any use case for this, just wondering.
>>>>>
>>>>>>>
>>>>>>>> to use CREATE_BUFS you need that information so you know that you have to create
>>>>>>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
>>>>>>>> but it is effectively codec specific. This probably should be clarified.
>>>>>>>>
>>>>>>>> I wonder if it wouldn't be better to add a min_num_buffers field to
>>>>>>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
>>>
>>> Don't add the +1. We should give userspace the information it needs to
>>> make informed decisions, not make decisions on its behalf.
>>>
>>>>>>>
>>>>>>> I think this makes sense (although I still don't get the +1).
>>>>>>>
>>>>>>> However, based on the experiences from adding the streams features
>>>>>>> to various ioctls, let's be very careful =). The new
>>>>>>> 'min_num_buffers' can be filled with garbage by the userspace. If
>>>>>>> we define the 'min_num_buffers' field to be always filled by the
>>>>>>> kernel, and any value provided from the userspace to be ignored, I
>>>>>>> think it should work.
>>>>>>
>>>>>> I've posted an RFC for this.
>>>>>
>>>>> Thanks, I'll check it out.
>>>>>
>>>>> For the original issue in this thread, I think the correct fix is to
>>>>> remove the lines from cfe_queue_setup(), and add
>>>>> "q->min_reqbufs_allocation = 3".
>>>
>>> Or just don't set min_reqbufs_allocation ? This is a new driver, and it
>>> requires a device-specific userspace to operate the ISP. I don't think
>>> we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
>>> expecting to get more buffers.
>>
>> It doesn't require a device-specific userspace for plain CSI-2 capture.
>>
>> If I understood right, the expected behavior for VIDIOC_REQBUFS is to
>> return enough buffers for "smooth streaming". So even if device-specific
>> userspace would be required, doesn't it still make sense to have
>> min_reqbufs_allocation = 3?
>
> "Smooth streaming" is use case-dependent, you will need different number
> of buffers for different use cases. That's why I don't like hardcoding
> this in a video capture driver. I'd rather expose information about the
> driver behaviour (in particular, how many buffers it will hold on
> without returning anything to userspace until a new buffer gets queued)
> and let applications make a decision. I don't expect applications
> relying on VIDIOC_REQBUFS(1) to work out-of-the-box on Pi 5 anyway, as
> the media graph needs to be configured.
>
>> Or is your point that even a device-specific userspace, which knows
>> exactly what it's doing, would use VIDIOC_REQBUFS, instead of
>> VIDIOC_CREATE_BUFS?
>
> I expect a device-specific userspace not to require drivers to make
> policy decisions on its behalf.
Remember that libcamera is a specialized library that indeed wants to
make policy decisions itself. But many other drivers for much simpler
pipelines (typically for video receivers) don't need this and can use
the standard V4L2 API.
My goal is to have the standard V4L2 API behave in a well-defined manner,
while giving enough information to specialized userspace code like libcamera
to do their own thing.
Regards,
Hans
>
>> Also, if I don't set min_reqbufs_allocation, VIDIOC_REQBUFS(1) would
>> still allocate two buffers, not one.
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE
2024-10-29 8:23 ` Hans Verkuil
@ 2024-10-29 9:53 ` Laurent Pinchart
0 siblings, 0 replies; 21+ messages in thread
From: Laurent Pinchart @ 2024-10-29 9:53 UTC (permalink / raw)
To: Hans Verkuil
Cc: Tomi Valkeinen, linux-media, linux-kernel, devicetree,
linux-rpi-kernel, linux-arm-kernel, Naushir Patuck, Sakari Ailus,
Jacopo Mondi, Kieran Bingham, Mauro Carvalho Chehab,
Raspberry Pi Kernel Maintenance, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, Rob Herring,
Krzysztof Kozlowski
On Tue, Oct 29, 2024 at 09:23:35AM +0100, Hans Verkuil wrote:
> On 28/10/2024 17:32, Laurent Pinchart wrote:
> > On Mon, Oct 28, 2024 at 05:32:27PM +0200, Tomi Valkeinen wrote:
> >> On 28/10/2024 17:17, Laurent Pinchart wrote:
> >>> On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
> >>>> On 28/10/2024 12:25, Tomi Valkeinen wrote:
> >>>>> On 28/10/2024 13:13, Hans Verkuil wrote:
> >>>>>> On 28/10/2024 12:05, Tomi Valkeinen wrote:
> >>>>>>> On 28/10/2024 12:11, Hans Verkuil wrote:
> >>>>>>>> On 28/10/2024 10:21, Tomi Valkeinen wrote:
> >>>>>>>>> On 24/10/2024 11:20, Hans Verkuil wrote:
> >>>>>>>>>> Hi Tomi,
> >>>>>>>>>>
> >>>>>>>>>> I know this driver is already merged, but while checking for drivers that use
> >>>>>>>>>> q->max_num_buffers I stumbled on this cfe code:
> >>>>>>>>>>
> >>>>>>>>>> <snip>
> >>>>>>>>>>
> >>>>>>>>>>> +/*
> >>>>>>>>>>> + * vb2 ops
> >>>>>>>>>>> + */
> >>>>>>>>>>> +
> >>>>>>>>>>> +static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> >>>>>>>>>>> + unsigned int *nplanes, unsigned int sizes[],
> >>>>>>>>>>> + struct device *alloc_devs[])
> >>>>>>>>>>> +{
> >>>>>>>>>>> + struct cfe_node *node = vb2_get_drv_priv(vq);
> >>>>>>>>>>> + struct cfe_device *cfe = node->cfe;
> >>>>>>>>>>> + unsigned int size = is_image_node(node) ?
> >>>>>>>>>>> + node->vid_fmt.fmt.pix.sizeimage :
> >>>>>>>>>>> + node->meta_fmt.fmt.meta.buffersize;
> >>>>>>>>>>> +
> >>>>>>>>>>> + cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
> >>>>>>>>>>> + node->buffer_queue.type);
> >>>>>>>>>>> +
> >>>>>>>>>>> + if (vq->max_num_buffers + *nbuffers < 3)
> >>>>>>>>>>> + *nbuffers = 3 - vq->max_num_buffers;
> >>>>>>>>>>
> >>>>>>>>>> This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
> >>>>>>>>>> is called. So 32 + *nbuffers is never < 3.
> >>>>>>>>>>
> >>>>>>>>>> If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
> >>>>>>>>>> q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
> >>>>>>>>>> for you.
> >>>>>>>>>>
> >>>>>>>>>> Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
> >>>>>>>>>> since the code is almost always wrong.
> >>>>>>>>>
> >>>>>>>>> Looking at this, the original code in the old BSP tree was, which somehow, along the long way, got turned into the above:
> >>>>>>>>>
> >>>>>>>>> if (vq->num_buffers + *nbuffers < 3)
> >>>>>>>>> *nbuffers = 3 - vq->num_buffers;
> >>>>>>>>>
> >>>>>>>>> So... I think that is the same as "q->min_reqbufs_allocation = 3"?
> >>>>>>>>>
> >>>>>>>>> The distinction between min_queued_buffers and
> >>>>>>>>> min_reqbufs_allocation, or rather the need for the latter, still
> >>>>>>>>> escapes me. If the HW/SW requires N buffers to be queued, why
> >>>>>>>>> would we require allocating more than N buffers?
> >>>>>>>>
> >>>>>>>> min_queued_buffers is easiest to explain: that represents the requirements of the DMA
> >>>>>>>> engine, i.e. how many buffers much be queued before the DMA engine can be started.
> >>>>>>>> Typically it is 0, 1 or 2.
> >>>
> >>> That's partly true only. Even if the hardware requires 2 buffers, a
> >>> driver can allocate scratch buffers to lower the requirement for
> >>> userspace. Setting min_queued_buffers to 1 is usually fine, as there are
> >>> few use cases for userspace to start the hardware before a buffer is
> >>> available to capture a frame to. A value of 2 is much more problematic,
> >>> as it prevents operating with a single buffer. I know using a single
> >>> buffer results in frame drops, but there are resource-constrained
> >>> systems where application don't always need all the frames (such as the
> >>> Raspberry Pi Zero for instance). I very strongly encourage drivers to
> >>> never set a min_queued_buffers value higher than 1.
> >>>
> >>>>>>>>
> >>>>>>>> min_reqbufs_allocation is the minimum number of buffers that will be allocated when
> >>>>>>>> calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
> >>>>>>>> or dropping frames.
> >>>>>>>>
> >>>>>>>> Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
> >>>>>>>> and the third is being processed by userspace. But sometimes drivers have other
> >>>>>>>> requirements.
> >>>
> >>> This is exactly why I dislike min_reqbufs_allocation when set based on
> >>> this logic, it encodes assumption on userspace use cases that a capture
> >>> driver really shouldn't make.
> >>>
> >>>>>>>>
> >>>>>>>> The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
> >>>>>>>> expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
> >>>>>>>> https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html
> >>>>>>>>
> >>>>>>>> "It can be smaller than the number requested, even zero, when the driver runs out of
> >>>>>>>> free memory. A larger number is also possible when the driver requires more buffers
> >>>>>>>> to function correctly."
> >>>>>>>>
> >>>>>>>> How drivers implement this is a mess, and usually the code in the driver is wrong as
> >>>>>>>> well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
> >>>>>>>> instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.
> >>>>>>>
> >>>>>>> Thanks, this was educational!
> >>>>>>>
> >>>>>>> So. If I have a driver that has min_queued_buffers = 1, I can use
> >>>>>>> VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
> >>>>>>> just one buffer, right? Whereas VIDIOC_REQBUFS would give me
> >>>>>>> (probably) three (or two, if the driver does not set
> >>>>>>> min_reqbufs_allocation). Three buffers makes sense for full
> >>>>>>> streaming, of course.
> >>>>>>>
> >>>>>>>> When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
> >>>>>>>> to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,
> >>>
> >>> I agree it's better to handle it in the core than in drivers, even if I
> >>> dislike the feature in the first place.
> >>>
> >>>>>>>> if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS,
> >>>>>>>> with this ioctl the number of buffers will never be more than requested, although it
> >>>>>>>> may be less if you run out of memory.
> >>>
> >>> On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
> >>> unconditionally.
> >>>
> >>>>>>>>
> >>>>>>>> I really should go through all existing drivers and fix them up if they try to
> >>>>>>>> handle this in the queue_setup function, I suspect a lot of them are quite messy.
> >>>>>>>>
> >>>>>>>> One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
> >>>>>>>> buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want
> >>>>>>>
> >>>>>>> Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?
> >>>>>>
> >>>>>> The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
> >>>>>> then it can never return a buffer to userspace! So you need one more. That's the absolute
> >>>>>> minimum. For smooth capture you need two more to allow time for userspace to process the
> >>>>>> buffer.
> >>>>>
> >>>>> Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.
> >>>
> >>> It's not that uncommon, see above.
> >>>
> >>>>>
> >>>>> Can I queue one buffer, start streaming, stop streaming, and get the
> >>>>> filled buffer? But then I guess I don't when the buffer has been
> >>>>> filled, i.e. when to call stop streaming.
> >>>>
> >>>> Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
> >>>> suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
> >>>> older hardware without an IOMMU.
> >>>
> >>> Drivers can always allocate a full-frame scratch buffer in the worst
> >>> case. That can waste memory though, which is less than ideal.
> >>>
> >>>>> So, never mind, I don't actually have any use case for this, just wondering.
> >>>>>
> >>>>>>>
> >>>>>>>> to use CREATE_BUFS you need that information so you know that you have to create
> >>>>>>>> at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
> >>>>>>>> but it is effectively codec specific. This probably should be clarified.
> >>>>>>>>
> >>>>>>>> I wonder if it wouldn't be better to add a min_num_buffers field to
> >>>>>>>> struct v4l2_create_buffers and set it to min_queued_buffers + 1.
> >>>
> >>> Don't add the +1. We should give userspace the information it needs to
> >>> make informed decisions, not make decisions on its behalf.
> >>>
> >>>>>>>
> >>>>>>> I think this makes sense (although I still don't get the +1).
> >>>>>>>
> >>>>>>> However, based on the experiences from adding the streams features
> >>>>>>> to various ioctls, let's be very careful =). The new
> >>>>>>> 'min_num_buffers' can be filled with garbage by the userspace. If
> >>>>>>> we define the 'min_num_buffers' field to be always filled by the
> >>>>>>> kernel, and any value provided from the userspace to be ignored, I
> >>>>>>> think it should work.
> >>>>>>
> >>>>>> I've posted an RFC for this.
> >>>>>
> >>>>> Thanks, I'll check it out.
> >>>>>
> >>>>> For the original issue in this thread, I think the correct fix is to
> >>>>> remove the lines from cfe_queue_setup(), and add
> >>>>> "q->min_reqbufs_allocation = 3".
> >>>
> >>> Or just don't set min_reqbufs_allocation ? This is a new driver, and it
> >>> requires a device-specific userspace to operate the ISP. I don't think
> >>> we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
> >>> expecting to get more buffers.
> >>
> >> It doesn't require a device-specific userspace for plain CSI-2 capture.
> >>
> >> If I understood right, the expected behavior for VIDIOC_REQBUFS is to
> >> return enough buffers for "smooth streaming". So even if device-specific
> >> userspace would be required, doesn't it still make sense to have
> >> min_reqbufs_allocation = 3?
> >
> > "Smooth streaming" is use case-dependent, you will need different number
> > of buffers for different use cases. That's why I don't like hardcoding
> > this in a video capture driver. I'd rather expose information about the
> > driver behaviour (in particular, how many buffers it will hold on
> > without returning anything to userspace until a new buffer gets queued)
> > and let applications make a decision. I don't expect applications
> > relying on VIDIOC_REQBUFS(1) to work out-of-the-box on Pi 5 anyway, as
> > the media graph needs to be configured.
> >
> >> Or is your point that even a device-specific userspace, which knows
> >> exactly what it's doing, would use VIDIOC_REQBUFS, instead of
> >> VIDIOC_CREATE_BUFS?
> >
> > I expect a device-specific userspace not to require drivers to make
> > policy decisions on its behalf.
>
> Remember that libcamera is a specialized library that indeed wants to
> make policy decisions itself. But many other drivers for much simpler
> pipelines (typically for video receivers) don't need this and can use
> the standard V4L2 API.
>
> My goal is to have the standard V4L2 API behave in a well-defined manner,
> while giving enough information to specialized userspace code like libcamera
> to do their own thing.
I think we all agree on well-defined :-) Where we may not agree is that,
regardless of whether an application goes through libcamera, interfaces
with V4L2 through another framework, or directly, I don't think the
kernel should make policy decisions on behalf of userspace. I'm fine
keeping VIDIOC_REQBUFS(1) operating as it was meant to because we have
to ensure backward compatibility, but I don't think it's the kind of API
we should design today.
Let's start by making sure we expose the information userspace needs,
and then we can discuss the next step.
> >> Also, if I don't set min_reqbufs_allocation, VIDIOC_REQBUFS(1) would
> >> still allocate two buffers, not one.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-10-29 9:57 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-03 10:31 [PATCH v6 0/4] media: raspberrypi: Support RPi5's CFE Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 1/4] media: uapi: Add meta formats for PiSP FE config and stats Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 2/4] dt-bindings: media: Add bindings for raspberrypi,rp1-cfe Tomi Valkeinen
2024-10-03 10:31 ` [PATCH v6 4/4] media: admin-guide: Document the Raspberry Pi CFE (rp1-cfe) Tomi Valkeinen
[not found] ` <20241003-rp1-cfe-v6-3-d6762edd98a8@ideasonboard.com>
2024-10-24 8:20 ` [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE Hans Verkuil
2024-10-24 11:08 ` Tomi Valkeinen
2024-10-24 11:21 ` Hans Verkuil
2024-10-24 12:05 ` Laurent Pinchart
2024-10-24 12:13 ` Hans Verkuil
2024-10-24 12:26 ` Laurent Pinchart
2024-10-28 9:21 ` Tomi Valkeinen
2024-10-28 10:11 ` Hans Verkuil
2024-10-28 11:05 ` Tomi Valkeinen
2024-10-28 11:13 ` Hans Verkuil
2024-10-28 11:25 ` Tomi Valkeinen
2024-10-28 11:30 ` Hans Verkuil
2024-10-28 15:17 ` Laurent Pinchart
2024-10-28 15:32 ` Tomi Valkeinen
2024-10-28 16:32 ` Laurent Pinchart
2024-10-29 8:23 ` Hans Verkuil
2024-10-29 9:53 ` Laurent Pinchart
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox