* [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
@ 2026-03-23 12:58 ` Loic Poulain
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
` (4 more replies)
0 siblings, 5 replies; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 12:58 UTC (permalink / raw)
To: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab, Loic Poulain
This RFC series introduces initial support for the Qualcomm CAMSS
Offline Processing Engine (OPE), as found on Agatti-based platforms.
Boards such as Arduino UNO-Q use this SoC family and will benefit
from hardware-assisted image processing enabled by this work.
This represents the first step toward enabling image processing beyond
raw capture on Qualcomm platforms by using hardware blocks for
operations such as debayering, 3A, and scaling.
The OPE sits outside the live capture pipeline. It operates on frames
fetched from system memory and writes processed results back to memory.
Because of this design, the OPE is not tied to any specific capture
interface: frames may come from CAMSS RDI or PIX paths, or from any
other producer capable of providing memory-backed buffers.
The hardware can sustain up to 580 megapixels per second, which is
sufficient to process a 10MPix stream at 60 fps or to handle four
parallel 2MPix (HD) streams at 60 fps.
The initial driver implementation relies on the V4L2 m2m framework
to keep the design simple while already enabling practical offline
processing workflows. This model also provides time-sharing across
multiple contexts through its built-in scheduling.
This first version is intentionally minimalistic. It provides a working
configuration using a fixed set of static processing parameters, mainly
to achieve correct and good-quality debayering.
Support for more advanced use-cases (dynamic parameters, statistics
outputs, additional data endpoints) will require evolving the driver
model beyond a pure m2m design. This may involve either moving away
from m2m, as other ISP drivers do, or extending it to support auxiliary
endpoints for parameters and statistics.
This series includes:
- dt-binding schema for CAMSS OPE
- initial CAMSS OPE driver
- QCM2290 device tree node describing the hardware block.
Feedback on the architecture and expected uAPI direction is especially
welcome.
Loic Poulain (3):
dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
media: qcom: camss: Add CAMSS Offline Processing Engine driver
arm64: dts: qcom: qcm2290: Add CAMSS OPE node
.../bindings/media/qcom,camss-ope.yaml | 87 +
arch/arm64/boot/dts/qcom/agatti.dtsi | 72 +
drivers/media/platform/qcom/camss/Makefile | 4 +
drivers/media/platform/qcom/camss/camss-ope.c | 2058 +++++++++++++++++
4 files changed, 2221 insertions(+)
create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
create mode 100644 drivers/media/platform/qcom/camss/camss-ope.c
--
2.34.1
^ permalink raw reply [flat|nested] 47+ messages in thread
* [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
@ 2026-03-23 12:58 ` Loic Poulain
2026-03-23 13:03 ` Krzysztof Kozlowski
2026-03-23 13:03 ` Bryan O'Donoghue
2026-03-23 12:58 ` [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver Loic Poulain
` (3 subsequent siblings)
4 siblings, 2 replies; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 12:58 UTC (permalink / raw)
To: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab, Loic Poulain
Add Devicetree binding documentation for the Qualcomm Camera Subsystem
Offline Processing Engine (OPE) found on platforms such as Agatti.
The OPE is a memory-to-memory image processing block which operates
on frames read from and written back to system memory.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../bindings/media/qcom,camss-ope.yaml | 86 +++++++++++++++++++
1 file changed, 86 insertions(+)
create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
diff --git a/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
new file mode 100644
index 000000000000..509b4e89a88a
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
@@ -0,0 +1,86 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/media/qcom,camss-ope.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm Camera Subsystem Offline Processing Engine
+
+maintainers:
+ - Loic Poulain <loic.poulain@oss.qualcomm.com>
+
+description:
+ The Qualcomm Camera Subsystem (CAMSS) Offline Processing Engine (OPE)
+ is a memory-to-memory image processing block used. It supports a
+ range of pixel-processing operations such as scaling, cropping, gain
+ adjustments, white-balancing, and various format conversions. The OPE
+ does not interface directly with image sensors, instead, it processes
+ frames sourced from and written back to system memory.
+
+properties:
+ compatible:
+ const: qcom,qcm2290-camss-ope
+
+ reg:
+ maxItems: 5
+
+ reg-names:
+ items:
+ - const: top
+ - const: bus_read
+ - const: bus_write
+ - const: pipeline
+ - const: qos
+
+ clocks:
+ maxItems: 5
+
+ clock-names:
+ items:
+ - const: axi
+ - const: core
+ - const: iface
+ - const: nrt
+ - const: top
+
+ interrupts:
+ maxItems: 1
+
+ interconnects:
+ maxItems: 2
+
+ interconnect-names:
+ items:
+ - const: config
+ - const: data
+
+ iommus:
+ maxItems: 2
+
+ operating-points-v2: true
+
+ opp-table:
+ type: object
+
+ power-domains:
+ maxItems: 2
+
+ power-domain-names:
+ items:
+ - const: camss
+ - const: cx
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - clocks
+ - clock-names
+ - interrupts
+ - interconnects
+ - interconnect-names
+ - iommus
+ - power-domains
+ - power-domain-names
+
+additionalProperties: true
--
2.34.1
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
@ 2026-03-23 12:58 ` Loic Poulain
2026-03-23 13:43 ` Bryan O'Donoghue
2026-03-23 12:58 ` [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node Loic Poulain
` (2 subsequent siblings)
4 siblings, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 12:58 UTC (permalink / raw)
To: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab, Loic Poulain
Provide a initial implementation for the Qualcomm Offline Processing
Engine (OPE). OPE is a memory-to-memory hardware block designed for
image processing on a source frame. Typically, the input frame
originates from the SoC CSI capture path, though not limited to.
The hardware architecture consists of Fetch Engines and Write Engines,
connected through intermediate pipeline modules:
[FETCH ENGINES] => [Pipeline Modules] => [WRITE ENGINES]
Current Configuration:
Fetch Engine: One fetch engine is used for Bayer frame input.
Write Engines: Two display write engines for Y and UV planes output.
Enabled Pipeline Modules:
CLC_WB: White balance (channel gain configuration)
CLC_DEMO: Demosaic (Bayer to RGB conversion)
CLC_CHROMA_ENHAN: RGB to YUV conversion
CLC_DOWNSCALE*: Downscaling for UV and Y planes
Default configuration values are based on public standards such as BT.601.
Processing Model:
OPE processes frames in stripes of up to 336 pixels. Therefore, frames must
be split into stripes for processing. Each stripe is configured after the
previous one has been acquired (double buffered registers). To minimize
inter-stripe latency, stripe configurations are generated ahead of time.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
drivers/media/platform/qcom/camss/Makefile | 4 +
drivers/media/platform/qcom/camss/camss-ope.c | 2058 +++++++++++++++++
2 files changed, 2062 insertions(+)
create mode 100644 drivers/media/platform/qcom/camss/camss-ope.c
diff --git a/drivers/media/platform/qcom/camss/Makefile b/drivers/media/platform/qcom/camss/Makefile
index 5e349b491513..67f261ae0855 100644
--- a/drivers/media/platform/qcom/camss/Makefile
+++ b/drivers/media/platform/qcom/camss/Makefile
@@ -29,3 +29,7 @@ qcom-camss-objs += \
camss-format.o \
obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss.o
+
+qcom-camss-ope-objs += camss-ope.o
+
+obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss-ope.o
diff --git a/drivers/media/platform/qcom/camss/camss-ope.c b/drivers/media/platform/qcom/camss/camss-ope.c
new file mode 100644
index 000000000000..f45a16437b6d
--- /dev/null
+++ b/drivers/media/platform/qcom/camss/camss-ope.c
@@ -0,0 +1,2058 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * camss-ope.c
+ *
+ * Qualcomm MSM Camera Subsystem - Offline Processing Engine
+ *
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+/*
+ * This driver provides a minimal implementation for the Qualcomm Offline
+ * Processing Engine (OPE). OPE is a memory-to-memory hardware block
+ * designed for image processing on a source frame. Typically, the input
+ * frame originates from the SoC CSI capture path, though not limited to.
+ *
+ * The hardware architecture consists of Fetch Engines and Write Engines,
+ * connected through intermediate pipeline modules:
+ * [FETCH ENGINES] => [Pipeline Modules] => [WRITE ENGINES]
+ *
+ * Current Configuration:
+ * Fetch Engine: One fetch engine is used for Bayer frame input.
+ * Write Engines: Two display write engines for Y and UV planes output.
+ *
+ * Only a subset of the pipeline modules are enabled:
+ * CLC_WB: White balance for channel gain configuration
+ * CLC_DEMO: Demosaic for Bayer to RGB conversion
+ * CLC_CHROMA_ENHAN: for RGB to YUV conversion
+ * CLC_DOWNSCALE*: Downscaling for UV (YUV444 -> YUV422/YUV420) and YUV planes
+ *
+ * Default configuration values are based on public standards such as BT.601.
+ *
+ * Processing Model:
+ * OPE processes frames in stripes of up to 336 pixels. Therefore, frames must
+ * be split into stripes for processing. Each stripe is configured after the
+ * previous one has been acquired (double buffered registers). To minimize
+ * inter-stripe latency, the stripe configurations are generated ahead of time.
+ *
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/interconnect.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_clock.h>
+#include <linux/pm_domain.h>
+#include <linux/pm_opp.h>
+#include <linux/pm_runtime.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+
+#include <media/media-device.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-dma-contig.h>
+
+#define MEM2MEM_NAME "qcom-camss-ope"
+
+/* TOP registers */
+#define OPE_TOP_HW_VERSION 0x000
+#define OPE_TOP_HW_VERSION_STEP GENMASK(15, 0)
+#define OPE_TOP_HW_VERSION_REV GENMASK(27, 16)
+#define OPE_TOP_HW_VERSION_GEN GENMASK(31, 28)
+#define OPE_TOP_RESET_CMD 0x004
+#define OPE_TOP_RESET_CMD_HW BIT(0)
+#define OPE_TOP_RESET_CMD_SW BIT(1)
+#define OPE_TOP_CORE_CFG 0x010
+#define OPE_TOP_IRQ_STATUS 0x014
+#define OPE_TOP_IRQ_MASK 0x018
+#define OPE_TOP_IRQ_STATUS_RST_DONE BIT(0)
+#define OPE_TOP_IRQ_STATUS_WE BIT(1)
+#define OPE_TOP_IRQ_STATUS_FE BIT(2)
+#define OPE_TOP_IRQ_STATUS_VIOL BIT(3)
+#define OPE_TOP_IRQ_STATUS_IDLE BIT(4)
+#define OPE_TOP_IRQ_CLEAR 0x01c
+#define OPE_TOP_IRQ_SET 0x020
+#define OPE_TOP_IRQ_CMD 0x024
+#define OPE_TOP_IRQ_CMD_CLEAR BIT(0)
+#define OPE_TOP_IRQ_CMD_SET BIT(4)
+#define OPE_TOP_VIOLATION_STATUS 0x028
+#define OPE_TOP_DEBUG(i) (0x0a0 + (i) * 4)
+#define OPE_TOP_DEBUG_CFG 0x0dc
+
+/* Fetch engines */
+#define OPE_BUS_RD_INPUT_IF_IRQ_MASK 0x00c
+#define OPE_BUS_RD_INPUT_IF_IRQ_CLEAR 0x010
+#define OPE_BUS_RD_INPUT_IF_IRQ_CMD 0x014
+#define OPE_BUS_RD_INPUT_IF_IRQ_CMD_CLEAR BIT(0)
+#define OPE_BUS_RD_INPUT_IF_IRQ_CMD_SET BIT(4)
+#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS 0x018
+#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_RST_DONE BIT(0)
+#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_RUP_DONE BIT(1)
+#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_BUF_DONE BIT(2)
+#define OPE_BUS_RD_INPUT_IF_CMD 0x01c
+#define OPE_BUS_RD_INPUT_IF_CMD_GO_CMD BIT(0)
+#define OPE_BUS_RD_CLIENT_0_CORE_CFG 0x050
+#define OPE_BUS_RD_CLIENT_0_CORE_CFG_EN BIT(0)
+#define OPE_BUS_RD_CLIENT_0_CCIF_META_DATA 0x054
+#define OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN GENMASK(7, 2)
+#define OPE_BUS_RD_CLIENT_0_ADDR_IMAGE 0x058
+#define OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE 0x05c
+#define OPE_BUS_RD_CLIENT_0_RD_STRIDE 0x060
+#define OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0 0x064
+
+/* Write engines */
+#define OPE_BUS_WR_INPUT_IF_IRQ_MASK_0 0x018
+#define OPE_BUS_WR_INPUT_IF_IRQ_MASK_1 0x01c
+#define OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_0 0x020
+#define OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_1 0x024
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0 0x028
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE BIT(0)
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_BUF_DONE BIT(8)
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL BIT(28)
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL BIT(30)
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL BIT(31)
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_1 0x02c
+#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_1_CLIENT_DONE(c) BIT(0 + (c))
+#define OPE_BUS_WR_INPUT_IF_IRQ_CMD 0x030
+#define OPE_BUS_WR_INPUT_IF_IRQ_CMD_CLEAR BIT(0)
+#define OPE_BUS_WR_INPUT_IF_IRQ_CMD_SET BIT(1)
+#define OPE_BUS_WR_VIOLATION_STATUS 0x064
+#define OPE_BUS_WR_IMAGE_SIZE_VIOLATION_STATUS 0x070
+#define OPE_BUS_WR_CLIENT_CFG(c) (0x200 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_CFG_EN BIT(0)
+#define OPE_BUS_WR_CLIENT_CFG_AUTORECOVER BIT(4)
+#define OPE_BUS_WR_CLIENT_ADDR_IMAGE(c) (0x204 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_IMAGE_CFG_0(c) (0x20c + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_IMAGE_CFG_1(c) (0x210 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_IMAGE_CFG_2(c) (0x214 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_PACKER_CFG(c) (0x218 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_ADDR_FRAME_HEADER(c) (0x220 + (c) * 0x100)
+#define OPE_BUS_WR_CLIENT_MAX 8
+
+/* Pipeline modules */
+#define OPE_PP_CLC_WB_GAIN_MODULE_CFG (0x200 + 0x60)
+#define OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN BIT(0)
+#define OPE_PP_CLC_WB_GAIN_WB_CFG(ch) (0x200 + 0x68 + 4 * (ch))
+
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_C_PRE_BASE 0x1c00
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_Y_DISP_BASE 0x3000
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_C_DISP_BASE 0x3200
+#define OPE_PP_CLC_DOWNSCALE_MN_CFG(ds) ((ds) + 0x60)
+#define OPE_PP_CLC_DOWNSCALE_MN_CFG_EN BIT(0)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(ds) ((ds) + 0x64)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN BIT(9)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN BIT(10)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(ds) ((ds) + 0x68)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(ds) ((ds) + 0x6c)
+#define OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(ds) ((ds) + 0x74)
+
+#define OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG (0x1200 + 0x60)
+#define OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG_EN BIT(0)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0 (0x1200 + 0x68)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V0 GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V1 GENMASK(27, 16)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1 (0x1200 + 0x6c)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1_K GENMASK(31, 23)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2 (0x1200 + 0x70)
+#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2_V2 GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG (0x1200 + 0x74)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AP GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AM GENMASK(27, 16)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG (0x1200 + 0x78)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BP GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BM GENMASK(27, 16)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG (0x1200 + 0x7C)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CP GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CM GENMASK(27, 16)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG (0x1200 + 0x80)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DP GENMASK(11, 0)
+#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DM GENMASK(27, 16)
+#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0 (0x1200 + 0x84)
+#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0_KCB GENMASK(31, 21)
+#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1 (0x1200 + 0x88)
+#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1_KCR GENMASK(31, 21)
+
+#define OPE_STRIPE_MAX_W 336
+#define OPE_STRIPE_MAX_H 8192
+#define OPE_STRIPE_MIN_W 16
+#define OPE_STRIPE_MIN_H OPE_STRIPE_MIN_W
+#define OPE_MAX_STRIPE 16
+#define OPE_ALIGN_H 1
+#define OPE_ALIGN_W 1
+#define OPE_MIN_W 24
+#define OPE_MIN_H 16
+#define OPE_MAX_W (OPE_STRIPE_MAX_W * OPE_MAX_STRIPE)
+#define OPE_MAX_H OPE_STRIPE_MAX_H
+
+#define MEM2MEM_CAPTURE BIT(0)
+#define MEM2MEM_OUTPUT BIT(1)
+
+#define OPE_RESET_TIMEOUT_MS 100
+
+/* Expected framerate for power scaling */
+#define DEFAULT_FRAMERATE 60
+
+/* Downscaler helpers */
+#define Q21(v) (((uint64_t)(v)) << 21)
+#define DS_Q21(n, d) ((uint32_t)(((uint64_t)(n) << 21) / (d)))
+#define DS_RESOLUTION(input, output) \
+ (((output) * 128 <= (input)) ? 0x0 : \
+ ((output) * 16 <= (input)) ? 0x1 : \
+ ((output) * 8 <= (input)) ? 0x2 : 0x3)
+#define DS_OUTPUT_PIX(input, phase_init, phase_step) \
+ ((Q21(input) - (phase_init)) / (phase_step))
+
+enum ope_downscaler {
+ OPE_DS_C_PRE,
+ OPE_DS_C_DISP,
+ OPE_DS_Y_DISP,
+ OPE_DS_MAX,
+};
+
+static const u32 ope_ds_base[OPE_DS_MAX] = { OPE_PP_CLC_DOWNSCALE_MN_DS_C_PRE_BASE,
+ OPE_PP_CLC_DOWNSCALE_MN_DS_C_DISP_BASE,
+ OPE_PP_CLC_DOWNSCALE_MN_DS_Y_DISP_BASE };
+
+enum ope_wr_client {
+ OPE_WR_CLIENT_VID_Y,
+ OPE_WR_CLIENT_VID_C,
+ OPE_WR_CLIENT_DISP_Y,
+ OPE_WR_CLIENT_DISP_C,
+ OPE_WR_CLIENT_ARGB,
+ OPE_WR_CLIENT_MAX
+};
+
+enum ope_pixel_pattern {
+ OPE_PIXEL_PATTERN_RGRGRG,
+ OPE_PIXEL_PATTERN_GRGRGR,
+ OPE_PIXEL_PATTERN_BGBGBG,
+ OPE_PIXEL_PATTERN_GBGBGB,
+ OPE_PIXEL_PATTERN_YCBYCR,
+ OPE_PIXEL_PATTERN_YCRYCB,
+ OPE_PIXEL_PATTERN_CBYCRY,
+ OPE_PIXEL_PATTERN_CRYCBY
+};
+
+enum ope_stripe_location {
+ OPE_STRIPE_LOCATION_FULL,
+ OPE_STRIPE_LOCATION_LEFT,
+ OPE_STRIPE_LOCATION_RIGHT,
+ OPE_STRIPE_LOCATION_MIDDLE
+};
+
+enum ope_unpacker_format {
+ OPE_UNPACKER_FMT_PLAIN_128,
+ OPE_UNPACKER_FMT_PLAIN_8,
+ OPE_UNPACKER_FMT_PLAIN_16_10BPP,
+ OPE_UNPACKER_FMT_PLAIN_16_12BPP,
+ OPE_UNPACKER_FMT_PLAIN_16_14BPP,
+ OPE_UNPACKER_FMT_PLAIN_32_20BPP,
+ OPE_UNPACKER_FMT_ARGB_16_10BPP,
+ OPE_UNPACKER_FMT_ARGB_16_12BPP,
+ OPE_UNPACKER_FMT_ARGB_16_14BPP,
+ OPE_UNPACKER_FMT_PLAIN_32,
+ OPE_UNPACKER_FMT_PLAIN_64,
+ OPE_UNPACKER_FMT_TP_10,
+ OPE_UNPACKER_FMT_MIPI_8,
+ OPE_UNPACKER_FMT_MIPI_10,
+ OPE_UNPACKER_FMT_MIPI_12,
+ OPE_UNPACKER_FMT_MIPI_14,
+ OPE_UNPACKER_FMT_PLAIN_16_16BPP,
+ OPE_UNPACKER_FMT_PLAIN_128_ODD_EVEN,
+ OPE_UNPACKER_FMT_PLAIN_8_ODD_EVEN
+};
+
+enum ope_packer_format {
+ OPE_PACKER_FMT_PLAIN_128,
+ OPE_PACKER_FMT_PLAIN_8,
+ OPE_PACKER_FMT_PLAIN_8_ODD_EVEN,
+ OPE_PACKER_FMT_PLAIN_8_10BPP,
+ OPE_PACKER_FMT_PLAIN_8_10BPP_ODD_EVEN,
+ OPE_PACKER_FMT_PLAIN_16_10BPP,
+ OPE_PACKER_FMT_PLAIN_16_12BPP,
+ OPE_PACKER_FMT_PLAIN_16_14BPP,
+ OPE_PACKER_FMT_PLAIN_16_16BPP,
+ OPE_PACKER_FMT_PLAIN_32,
+ OPE_PACKER_FMT_PLAIN_64,
+ OPE_PACKER_FMT_TP_10,
+ OPE_PACKER_FMT_MIPI_10,
+ OPE_PACKER_FMT_MIPI_12
+};
+
+struct ope_fmt {
+ u32 fourcc;
+ unsigned int types;
+ enum ope_pixel_pattern pattern;
+ enum ope_unpacker_format unpacker_format;
+ enum ope_packer_format packer_format;
+ unsigned int depth;
+ unsigned int align; /* pix alignment = 2^align */
+};
+
+static const struct ope_fmt formats[] = { /* TODO: add multi-planes formats */
+ /* Output - Bayer MIPI 10 */
+ { V4L2_PIX_FMT_SBGGR10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_BGBGBG,
+ OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
+ { V4L2_PIX_FMT_SGBRG10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GBGBGB,
+ OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
+ { V4L2_PIX_FMT_SGRBG10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GRGRGR,
+ OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
+ { V4L2_PIX_FMT_SRGGB10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_RGRGRG,
+ OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
+ /* Output - Bayer MIPI/Plain 8 */
+ { V4L2_PIX_FMT_SRGGB8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_RGRGRG,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
+ { V4L2_PIX_FMT_SBGGR8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_BGBGBG,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
+ { V4L2_PIX_FMT_SGBRG8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GBGBGB,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
+ { V4L2_PIX_FMT_SGRBG8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GRGRGR,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
+ /* Capture - YUV 8-bit per component */
+ { V4L2_PIX_FMT_NV24, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_YCBYCR,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 24, 0 },
+ { V4L2_PIX_FMT_NV42, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_YCRYCB,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 24, 0 },
+ { V4L2_PIX_FMT_NV16, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 16, 1 },
+ { V4L2_PIX_FMT_NV61, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 16, 1 },
+ { V4L2_PIX_FMT_NV12, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 12, 1 },
+ { V4L2_PIX_FMT_NV21, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 12, 1 },
+ /* Capture - Greyscale 8-bit */
+ { V4L2_PIX_FMT_GREY, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_RGRGRG,
+ OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
+};
+
+#define OPE_NUM_FORMATS ARRAY_SIZE(formats)
+
+#define OPE_WB(n, d) (((n) << 10) / (d))
+
+/* Per-queue, driver-specific private data */
+struct ope_q_data {
+ unsigned int width;
+ unsigned int height;
+ unsigned int bytesperline;
+ unsigned int sizeimage;
+ const struct ope_fmt *fmt;
+ enum v4l2_ycbcr_encoding ycbcr_enc;
+ enum v4l2_quantization quant;
+ unsigned int sequence;
+};
+
+struct ope_dev {
+ struct device *dev;
+ struct v4l2_device v4l2_dev;
+ struct video_device vfd;
+ struct media_device mdev;
+ struct v4l2_m2m_dev *m2m_dev;
+
+ void __iomem *base;
+ void __iomem *base_rd;
+ void __iomem *base_wr;
+ void __iomem *base_pp;
+
+ struct completion reset_complete;
+
+ struct icc_path *icc_data;
+ struct icc_path *icc_config;
+
+ struct mutex mutex;
+ struct list_head ctx_list;
+ void *context;
+};
+
+struct ope_dsc_config {
+ u32 input_width;
+ u32 input_height;
+ u32 output_width;
+ u32 output_height;
+ u32 phase_step_h;
+ u32 phase_step_v;
+};
+
+struct ope_stripe {
+ struct {
+ dma_addr_t addr;
+ u32 width;
+ u32 height;
+ u32 stride;
+ enum ope_stripe_location location;
+ enum ope_pixel_pattern pattern;
+ enum ope_unpacker_format format;
+ } src;
+ struct {
+ dma_addr_t addr;
+ u32 width;
+ u32 height;
+ u32 stride;
+ u32 x_init;
+ enum ope_packer_format format;
+ bool enabled;
+ } dst[OPE_WR_CLIENT_MAX];
+ struct ope_dsc_config dsc[OPE_DS_MAX];
+};
+
+struct ope_ctx {
+ struct v4l2_fh fh;
+ struct ope_dev *ope;
+
+ /* Processing mode */
+ int mode;
+ u8 alpha_component;
+ u8 rotation;
+ unsigned int framerate;
+
+ enum v4l2_colorspace colorspace;
+ enum v4l2_xfer_func xfer_func;
+
+ /* Source and destination queue data */
+ struct ope_q_data q_data_src;
+ struct ope_q_data q_data_dst;
+
+ u8 current_stripe;
+ struct ope_stripe stripe[OPE_MAX_STRIPE];
+
+ bool started;
+
+ struct list_head list;
+};
+
+struct ope_freq_tbl {
+ unsigned int load;
+ unsigned long freq;
+};
+
+static inline char *print_fourcc(u32 fmt)
+{
+ static char code[5];
+
+ code[0] = (unsigned char)(fmt & 0xff);
+ code[1] = (unsigned char)((fmt >> 8) & 0xff);
+ code[2] = (unsigned char)((fmt >> 16) & 0xff);
+ code[3] = (unsigned char)((fmt >> 24) & 0xff);
+ code[4] = '\0';
+
+ return code;
+}
+
+static inline enum ope_stripe_location ope_stripe_location(unsigned int index,
+ unsigned int count)
+{
+ if (count == 1)
+ return OPE_STRIPE_LOCATION_FULL;
+ if (index == 0)
+ return OPE_STRIPE_LOCATION_LEFT;
+ if (index == (count - 1))
+ return OPE_STRIPE_LOCATION_RIGHT;
+
+ return OPE_STRIPE_LOCATION_MIDDLE;
+}
+
+static inline bool ope_stripe_is_last(struct ope_stripe *stripe)
+{
+ if (!stripe)
+ return false;
+
+ if (stripe->src.location == OPE_STRIPE_LOCATION_RIGHT ||
+ stripe->src.location == OPE_STRIPE_LOCATION_FULL)
+ return true;
+
+ return false;
+}
+
+static inline struct ope_stripe *ope_get_stripe(struct ope_ctx *ctx, unsigned int index)
+{
+ return &ctx->stripe[index];
+}
+
+static inline struct ope_stripe *ope_current_stripe(struct ope_ctx *ctx)
+{
+ if (!ctx)
+ return NULL;
+
+ if (ctx->current_stripe >= OPE_MAX_STRIPE)
+ return NULL;
+
+ return ope_get_stripe(ctx, ctx->current_stripe);
+}
+
+static inline unsigned int ope_stripe_index(struct ope_ctx *ctx, struct ope_stripe *stripe)
+{
+ return stripe - &ctx->stripe[0];
+}
+
+static inline struct ope_stripe *ope_prev_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
+{
+ unsigned int index = ope_stripe_index(ctx, stripe);
+
+ return index ? ope_get_stripe(ctx, index - 1) : NULL;
+}
+
+static inline struct ope_q_data *get_q_data(struct ope_ctx *ctx, enum v4l2_buf_type type)
+{
+ if (V4L2_TYPE_IS_OUTPUT(type))
+ return &ctx->q_data_src;
+ else
+ return &ctx->q_data_dst;
+}
+
+static inline unsigned long __q_data_pixclk(struct ope_q_data *q, unsigned int fps)
+{
+ return (unsigned long)q->width * q->height * fps;
+}
+
+static inline unsigned int __q_data_load_avg(struct ope_q_data *q, unsigned int fps)
+{
+ /* Data load in kBps, calculated from pixel clock and bits per pixel */
+ return mult_frac(__q_data_pixclk(q, fps), q->fmt->depth, 1000) / 8;
+}
+
+static inline unsigned int __q_data_load_peak(struct ope_q_data *q, unsigned int fps)
+{
+ return __q_data_load_avg(q, fps) * 2;
+}
+
+static inline unsigned int __q_data_load_config(struct ope_q_data *q, unsigned int fps)
+{
+ unsigned int stripe_count = q->width / OPE_STRIPE_MAX_W + 1;
+ unsigned int stripe_load = 50 * 4 * fps; /* about 50 x 32-bit registers to configure */
+
+ /* Return config load in kBps */
+ return mult_frac(stripe_count, stripe_load, 1000);
+}
+
+static inline struct ope_ctx *file2ctx(struct file *file)
+{
+ return container_of(file->private_data, struct ope_ctx, fh);
+}
+
+static inline u32 ope_read(struct ope_dev *ope, u32 reg)
+{
+ return readl(ope->base + reg);
+}
+
+static inline void ope_write(struct ope_dev *ope, u32 reg, u32 value)
+{
+ writel(value, ope->base + reg);
+}
+
+static inline u32 ope_read_wr(struct ope_dev *ope, u32 reg)
+{
+ return readl_relaxed(ope->base_wr + reg);
+}
+
+static inline void ope_write_wr(struct ope_dev *ope, u32 reg, u32 value)
+{
+ writel_relaxed(value, ope->base_wr + reg);
+}
+
+static inline u32 ope_read_rd(struct ope_dev *ope, u32 reg)
+{
+ return readl_relaxed(ope->base_rd + reg);
+}
+
+static inline void ope_write_rd(struct ope_dev *ope, u32 reg, u32 value)
+{
+ writel_relaxed(value, ope->base_rd + reg);
+}
+
+static inline u32 ope_read_pp(struct ope_dev *ope, u32 reg)
+{
+ return readl_relaxed(ope->base_pp + reg);
+}
+
+static inline void ope_write_pp(struct ope_dev *ope, u32 reg, u32 value)
+{
+ writel_relaxed(value, ope->base_pp + reg);
+}
+
+static inline void ope_start(struct ope_dev *ope)
+{
+ wmb(); /* Ensure the next write occurs only after all prior normal memory accesses */
+ ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_CMD, OPE_BUS_RD_INPUT_IF_CMD_GO_CMD);
+}
+
+static bool ope_pix_fmt_is_yuv(u32 fourcc)
+{
+ switch (fourcc) {
+ case V4L2_PIX_FMT_NV16:
+ case V4L2_PIX_FMT_NV12:
+ case V4L2_PIX_FMT_NV24:
+ case V4L2_PIX_FMT_NV61:
+ case V4L2_PIX_FMT_NV21:
+ case V4L2_PIX_FMT_NV42:
+ case V4L2_PIX_FMT_GREY:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static const struct ope_fmt *find_format(unsigned int pixelformat)
+{
+ const struct ope_fmt *fmt;
+ unsigned int i;
+
+ for (i = 0; i < OPE_NUM_FORMATS; i++) {
+ fmt = &formats[i];
+ if (fmt->fourcc == pixelformat)
+ break;
+ }
+
+ if (i == OPE_NUM_FORMATS)
+ return NULL;
+
+ return &formats[i];
+}
+
+static inline void ope_dbg_print_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
+{
+ struct ope_dev *ope = ctx->ope;
+ int i;
+
+ dev_dbg(ope->dev, "S%u/FE0: addr=%pad;W=%ub;H=%u;stride=%u;loc=%u;pattern=%u;fmt=%u\n",
+ ope_stripe_index(ctx, stripe), &stripe->src.addr, stripe->src.width,
+ stripe->src.height, stripe->src.stride, stripe->src.location, stripe->src.pattern,
+ stripe->src.format);
+
+ for (i = 0; i < OPE_DS_MAX; i++) {
+ struct ope_dsc_config *c = &stripe->dsc[i];
+
+ dev_dbg(ope->dev, "S%u/DSC%d: %ux%u => %ux%u\n",
+ ope_stripe_index(ctx, stripe), i, c->input_width, c->input_height,
+ c->output_width, c->output_height);
+ }
+
+ for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
+ if (!stripe->dst[i].enabled)
+ continue;
+
+ dev_dbg(ope->dev,
+ "S%u/WE%d: addr=%pad;X=%u;W=%upx;H=%u;stride=%u;fmt=%u\n",
+ ope_stripe_index(ctx, stripe), i, &stripe->dst[i].addr,
+ stripe->dst[i].x_init, stripe->dst[i].width, stripe->dst[i].height,
+ stripe->dst[i].stride, stripe->dst[i].format);
+ }
+}
+
+static void ope_gen_stripe_argb_dst(struct ope_ctx *ctx, struct ope_stripe *stripe, dma_addr_t dst)
+{
+ unsigned int index = ope_stripe_index(ctx, stripe);
+ unsigned int img_width = ctx->q_data_dst.width;
+ unsigned int width, height;
+ dma_addr_t addr;
+
+ /* This is GBRA64 format (le16)G (le16)B (le16)R (le16)A */
+
+ stripe->dst[OPE_WR_CLIENT_ARGB].enabled = true;
+
+ width = stripe->src.width;
+ height = stripe->src.height;
+
+ if (!index) {
+ addr = dst;
+ } else {
+ struct ope_stripe *prev = ope_get_stripe(ctx, index - 1);
+
+ addr = prev->dst[OPE_WR_CLIENT_ARGB].addr + prev->dst[OPE_WR_CLIENT_ARGB].width * 8;
+ }
+
+ stripe->dst[OPE_WR_CLIENT_ARGB].addr = addr;
+ stripe->dst[OPE_WR_CLIENT_ARGB].x_init = 0;
+ stripe->dst[OPE_WR_CLIENT_ARGB].width = width;
+ stripe->dst[OPE_WR_CLIENT_ARGB].height = height;
+ stripe->dst[OPE_WR_CLIENT_ARGB].stride = img_width * 8;
+ stripe->dst[OPE_WR_CLIENT_ARGB].format = OPE_PACKER_FMT_PLAIN_64;
+}
+
+static void ope_gen_stripe_yuv_dst(struct ope_ctx *ctx, struct ope_stripe *stripe, dma_addr_t dst)
+{
+ struct ope_stripe *prev = ope_prev_stripe(ctx, stripe);
+ unsigned int img_width = ctx->q_data_dst.width;
+ unsigned int img_height = ctx->q_data_dst.height;
+ unsigned int width, height;
+ u32 x_init = 0;
+
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].enabled = true;
+ stripe->dst[OPE_WR_CLIENT_DISP_C].enabled = true;
+
+ /* Y */
+ width = stripe->dsc[OPE_DS_Y_DISP].output_width;
+ height = stripe->dsc[OPE_DS_Y_DISP].output_height;
+
+ if (prev)
+ x_init = prev->dst[OPE_WR_CLIENT_DISP_Y].x_init +
+ prev->dst[OPE_WR_CLIENT_DISP_Y].width;
+
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].addr = dst;
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].x_init = x_init;
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].width = width;
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].height = height;
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].stride = img_width;
+ stripe->dst[OPE_WR_CLIENT_DISP_Y].format = OPE_PACKER_FMT_PLAIN_8;
+
+ /* UV */
+ width = stripe->dsc[OPE_DS_C_DISP].output_width;
+ height = stripe->dsc[OPE_DS_C_DISP].output_height;
+
+ if (prev)
+ x_init = prev->dst[OPE_WR_CLIENT_DISP_C].x_init +
+ prev->dst[OPE_WR_CLIENT_DISP_C].width;
+
+ stripe->dst[OPE_WR_CLIENT_DISP_C].addr = dst + img_width * img_height;
+ stripe->dst[OPE_WR_CLIENT_DISP_C].x_init = x_init;
+ stripe->dst[OPE_WR_CLIENT_DISP_C].format = ctx->q_data_dst.fmt->packer_format;
+ stripe->dst[OPE_WR_CLIENT_DISP_C].width = width * 2;
+ stripe->dst[OPE_WR_CLIENT_DISP_C].height = height;
+
+ switch (ctx->q_data_dst.fmt->fourcc) {
+ case V4L2_PIX_FMT_NV42:
+ case V4L2_PIX_FMT_NV24: /* YUV 4:4:4 */
+ stripe->dst[OPE_WR_CLIENT_DISP_C].stride = img_width * 2;
+ break;
+ case V4L2_PIX_FMT_GREY: /* No UV */
+ stripe->dst[OPE_WR_CLIENT_DISP_C].enabled = false;
+ break;
+ default:
+ stripe->dst[OPE_WR_CLIENT_DISP_C].stride = img_width;
+ }
+}
+
+static void ope_gen_stripe_dsc(struct ope_ctx *ctx, struct ope_stripe *stripe,
+ unsigned int h_scale, unsigned int v_scale)
+{
+ struct ope_dsc_config *dsc_c, *dsc_y;
+
+ dsc_c = &stripe->dsc[OPE_DS_C_DISP];
+ dsc_y = &stripe->dsc[OPE_DS_Y_DISP];
+
+ dsc_c->phase_step_h = dsc_y->phase_step_h = h_scale;
+ dsc_c->phase_step_v = dsc_y->phase_step_v = v_scale;
+
+ dsc_c->input_width = stripe->dsc[OPE_DS_C_PRE].output_width;
+ dsc_c->input_height = stripe->dsc[OPE_DS_C_PRE].output_height;
+
+ dsc_y->input_width = stripe->src.width;
+ dsc_y->input_height = stripe->src.height;
+
+ dsc_c->output_width = DS_OUTPUT_PIX(dsc_c->input_width, 0, h_scale);
+ dsc_c->output_height = DS_OUTPUT_PIX(dsc_c->input_height, 0, v_scale);
+
+ dsc_y->output_width = DS_OUTPUT_PIX(dsc_y->input_width, 0, h_scale);
+ dsc_y->output_height = DS_OUTPUT_PIX(dsc_y->input_height, 0, v_scale);
+
+ /* Adjust initial phase ? */
+}
+
+static void ope_gen_stripe_chroma_dsc(struct ope_ctx *ctx, struct ope_stripe *stripe)
+{
+ struct ope_dsc_config *dsc;
+
+ dsc = &stripe->dsc[OPE_DS_C_PRE];
+
+ dsc->input_width = stripe->src.width;
+ dsc->input_height = stripe->src.height;
+
+ switch (ctx->q_data_dst.fmt->fourcc) {
+ case V4L2_PIX_FMT_NV61:
+ case V4L2_PIX_FMT_NV16:
+ dsc->output_width = dsc->input_width / 2;
+ dsc->output_height = dsc->input_height;
+ break;
+ case V4L2_PIX_FMT_NV12:
+ case V4L2_PIX_FMT_NV21:
+ dsc->output_width = dsc->input_width / 2;
+ dsc->output_height = dsc->input_height / 2;
+ break;
+ default:
+ dsc->output_width = dsc->input_width;
+ dsc->output_height = dsc->input_height;
+ }
+
+ dsc->phase_step_h = DS_Q21(dsc->input_width, dsc->output_width);
+ dsc->phase_step_v = DS_Q21(dsc->input_height, dsc->output_height);
+}
+
+static void ope_gen_stripes(struct ope_ctx *ctx, dma_addr_t src, dma_addr_t dst)
+{
+ const struct ope_fmt *src_fmt = ctx->q_data_src.fmt;
+ const struct ope_fmt *dst_fmt = ctx->q_data_dst.fmt;
+ unsigned int num_stripes, width, i;
+ unsigned int h_scale, v_scale;
+
+ width = ctx->q_data_src.width;
+ num_stripes = DIV_ROUND_UP(ctx->q_data_src.width, OPE_STRIPE_MAX_W);
+ h_scale = DS_Q21(ctx->q_data_src.width, ctx->q_data_dst.width);
+ v_scale = DS_Q21(ctx->q_data_src.height, ctx->q_data_dst.height);
+
+ for (i = 0; i < num_stripes; i++) {
+ struct ope_stripe *stripe = &ctx->stripe[i];
+
+ /* Clear config */
+ memset(stripe, 0, sizeof(*stripe));
+
+ /* Fetch Engine */
+ stripe->src.addr = src;
+ stripe->src.width = width;
+ stripe->src.height = ctx->q_data_src.height;
+ stripe->src.stride = ctx->q_data_src.bytesperline;
+ stripe->src.location = ope_stripe_location(i, num_stripes);
+ stripe->src.pattern = src_fmt->pattern;
+ stripe->src.format = src_fmt->unpacker_format;
+
+ /* Ensure the last stripe will be large enough */
+ if (width > OPE_STRIPE_MAX_W && width < (OPE_STRIPE_MAX_W + OPE_STRIPE_MIN_W))
+ stripe->src.width -= OPE_STRIPE_MIN_W * 2;
+
+ v4l_bound_align_image(&stripe->src.width, src_fmt->align,
+ OPE_STRIPE_MAX_W, src_fmt->align,
+ &stripe->src.height, OPE_STRIPE_MIN_H, OPE_STRIPE_MAX_H,
+ OPE_ALIGN_H, 0);
+
+ width -= stripe->src.width;
+ src += stripe->src.width * src_fmt->depth / 8;
+
+ if (ope_pix_fmt_is_yuv(dst_fmt->fourcc)) {
+ /* YUV Chroma downscaling */
+ ope_gen_stripe_chroma_dsc(ctx, stripe);
+
+ /* YUV downscaling */
+ ope_gen_stripe_dsc(ctx, stripe, h_scale, v_scale);
+
+ /* Write Engines */
+ ope_gen_stripe_yuv_dst(ctx, stripe, dst);
+ } else {
+ ope_gen_stripe_argb_dst(ctx, stripe, dst);
+ }
+
+ /* Source width is in byte unit, not pixel */
+ stripe->src.width = stripe->src.width * src_fmt->depth / 8;
+
+ ope_dbg_print_stripe(ctx, stripe);
+ }
+}
+
+static void ope_prog_rgb2yuv(struct ope_dev *ope)
+{
+ /* Default RGB to YUV - no special effect - CF BT.601 */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V0, 0x04d) | /* R to Y 0.299 12sQ8 */
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V1, 0x096)); /* G to Y 0.587 12sQ8 */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2_V2, 0x01d)); /* B to Y 0.114 12sQ8 */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1_K, 0)); /* Y offset 0 9sQ0 */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AP, 0x0e6) |
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AM, 0x0e6)); /* 0.886 12sQ8 (Cb) */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BP, 0xfb3) |
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BM, 0xfb3)); /* -0.338 12sQ8 (Cb) */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CP, 0xb3) |
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CM, 0xb3)); /* 0.701 12sQ8 (Cr) */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DP, 0xfe3) |
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DM, 0xfe3)); /* -0.114 12sQ8 (Cr) */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1_KCR, 128)); /* KCR 128 11s */
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0,
+ FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0_KCB, 128)); /* KCB 128 11s */
+
+ ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG,
+ OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG_EN);
+}
+
+static void ope_prog_bayer2rgb(struct ope_dev *ope)
+{
+ /* Fixed Settings */
+ ope_write_pp(ope, 0x860, 0x4001);
+ ope_write_pp(ope, 0x868, 128);
+ ope_write_pp(ope, 0x86c, 128 << 20);
+ ope_write_pp(ope, 0x870, 102);
+}
+
+static void ope_prog_wb(struct ope_dev *ope)
+{
+ /* Default white balance config */
+ u32 g_gain = OPE_WB(1, 1);
+ u32 b_gain = OPE_WB(3, 2);
+ u32 r_gain = OPE_WB(3, 2);
+
+ ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(0), g_gain);
+ ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(1), b_gain);
+ ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(2), r_gain);
+
+ ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_MODULE_CFG, OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN);
+}
+
+static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
+{
+ struct ope_dev *ope = ctx->ope;
+ int i;
+
+ dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
+
+ /* Fetch Engine */
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
+ (stripe->src.width << 16) + stripe->src.height);
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
+ FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
+ ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
+
+ /* Write Engines */
+ for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
+ if (!stripe->dst[i].enabled) {
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
+ continue;
+ }
+
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
+ (stripe->dst[i].height << 16) + stripe->dst[i].width);
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
+ ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
+ OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
+ }
+
+ /* Downscalers */
+ for (i = 0; i < OPE_DS_MAX; i++) {
+ struct ope_dsc_config *dsc = &stripe->dsc[i];
+ u32 base = ope_ds_base[i];
+ u32 cfg = 0;
+
+ if (dsc->input_width != dsc->output_width) {
+ dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
+ dsc->output_width) << 30;
+ cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
+ }
+
+ if (dsc->input_height != dsc->output_height) {
+ dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
+ dsc->output_height) << 30;
+ cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
+ }
+
+ ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
+ ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
+ ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
+ ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
+ ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
+ ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
+ cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
+ }
+}
+
+/*
+ * mem2mem callbacks
+ */
+static void ope_device_run(void *priv)
+{
+ struct vb2_v4l2_buffer *src_buf, *dst_buf;
+ struct ope_ctx *ctx = priv;
+ struct ope_dev *ope = ctx->ope;
+ dma_addr_t src, dst;
+
+ dev_dbg(ope->dev, "Start context %p", ctx);
+
+ src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+ if (!src_buf)
+ return;
+
+ dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+ if (!dst_buf)
+ return;
+
+ src = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
+ dst = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+ /* Generate stripes from full frame */
+ ope_gen_stripes(ctx, src, dst);
+
+ if (priv != ope->context) {
+ /* If context changed, reprogram the submodules */
+ ope_prog_wb(ope);
+ ope_prog_bayer2rgb(ope);
+ ope_prog_rgb2yuv(ope);
+ ope->context = priv;
+ }
+
+ /* Program the first stripe */
+ ope_prog_stripe(ctx, &ctx->stripe[0]);
+
+ /* Go! */
+ ope_start(ope);
+}
+
+static void ope_job_done(struct ope_ctx *ctx, enum vb2_buffer_state vbstate)
+{
+ struct vb2_v4l2_buffer *src, *dst;
+
+ if (!ctx)
+ return;
+
+ src = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+ dst = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+ if (dst && src)
+ dst->vb2_buf.timestamp = src->vb2_buf.timestamp;
+
+ if (src)
+ v4l2_m2m_buf_done(src, vbstate);
+ if (dst)
+ v4l2_m2m_buf_done(dst, vbstate);
+
+ v4l2_m2m_job_finish(ctx->ope->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+static void ope_buf_done(struct ope_ctx *ctx)
+{
+ struct ope_stripe *stripe = ope_current_stripe(ctx);
+
+ if (!ctx)
+ return;
+
+ dev_dbg(ctx->ope->dev, "Context %p Stripe %u done\n",
+ ctx, ope_stripe_index(ctx, stripe));
+
+ if (ope_stripe_is_last(stripe)) {
+ ctx->current_stripe = 0;
+ ope_job_done(ctx, VB2_BUF_STATE_DONE);
+ } else {
+ ctx->current_stripe++;
+ ope_start(ctx->ope);
+ }
+}
+
+static void ope_job_abort(void *priv)
+{
+ struct ope_ctx *ctx = priv;
+
+ /* reset to abort */
+ ope_write(ctx->ope, OPE_TOP_RESET_CMD, OPE_TOP_RESET_CMD_SW);
+}
+
+static void ope_rup_done(struct ope_ctx *ctx)
+{
+ struct ope_stripe *stripe = ope_current_stripe(ctx);
+
+ /* We can program next stripe (double buffered registers) */
+ if (!ope_stripe_is_last(stripe))
+ ope_prog_stripe(ctx, ++stripe);
+}
+
+/*
+ * interrupt handler
+ */
+static void ope_fe_irq(struct ope_dev *ope)
+{
+ u32 status = ope_read_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_STATUS);
+
+ ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_CLEAR, status);
+ ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_CMD, OPE_BUS_RD_INPUT_IF_IRQ_CMD_CLEAR);
+
+ /* Nothing to do */
+}
+
+static void ope_we_irq(struct ope_ctx *ctx)
+{
+ struct ope_dev *ope;
+ u32 status0;
+
+ if (!ctx) {
+ pr_err("Instance released before the end of transaction\n");
+ return;
+ }
+
+ ope = ctx->ope;
+
+ status0 = ope_read_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0);
+ ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_0, status0);
+ ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_CMD, OPE_BUS_WR_INPUT_IF_IRQ_CMD_CLEAR);
+
+ if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL) {
+ dev_err_ratelimited(ope->dev, "Write Engine configuration violates constrains\n");
+ ope_job_abort(ctx);
+ }
+
+ if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL) {
+ u32 status = ope_read_wr(ope, OPE_BUS_WR_IMAGE_SIZE_VIOLATION_STATUS);
+ int i;
+
+ for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
+ if (BIT(i) & status)
+ dev_err_ratelimited(ope->dev,
+ "Write Engine (WE%d) image size violation\n", i);
+ }
+
+ ope_job_abort(ctx);
+ }
+
+ if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL) {
+ dev_err_ratelimited(ope->dev, "Write Engine fatal violation\n");
+ ope_job_abort(ctx);
+ }
+
+ if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE)
+ ope_rup_done(ctx);
+}
+
+static irqreturn_t ope_irq(int irq, void *dev_id)
+{
+ struct ope_dev *ope = dev_id;
+ struct ope_ctx *ctx = ope->m2m_dev ? v4l2_m2m_get_curr_priv(ope->m2m_dev) : NULL;
+ u32 status;
+
+ status = ope_read(ope, OPE_TOP_IRQ_STATUS);
+ ope_write(ope, OPE_TOP_IRQ_CLEAR, status);
+ ope_write(ope, OPE_TOP_IRQ_CMD, OPE_TOP_IRQ_CMD_CLEAR);
+
+ if (status & OPE_TOP_IRQ_STATUS_RST_DONE) {
+ ope_job_done(ctx, VB2_BUF_STATE_ERROR);
+ complete(&ope->reset_complete);
+ }
+
+ if (status & OPE_TOP_IRQ_STATUS_VIOL) {
+ u32 violation = ope_read(ope, OPE_TOP_VIOLATION_STATUS);
+
+ dev_warn(ope->dev, "OPE Violation: %u", violation);
+ }
+
+ if (status & OPE_TOP_IRQ_STATUS_FE)
+ ope_fe_irq(ope);
+
+ if (status & OPE_TOP_IRQ_STATUS_WE)
+ ope_we_irq(ctx);
+
+ if (status & OPE_TOP_IRQ_STATUS_IDLE)
+ ope_buf_done(ctx);
+
+ return IRQ_HANDLED;
+}
+
+static void ope_irq_init(struct ope_dev *ope)
+{
+ ope_write(ope, OPE_TOP_IRQ_MASK,
+ OPE_TOP_IRQ_STATUS_RST_DONE | OPE_TOP_IRQ_STATUS_WE |
+ OPE_TOP_IRQ_STATUS_VIOL | OPE_TOP_IRQ_STATUS_IDLE);
+
+ ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_MASK_0,
+ OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE |
+ OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL |
+ OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL |
+ OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL);
+}
+
+/*
+ * video ioctls
+ */
+static int ope_querycap(struct file *file, void *priv, struct v4l2_capability *cap)
+{
+ strscpy(cap->driver, MEM2MEM_NAME, sizeof(cap->driver));
+ strscpy(cap->card, "Qualcomm Offline Processing Engine", sizeof(cap->card));
+ return 0;
+}
+
+static int ope_enum_fmt(struct v4l2_fmtdesc *f, u32 type)
+{
+ const struct ope_fmt *fmt;
+ int i, num = 0;
+
+ for (i = 0; i < OPE_NUM_FORMATS; ++i) {
+ if (formats[i].types & type) {
+ if (num == f->index)
+ break;
+ ++num;
+ }
+ }
+
+ if (i < OPE_NUM_FORMATS) {
+ fmt = &formats[i];
+ f->pixelformat = fmt->fourcc;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int ope_enum_fmt_vid_cap(struct file *file, void *priv,
+ struct v4l2_fmtdesc *f)
+{
+ f->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+
+ return ope_enum_fmt(f, MEM2MEM_CAPTURE);
+}
+
+static int ope_enum_fmt_vid_out(struct file *file, void *priv,
+ struct v4l2_fmtdesc *f)
+{
+ f->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+
+ return ope_enum_fmt(f, MEM2MEM_OUTPUT);
+}
+
+static int ope_g_fmt(struct ope_ctx *ctx, struct v4l2_format *f)
+{
+ struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
+ struct ope_q_data *q_data;
+ struct vb2_queue *vq;
+
+ vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+ if (!vq)
+ return -EINVAL;
+
+ q_data = get_q_data(ctx, f->type);
+
+ pix_mp->width = q_data->width;
+ pix_mp->height = q_data->height;
+ pix_mp->pixelformat = q_data->fmt->fourcc;
+ pix_mp->num_planes = 1;
+ pix_mp->field = V4L2_FIELD_NONE;
+ pix_mp->colorspace = ctx->colorspace;
+ pix_mp->xfer_func = ctx->xfer_func;
+ pix_mp->ycbcr_enc = q_data->ycbcr_enc;
+ pix_mp->quantization = q_data->quant;
+ pix_mp->plane_fmt[0].bytesperline = q_data->bytesperline;
+ pix_mp->plane_fmt[0].sizeimage = q_data->sizeimage;
+
+ return 0;
+}
+
+static int ope_g_fmt_vid_out(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ return ope_g_fmt(file2ctx(file), f);
+}
+
+static int ope_g_fmt_vid_cap(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ return ope_g_fmt(file2ctx(file), f);
+}
+
+static int ope_try_fmt(struct v4l2_format *f, const struct ope_fmt *fmt)
+{
+ struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
+ unsigned int stride = pix_mp->plane_fmt[0].bytesperline;
+ unsigned int size;
+
+ pix_mp->num_planes = 1;
+ pix_mp->field = V4L2_FIELD_NONE;
+
+ v4l_bound_align_image(&pix_mp->width, OPE_MIN_W, OPE_MAX_W, fmt->align,
+ &pix_mp->height, OPE_MIN_H, OPE_MAX_H, OPE_ALIGN_H, 0);
+
+ if (ope_pix_fmt_is_yuv(pix_mp->pixelformat)) {
+ stride = MAX(pix_mp->width, stride);
+ size = fmt->depth * pix_mp->width / 8 * pix_mp->height;
+ } else {
+ stride = MAX(pix_mp->width * fmt->depth / 8, stride);
+ size = stride * pix_mp->height;
+ }
+
+ pix_mp->plane_fmt[0].bytesperline = stride;
+ pix_mp->plane_fmt[0].sizeimage = size;
+
+ return 0;
+}
+
+static int ope_try_fmt_vid_cap(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
+ struct ope_ctx *ctx = file2ctx(file);
+ const struct ope_fmt *fmt;
+ int ret;
+
+ dev_dbg(ctx->ope->dev, "Try capture format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
+ pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
+ pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
+ pix_mp->plane_fmt[0].sizeimage);
+
+ fmt = find_format(pix_mp->pixelformat);
+ if (!fmt) {
+ pix_mp->pixelformat = ctx->q_data_dst.fmt->fourcc;
+ fmt = ctx->q_data_dst.fmt;
+ }
+
+ if (!(fmt->types & MEM2MEM_CAPTURE) && (fmt != ctx->q_data_src.fmt))
+ return -EINVAL;
+
+ if (pix_mp->width > ctx->q_data_src.width ||
+ pix_mp->height > ctx->q_data_src.height) {
+ pix_mp->width = ctx->q_data_src.width;
+ pix_mp->height = ctx->q_data_src.height;
+ }
+
+ pix_mp->colorspace = ope_pix_fmt_is_yuv(pix_mp->pixelformat) ?
+ ctx->colorspace : V4L2_COLORSPACE_RAW;
+ pix_mp->xfer_func = ctx->xfer_func;
+ pix_mp->ycbcr_enc = V4L2_MAP_YCBCR_ENC_DEFAULT(pix_mp->colorspace);
+ pix_mp->quantization = V4L2_MAP_QUANTIZATION_DEFAULT(true,
+ pix_mp->colorspace, pix_mp->ycbcr_enc);
+
+ ret = ope_try_fmt(f, fmt);
+
+ dev_dbg(ctx->ope->dev, "Fixed capture format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
+ pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
+ pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
+ pix_mp->plane_fmt[0].sizeimage);
+
+ return ret;
+}
+
+static int ope_try_fmt_vid_out(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
+ const struct ope_fmt *fmt;
+ struct ope_ctx *ctx = file2ctx(file);
+ int ret;
+
+ dev_dbg(ctx->ope->dev, "Try output format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
+ pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
+ pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
+ pix_mp->plane_fmt[0].sizeimage);
+
+ fmt = find_format(pix_mp->pixelformat);
+ if (!fmt) {
+ pix_mp->pixelformat = ctx->q_data_src.fmt->fourcc;
+ fmt = ctx->q_data_src.fmt;
+ }
+ if (!(fmt->types & MEM2MEM_OUTPUT))
+ return -EINVAL;
+
+ if (!pix_mp->colorspace)
+ pix_mp->colorspace = V4L2_COLORSPACE_SRGB;
+
+ pix_mp->ycbcr_enc = V4L2_MAP_YCBCR_ENC_DEFAULT(pix_mp->colorspace);
+ pix_mp->quantization = V4L2_MAP_QUANTIZATION_DEFAULT(true,
+ pix_mp->colorspace, pix_mp->ycbcr_enc);
+
+ ret = ope_try_fmt(f, fmt);
+
+ dev_dbg(ctx->ope->dev, "Fixed output format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
+ pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
+ pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
+ pix_mp->plane_fmt[0].sizeimage);
+
+ return ret;
+}
+
+static int ope_s_fmt(struct ope_ctx *ctx, struct v4l2_format *f)
+{
+ struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
+ struct ope_q_data *q_data;
+ struct vb2_queue *vq;
+
+ vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+ if (!vq)
+ return -EINVAL;
+
+ q_data = get_q_data(ctx, f->type);
+ if (!q_data)
+ return -EINVAL;
+
+ if (vb2_is_busy(vq)) {
+ v4l2_err(&ctx->ope->v4l2_dev, "%s queue busy\n", __func__);
+ return -EBUSY;
+ }
+
+ q_data->fmt = find_format(pix_mp->pixelformat);
+ if (!q_data->fmt)
+ return -EINVAL;
+ q_data->width = pix_mp->width;
+ q_data->height = pix_mp->height;
+ q_data->bytesperline = pix_mp->plane_fmt[0].bytesperline;
+ q_data->sizeimage = pix_mp->plane_fmt[0].sizeimage;
+
+ dev_dbg(ctx->ope->dev, "Set %s format: %ux%u %s (%u bytes)\n",
+ V4L2_TYPE_IS_OUTPUT(f->type) ? "output" : "capture",
+ q_data->width, q_data->height, print_fourcc(q_data->fmt->fourcc),
+ q_data->sizeimage);
+
+ return 0;
+}
+
+static int ope_s_fmt_vid_cap(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ struct ope_ctx *ctx = file2ctx(file);
+ int ret;
+
+ ret = ope_try_fmt_vid_cap(file, priv, f);
+ if (ret)
+ return ret;
+
+ ret = ope_s_fmt(file2ctx(file), f);
+ if (ret)
+ return ret;
+
+ ctx->q_data_dst.ycbcr_enc = f->fmt.pix_mp.ycbcr_enc;
+ ctx->q_data_dst.quant = f->fmt.pix_mp.quantization;
+
+ return 0;
+}
+
+static int ope_s_fmt_vid_out(struct file *file, void *priv,
+ struct v4l2_format *f)
+{
+ struct ope_ctx *ctx = file2ctx(file);
+ int ret;
+
+ ret = ope_try_fmt_vid_out(file, priv, f);
+ if (ret)
+ return ret;
+
+ ret = ope_s_fmt(file2ctx(file), f);
+ if (ret)
+ return ret;
+
+ ctx->colorspace = f->fmt.pix_mp.colorspace;
+ ctx->xfer_func = f->fmt.pix_mp.xfer_func;
+ ctx->q_data_src.ycbcr_enc = f->fmt.pix_mp.ycbcr_enc;
+ ctx->q_data_src.quant = f->fmt.pix_mp.quantization;
+
+ return 0;
+}
+
+static int ope_enum_framesizes(struct file *file, void *fh,
+ struct v4l2_frmsizeenum *fsize)
+{
+ if (fsize->index > 0)
+ return -EINVAL;
+
+ if (!find_format(fsize->pixel_format))
+ return -EINVAL;
+
+ fsize->type = V4L2_FRMSIZE_TYPE_STEPWISE;
+ fsize->stepwise.min_width = OPE_MIN_W;
+ fsize->stepwise.max_width = OPE_MAX_W;
+ fsize->stepwise.step_width = 1 << OPE_ALIGN_W;
+ fsize->stepwise.min_height = OPE_MIN_H;
+ fsize->stepwise.max_height = OPE_MAX_H;
+ fsize->stepwise.step_height = 1 << OPE_ALIGN_H;
+
+ return 0;
+}
+
+static int ope_enum_frameintervals(struct file *file, void *fh,
+ struct v4l2_frmivalenum *fival)
+{
+ fival->type = V4L2_FRMIVAL_TYPE_STEPWISE;
+ fival->stepwise.min.numerator = 1;
+ fival->stepwise.min.denominator = 120;
+ fival->stepwise.max.numerator = 1;
+ fival->stepwise.max.denominator = 1;
+ fival->stepwise.step.numerator = 1;
+ fival->stepwise.step.denominator = 1;
+
+ return 0;
+}
+
+static int ope_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+ return -EINVAL;
+}
+
+static const struct v4l2_ctrl_ops ope_ctrl_ops = {
+ .s_ctrl = ope_s_ctrl,
+};
+
+static const struct v4l2_ioctl_ops ope_ioctl_ops = {
+ .vidioc_querycap = ope_querycap,
+
+ .vidioc_enum_fmt_vid_cap = ope_enum_fmt_vid_cap,
+ .vidioc_g_fmt_vid_cap_mplane = ope_g_fmt_vid_cap,
+ .vidioc_try_fmt_vid_cap_mplane = ope_try_fmt_vid_cap,
+ .vidioc_s_fmt_vid_cap_mplane = ope_s_fmt_vid_cap,
+
+ .vidioc_enum_fmt_vid_out = ope_enum_fmt_vid_out,
+ .vidioc_g_fmt_vid_out_mplane = ope_g_fmt_vid_out,
+ .vidioc_try_fmt_vid_out_mplane = ope_try_fmt_vid_out,
+ .vidioc_s_fmt_vid_out_mplane = ope_s_fmt_vid_out,
+
+ .vidioc_enum_framesizes = ope_enum_framesizes,
+ .vidioc_enum_frameintervals = ope_enum_frameintervals,
+
+ .vidioc_reqbufs = v4l2_m2m_ioctl_reqbufs,
+ .vidioc_querybuf = v4l2_m2m_ioctl_querybuf,
+ .vidioc_qbuf = v4l2_m2m_ioctl_qbuf,
+ .vidioc_dqbuf = v4l2_m2m_ioctl_dqbuf,
+ .vidioc_prepare_buf = v4l2_m2m_ioctl_prepare_buf,
+ .vidioc_create_bufs = v4l2_m2m_ioctl_create_bufs,
+ .vidioc_expbuf = v4l2_m2m_ioctl_expbuf,
+
+ .vidioc_streamon = v4l2_m2m_ioctl_streamon,
+ .vidioc_streamoff = v4l2_m2m_ioctl_streamoff,
+
+ .vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
+ .vidioc_unsubscribe_event = v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int ope_queue_setup(struct vb2_queue *vq,
+ unsigned int *nbuffers, unsigned int *nplanes,
+ unsigned int sizes[], struct device *alloc_devs[])
+{
+ struct ope_ctx *ctx = vb2_get_drv_priv(vq);
+ struct ope_q_data *q_data = get_q_data(ctx, vq->type);
+ unsigned int size = q_data->sizeimage;
+
+ if (*nplanes) {
+ if (*nplanes != 1)
+ return -EINVAL;
+ } else {
+ *nplanes = 1;
+ }
+
+ if (sizes[0]) {
+ if (sizes[0] < size)
+ return -EINVAL;
+ } else {
+ sizes[0] = size;
+ }
+
+ dev_dbg(ctx->ope->dev, "get %d buffer(s) of size %d each.\n", *nbuffers, size);
+
+ return 0;
+}
+
+static int ope_buf_prepare(struct vb2_buffer *vb)
+{
+ struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
+ struct ope_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+ struct ope_dev *ope = ctx->ope;
+ struct ope_q_data *q_data;
+
+ q_data = get_q_data(ctx, vb->vb2_queue->type);
+ if (V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)) {
+ if (vbuf->field == V4L2_FIELD_ANY)
+ vbuf->field = V4L2_FIELD_NONE;
+ if (vbuf->field != V4L2_FIELD_NONE) {
+ v4l2_err(&ope->v4l2_dev, "Field isn't supported\n");
+ return -EINVAL;
+ }
+ }
+
+ if (vb2_plane_size(vb, 0) < q_data->sizeimage) {
+ v4l2_err(&ope->v4l2_dev, "Data will not fit into plane (%lu < %lu)\n",
+ vb2_plane_size(vb, 0), (long)q_data->sizeimage);
+ return -EINVAL;
+ }
+
+ if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type))
+ vb2_set_plane_payload(vb, 0, q_data->sizeimage);
+
+ vbuf->sequence = q_data->sequence++;
+
+ return 0;
+}
+
+static void ope_adjust_power(struct ope_dev *ope)
+{
+ int ret;
+ unsigned long pixclk = 0;
+ unsigned int loadavg = 0;
+ unsigned int loadpeak = 0;
+ unsigned int loadconfig = 0;
+ struct ope_ctx *ctx;
+
+ lockdep_assert_held(&ope->mutex);
+
+ list_for_each_entry(ctx, &ope->ctx_list, list) {
+ if (!ctx->started)
+ continue;
+
+ if (!ctx->framerate)
+ ctx->framerate = DEFAULT_FRAMERATE;
+
+ pixclk += __q_data_pixclk(&ctx->q_data_src, ctx->framerate);
+ loadavg += __q_data_load_avg(&ctx->q_data_src, ctx->framerate);
+ loadavg += __q_data_load_avg(&ctx->q_data_dst, ctx->framerate);
+ loadpeak += __q_data_load_peak(&ctx->q_data_src, ctx->framerate);
+ loadpeak += __q_data_load_peak(&ctx->q_data_dst, ctx->framerate);
+ loadconfig += __q_data_load_config(&ctx->q_data_src, ctx->framerate);
+ }
+
+ /* 30% margin for overhead */
+ pixclk = mult_frac(pixclk, 13, 10);
+
+ dev_dbg(ope->dev, "Adjusting clock:%luHz avg:%uKBps peak:%uKBps config:%uKBps\n",
+ pixclk, loadavg, loadpeak, loadconfig);
+
+ ret = dev_pm_opp_set_rate(ope->dev, pixclk);
+ if (ret)
+ dev_warn(ope->dev, "Failed to adjust OPP rate: %d\n", ret);
+
+ ret = icc_set_bw(ope->icc_data, loadavg, loadpeak);
+ if (ret)
+ dev_warn(ope->dev, "Failed to set data path bandwidth: %d\n", ret);
+
+ ret = icc_set_bw(ope->icc_config, loadconfig, loadconfig * 5);
+ if (ret)
+ dev_warn(ope->dev, "Failed to set config path bandwidth: %d\n", ret);
+}
+
+static void ope_buf_queue(struct vb2_buffer *vb)
+{
+ struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
+ struct ope_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+}
+
+static int ope_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+ struct ope_ctx *ctx = vb2_get_drv_priv(q);
+ struct ope_dev *ope = ctx->ope;
+ struct ope_q_data *q_data;
+ int ret;
+
+ dev_dbg(ope->dev, "Start streaming (ctx %p/%u)\n", ctx, q->type);
+
+ lockdep_assert_held(&ope->mutex);
+
+ q_data = get_q_data(ctx, q->type);
+ q_data->sequence = 0;
+
+ if (V4L2_TYPE_IS_OUTPUT(q->type)) {
+ ctx->started = true;
+ ope_adjust_power(ctx->ope);
+ }
+
+ ret = pm_runtime_resume_and_get(ctx->ope->dev);
+ if (ret) {
+ dev_err(ope->dev, "Could not resume\n");
+ return ret;
+ }
+
+ ope_irq_init(ope);
+
+ return 0;
+}
+
+static void ope_stop_streaming(struct vb2_queue *q)
+{
+ struct ope_ctx *ctx = vb2_get_drv_priv(q);
+ struct ope_dev *ope = ctx->ope;
+ struct vb2_v4l2_buffer *vbuf;
+
+ dev_dbg(ctx->ope->dev, "Stop streaming (ctx %p/%u)\n", ctx, q->type);
+
+ lockdep_assert_held(&ope->mutex);
+
+ if (ope->context == ctx)
+ ope->context = NULL;
+
+ if (V4L2_TYPE_IS_OUTPUT(q->type)) {
+ ctx->started = false;
+ ope_adjust_power(ctx->ope);
+ }
+
+ pm_runtime_put(ctx->ope->dev);
+
+ for (;;) {
+ if (V4L2_TYPE_IS_OUTPUT(q->type))
+ vbuf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+ else
+ vbuf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+ if (vbuf == NULL)
+ return;
+
+ v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR);
+ }
+}
+
+static const struct vb2_ops ope_qops = {
+ .queue_setup = ope_queue_setup,
+ .buf_prepare = ope_buf_prepare,
+ .buf_queue = ope_buf_queue,
+ .start_streaming = ope_start_streaming,
+ .stop_streaming = ope_stop_streaming,
+};
+
+static int queue_init(void *priv, struct vb2_queue *src_vq,
+ struct vb2_queue *dst_vq)
+{
+ struct ope_ctx *ctx = priv;
+ int ret;
+
+ src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+ src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+ src_vq->drv_priv = ctx;
+ src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+ src_vq->ops = &ope_qops;
+ src_vq->mem_ops = &vb2_dma_contig_memops;
+ src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+ src_vq->lock = &ctx->ope->mutex;
+ src_vq->dev = ctx->ope->v4l2_dev.dev;
+
+ ret = vb2_queue_init(src_vq);
+ if (ret)
+ return ret;
+
+ dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+ dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+ dst_vq->drv_priv = ctx;
+ dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+ dst_vq->ops = &ope_qops;
+ dst_vq->mem_ops = &vb2_dma_contig_memops;
+ dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+ dst_vq->lock = &ctx->ope->mutex;
+ dst_vq->dev = ctx->ope->v4l2_dev.dev;
+
+ return vb2_queue_init(dst_vq);
+}
+
+/*
+ * File operations
+ */
+static int ope_open(struct file *file)
+{
+ struct ope_dev *ope = video_drvdata(file);
+ struct ope_ctx *ctx = NULL;
+ int rc = 0;
+
+ if (mutex_lock_interruptible(&ope->mutex))
+ return -ERESTARTSYS;
+
+ ctx = kvzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx) {
+ rc = -ENOMEM;
+ goto open_unlock;
+ }
+
+ v4l2_fh_init(&ctx->fh, video_devdata(file));
+ file->private_data = &ctx->fh;
+ ctx->ope = ope;
+ ctx->colorspace = V4L2_COLORSPACE_RAW;
+
+ ctx->q_data_src.fmt = find_format(V4L2_PIX_FMT_SRGGB8);
+ ctx->q_data_src.width = 640;
+ ctx->q_data_src.height = 480;
+ ctx->q_data_src.bytesperline = 640;
+ ctx->q_data_src.sizeimage = 640 * 480;
+
+ ctx->q_data_dst.fmt = find_format(V4L2_PIX_FMT_NV12);
+ ctx->q_data_dst.width = 640;
+ ctx->q_data_dst.height = 480;
+ ctx->q_data_dst.bytesperline = 640;
+ ctx->q_data_dst.sizeimage = 640 * 480 * 3 / 2;
+
+ ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(ope->m2m_dev, ctx, &queue_init);
+ if (IS_ERR(ctx->fh.m2m_ctx)) {
+ rc = PTR_ERR(ctx->fh.m2m_ctx);
+ v4l2_fh_exit(&ctx->fh);
+ kvfree(ctx);
+ goto open_unlock;
+ }
+
+ v4l2_fh_add(&ctx->fh, file);
+
+ list_add(&ctx->list, &ope->ctx_list);
+
+ dev_dbg(ope->dev, "Created ctx %p\n", ctx);
+
+open_unlock:
+ mutex_unlock(&ope->mutex);
+ return rc;
+}
+
+static int ope_release(struct file *file)
+{
+ struct ope_dev *ope = video_drvdata(file);
+ struct ope_ctx *ctx = file2ctx(file);
+
+ dev_dbg(ope->dev, "Releasing ctx %p\n", ctx);
+
+ guard(mutex)(&ope->mutex);
+
+ if (ope->context == ctx)
+ ope->context = NULL;
+
+ list_del(&ctx->list);
+ v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+ v4l2_fh_del(&ctx->fh, file);
+ v4l2_fh_exit(&ctx->fh);
+ kvfree(ctx);
+
+ return 0;
+}
+
+static const struct v4l2_file_operations ope_fops = {
+ .owner = THIS_MODULE,
+ .open = ope_open,
+ .release = ope_release,
+ .poll = v4l2_m2m_fop_poll,
+ .unlocked_ioctl = video_ioctl2,
+ .mmap = v4l2_m2m_fop_mmap,
+};
+
+static const struct video_device ope_videodev = {
+ .name = MEM2MEM_NAME,
+ .vfl_dir = VFL_DIR_M2M,
+ .fops = &ope_fops,
+ .device_caps = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_M2M_MPLANE,
+ .ioctl_ops = &ope_ioctl_ops,
+ .minor = -1,
+ .release = video_device_release_empty,
+};
+
+static const struct v4l2_m2m_ops m2m_ops = {
+ .device_run = ope_device_run,
+ .job_abort = ope_job_abort,
+};
+
+static int ope_soft_reset(struct ope_dev *ope)
+{
+ u32 version;
+ int ret = 0;
+
+ ret = pm_runtime_resume_and_get(ope->dev);
+ if (ret) {
+ dev_err(ope->dev, "Could not resume\n");
+ return ret;
+ }
+
+ version = ope_read(ope, OPE_TOP_HW_VERSION);
+
+ dev_dbg(ope->dev, "HW Version = %u.%u.%u\n",
+ (u32)FIELD_GET(OPE_TOP_HW_VERSION_GEN, version),
+ (u32)FIELD_GET(OPE_TOP_HW_VERSION_REV, version),
+ (u32)FIELD_GET(OPE_TOP_HW_VERSION_STEP, version));
+
+ reinit_completion(&ope->reset_complete);
+
+ ope_write(ope, OPE_TOP_RESET_CMD, OPE_TOP_RESET_CMD_SW);
+
+ if (!wait_for_completion_timeout(&ope->reset_complete,
+ msecs_to_jiffies(OPE_RESET_TIMEOUT_MS))) {
+ dev_err(ope->dev, "Reset timeout\n");
+ ret = -ETIMEDOUT;
+ }
+
+ pm_runtime_put(ope->dev);
+
+ return ret;
+}
+
+static int ope_init_power(struct ope_dev *ope)
+{
+ struct dev_pm_domain_list *pmdomains;
+ struct device *dev = ope->dev;
+ int ret;
+
+ ope->icc_data = devm_of_icc_get(dev, "data");
+ if (IS_ERR(ope->icc_data))
+ return dev_err_probe(dev, PTR_ERR(ope->icc_data),
+ "failed to get interconnect data path\n");
+
+ ope->icc_config = devm_of_icc_get(dev, "config");
+ if (IS_ERR(ope->icc_config))
+ return dev_err_probe(dev, PTR_ERR(ope->icc_config),
+ "failed to get interconnect config path\n");
+
+ /* Devices with multiple PM domains must be attached separately */
+ devm_pm_domain_attach_list(dev, NULL, &pmdomains);
+
+ /* core clock is scaled as part of operating points */
+ ret = devm_pm_opp_set_clkname(dev, "core");
+ if (ret)
+ return ret;
+
+ ret = devm_pm_opp_of_add_table(dev);
+ if (ret && ret != -ENODEV)
+ return dev_err_probe(dev, ret, "invalid OPP table\n");
+
+ ret = devm_pm_runtime_enable(dev);
+ if (ret)
+ return ret;
+
+ ret = devm_pm_clk_create(dev);
+ if (ret)
+ return ret;
+
+ ret = of_pm_clk_add_clks(dev);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int ope_init_mmio(struct ope_dev *ope)
+{
+ struct platform_device *pdev = to_platform_device(ope->dev);
+
+ ope->base = devm_platform_ioremap_resource_byname(pdev, "top");
+ if (IS_ERR(ope->base))
+ return PTR_ERR(ope->base);
+
+ ope->base_rd = devm_platform_ioremap_resource_byname(pdev, "bus_read");
+ if (IS_ERR(ope->base_rd))
+ return PTR_ERR(ope->base_rd);
+
+ ope->base_wr = devm_platform_ioremap_resource_byname(pdev, "bus_write");
+ if (IS_ERR(ope->base_wr))
+ return PTR_ERR(ope->base_wr);
+
+ ope->base_pp = devm_platform_ioremap_resource_byname(pdev, "pipeline");
+ if (IS_ERR(ope->base_pp))
+ return PTR_ERR(ope->base_pp);
+
+ return 0;
+}
+
+static int ope_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct video_device *vfd;
+ struct ope_dev *ope;
+ int ret, irq;
+
+ ope = devm_kzalloc(&pdev->dev, sizeof(*ope), GFP_KERNEL);
+ if (!ope)
+ return -ENOMEM;
+
+ ope->dev = dev;
+ init_completion(&ope->reset_complete);
+
+ ret = ope_init_power(ope);
+ if (ret)
+ return dev_err_probe(dev, ret, "Power init failed\n");
+
+ ret = ope_init_mmio(ope);
+ if (ret)
+ return dev_err_probe(dev, ret, "MMIO init failed\n");
+
+ irq = platform_get_irq(pdev, 0);
+ if (irq < 0)
+ return dev_err_probe(dev, irq, "Unable to get IRQ\n");
+
+ ret = devm_request_irq(dev, irq, ope_irq, IRQF_TRIGGER_RISING, "ope", ope);
+ if (ret < 0)
+ return dev_err_probe(dev, ret, "Requesting IRQ failed\n");
+
+ ret = ope_soft_reset(ope);
+ if (ret < 0)
+ return ret;
+
+ ret = v4l2_device_register(&pdev->dev, &ope->v4l2_dev);
+ if (ret)
+ return dev_err_probe(dev, ret, "Registering V4L2 device failed\n");
+
+ mutex_init(&ope->mutex);
+ INIT_LIST_HEAD(&ope->ctx_list);
+
+ ope->vfd = ope_videodev;
+ vfd = &ope->vfd;
+ vfd->lock = &ope->mutex;
+ vfd->v4l2_dev = &ope->v4l2_dev;
+ video_set_drvdata(vfd, ope);
+ snprintf(vfd->name, sizeof(vfd->name), "%s", ope_videodev.name);
+
+ platform_set_drvdata(pdev, ope);
+
+ ope->m2m_dev = v4l2_m2m_init(&m2m_ops);
+ if (IS_ERR(ope->m2m_dev)) {
+ ret = dev_err_probe(dev, PTR_ERR(ope->m2m_dev), "Failed to init mem2mem device\n");
+ goto err_unregister_v4l2;
+ }
+
+ ret = video_register_device(vfd, VFL_TYPE_VIDEO, 0);
+ if (ret) {
+ dev_err(dev, "Failed to refgister video device\n");
+ goto err_release_m2m;
+ }
+
+ /* TODO: Add stat device and link it to media */
+ ope->mdev.dev = dev;
+ strscpy(ope->mdev.model, MEM2MEM_NAME, sizeof(ope->mdev.model));
+ media_device_init(&ope->mdev);
+ ope->v4l2_dev.mdev = &ope->mdev;
+
+ ret = v4l2_m2m_register_media_controller(ope->m2m_dev, vfd,
+ MEDIA_ENT_F_PROC_VIDEO_PIXEL_FORMATTER);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to register m2m media controller\n");
+ goto err_unregister_video;
+ }
+
+ ret = media_device_register(&ope->mdev);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to register media device\n");
+ goto err_unregister_m2m_mc;
+ }
+
+ return 0;
+
+err_unregister_m2m_mc:
+ v4l2_m2m_unregister_media_controller(ope->m2m_dev);
+err_unregister_video:
+ video_unregister_device(&ope->vfd);
+err_release_m2m:
+ v4l2_m2m_release(ope->m2m_dev);
+err_unregister_v4l2:
+ v4l2_device_unregister(&ope->v4l2_dev);
+
+ return ret;
+}
+
+static void ope_remove(struct platform_device *pdev)
+{
+ struct ope_dev *ope = platform_get_drvdata(pdev);
+
+ media_device_unregister(&ope->mdev);
+ v4l2_m2m_unregister_media_controller(ope->m2m_dev);
+ video_unregister_device(&ope->vfd);
+ v4l2_m2m_release(ope->m2m_dev);
+ v4l2_device_unregister(&ope->v4l2_dev);
+}
+
+static const struct of_device_id ope_dt_ids[] = {
+ { .compatible = "qcom,qcm2290-camss-ope"},
+ { },
+};
+MODULE_DEVICE_TABLE(of, ope_dt_ids);
+
+static const struct dev_pm_ops ope_pm_ops = {
+ SET_RUNTIME_PM_OPS(pm_clk_suspend, pm_clk_resume, NULL)
+};
+
+static struct platform_driver ope_driver = {
+ .probe = ope_probe,
+ .remove = ope_remove,
+ .driver = {
+ .name = MEM2MEM_NAME,
+ .of_match_table = ope_dt_ids,
+ .pm = &ope_pm_ops,
+ },
+};
+
+module_platform_driver(ope_driver);
+
+MODULE_DESCRIPTION("CAMSS Offline Processing Engine");
+MODULE_AUTHOR("Loic Poulain <loic.poulain@oss.qualcomm.com>");
+MODULE_LICENSE("GPL");
--
2.34.1
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
2026-03-23 12:58 ` [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver Loic Poulain
@ 2026-03-23 12:58 ` Loic Poulain
2026-03-23 13:03 ` Bryan O'Donoghue
2026-03-23 13:24 ` Konrad Dybcio
2026-03-24 12:54 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Bryan O'Donoghue
2026-04-05 19:57 ` Laurent Pinchart
4 siblings, 2 replies; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 12:58 UTC (permalink / raw)
To: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab, Loic Poulain
Add the Qualcomm CAMSS Offline Processing Engine (OPE) node for
QCM2290. The OPE is a memory-to-memory image processing block used in
offline imaging pipelines.
The node includes register regions, clocks, interconnects, IOMMU
mappings, power domains, interrupts, and an associated OPP table.
At the moment we assign a fixed rate to GCC_CAMSS_AXI_CLK since this
clock is shared across multiple CAMSS components and there is currently
no support for dynamically scaling it.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
arch/arm64/boot/dts/qcom/agatti.dtsi | 72 ++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/agatti.dtsi b/arch/arm64/boot/dts/qcom/agatti.dtsi
index f9b46cf1c646..358ebfc99552 100644
--- a/arch/arm64/boot/dts/qcom/agatti.dtsi
+++ b/arch/arm64/boot/dts/qcom/agatti.dtsi
@@ -1935,6 +1935,78 @@ port@1 {
};
};
+ isp_ope: isp@5c42400 {
+ compatible = "qcom,qcm2290-camss-ope";
+
+ reg = <0x0 0x5c42400 0x0 0x200>,
+ <0x0 0x5c46c00 0x0 0x190>,
+ <0x0 0x5c46d90 0x0 0xa00>,
+ <0x0 0x5c42800 0x0 0x4400>,
+ <0x0 0x5c42600 0x0 0x200>;
+ reg-names = "top",
+ "bus_read",
+ "bus_write",
+ "pipeline",
+ "qos";
+
+ clocks = <&gcc GCC_CAMSS_AXI_CLK>,
+ <&gcc GCC_CAMSS_OPE_CLK>,
+ <&gcc GCC_CAMSS_OPE_AHB_CLK>,
+ <&gcc GCC_CAMSS_NRT_AXI_CLK>,
+ <&gcc GCC_CAMSS_TOP_AHB_CLK>;
+ clock-names = "axi", "core", "iface", "nrt", "top";
+ assigned-clocks = <&gcc GCC_CAMSS_AXI_CLK>;
+ assigned-clock-rates = <300000000>;
+
+ interrupts = <GIC_SPI 209 IRQ_TYPE_EDGE_RISING>;
+
+ interconnects = <&bimc MASTER_APPSS_PROC RPM_ACTIVE_TAG
+ &config_noc SLAVE_CAMERA_CFG RPM_ACTIVE_TAG>,
+ <&mmnrt_virt MASTER_CAMNOC_SF RPM_ALWAYS_TAG
+ &bimc SLAVE_EBI1 RPM_ALWAYS_TAG>;
+ interconnect-names = "config",
+ "data";
+
+ iommus = <&apps_smmu 0x820 0x0>,
+ <&apps_smmu 0x840 0x0>;
+
+ operating-points-v2 = <&ope_opp_table>;
+ power-domains = <&gcc GCC_CAMSS_TOP_GDSC>,
+ <&rpmpd QCM2290_VDDCX>;
+ power-domain-names = "camss",
+ "cx";
+
+ ope_opp_table: opp-table {
+ compatible = "operating-points-v2";
+
+ opp-19200000 {
+ opp-hz = /bits/ 64 <19200000>;
+ required-opps = <&rpmpd_opp_min_svs>;
+ };
+
+ opp-200000000 {
+ opp-hz = /bits/ 64 <200000000>;
+ required-opps = <&rpmpd_opp_svs>;
+ };
+
+ opp-266600000 {
+ opp-hz = /bits/ 64 <266600000>;
+ required-opps = <&rpmpd_opp_svs_plus>;
+ };
+
+ opp-465000000 {
+ opp-hz = /bits/ 64 <465000000>;
+ required-opps = <&rpmpd_opp_nom>;
+ };
+
+ opp-580000000 {
+ opp-hz = /bits/ 64 <580000000>;
+ required-opps = <&rpmpd_opp_turbo>;
+ turbo-mode;
+ };
+ };
+ };
+
mdss: display-subsystem@5e00000 {
compatible = "qcom,qcm2290-mdss";
reg = <0x0 0x05e00000 0x0 0x1000>;
--
2.34.1
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 12:58 ` [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node Loic Poulain
@ 2026-03-23 13:03 ` Bryan O'Donoghue
2026-03-23 13:24 ` Konrad Dybcio
1 sibling, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-23 13:03 UTC (permalink / raw)
To: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 12:58, Loic Poulain wrote:
> Add the Qualcomm CAMSS Offline Processing Engine (OPE) node for
> QCM2290. The OPE is a memory-to-memory image processing block used in
> offline imaging pipelines.
>
> The node includes register regions, clocks, interconnects, IOMMU
> mappings, power domains, interrupts, and an associated OPP table.
>
> At the moment we assign a fixed rate to GCC_CAMSS_AXI_CLK since this
> clock is shared across multiple CAMSS components and there is currently
> no support for dynamically scaling it.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> ---
> arch/arm64/boot/dts/qcom/agatti.dtsi | 72 ++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/agatti.dtsi b/arch/arm64/boot/dts/qcom/agatti.dtsi
> index f9b46cf1c646..358ebfc99552 100644
> --- a/arch/arm64/boot/dts/qcom/agatti.dtsi
> +++ b/arch/arm64/boot/dts/qcom/agatti.dtsi
> @@ -1935,6 +1935,78 @@ port@1 {
> };
> };
>
> + isp_ope: isp@5c42400 {
Should be a sub-node of CAMSS.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
@ 2026-03-23 13:03 ` Krzysztof Kozlowski
2026-03-23 16:03 ` Loic Poulain
2026-03-23 13:03 ` Bryan O'Donoghue
1 sibling, 1 reply; 47+ messages in thread
From: Krzysztof Kozlowski @ 2026-03-23 13:03 UTC (permalink / raw)
To: Loic Poulain, bod, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 13:58, Loic Poulain wrote:
> Add Devicetree binding documentation for the Qualcomm Camera Subsystem
> Offline Processing Engine (OPE) found on platforms such as Agatti.
> The OPE is a memory-to-memory image processing block which operates
> on frames read from and written back to system memory.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
I don't see explanation in cover letter why this is RFC, so I assume
this is not ready, thus not a full review but just few nits to spare you
resubmits later when this becomes reviewable.
> ---
> .../bindings/media/qcom,camss-ope.yaml | 86 +++++++++++++++++++
> 1 file changed, 86 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>
> diff --git a/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> new file mode 100644
> index 000000000000..509b4e89a88a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
Filename must match compatible.
> @@ -0,0 +1,86 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
...
> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - clocks
> + - clock-names
> + - interrupts
> + - interconnects
> + - interconnect-names
> + - iommus
> + - power-domains
> + - power-domain-names
> +
> +additionalProperties: true
There are no bindings like that. You cannot have here true.
Also, lack of example is a no-go.
BTW, also remember about proper versioning of your patchset. b4 would do
that for you, but since you did not use it, you must handle it.
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
2026-03-23 13:03 ` Krzysztof Kozlowski
@ 2026-03-23 13:03 ` Bryan O'Donoghue
1 sibling, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-23 13:03 UTC (permalink / raw)
To: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 12:58, Loic Poulain wrote:
> Add Devicetree binding documentation for the Qualcomm Camera Subsystem
> Offline Processing Engine (OPE) found on platforms such as Agatti.
> The OPE is a memory-to-memory image processing block which operates
> on frames read from and written back to system memory.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> ---
> .../bindings/media/qcom,camss-ope.yaml | 86 +++++++++++++++++++
> 1 file changed, 86 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>
> diff --git a/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> new file mode 100644
> index 000000000000..509b4e89a88a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> @@ -0,0 +1,86 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/media/qcom,camss-ope.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm Camera Subsystem Offline Processing Engine
> +
> +maintainers:
> + - Loic Poulain <loic.poulain@oss.qualcomm.com>
> +
> +description:
> + The Qualcomm Camera Subsystem (CAMSS) Offline Processing Engine (OPE)
> + is a memory-to-memory image processing block used. It supports a
> + range of pixel-processing operations such as scaling, cropping, gain
> + adjustments, white-balancing, and various format conversions. The OPE
> + does not interface directly with image sensors, instead, it processes
> + frames sourced from and written back to system memory.
> +
> +properties:
> + compatible:
> + const: qcom,qcm2290-camss-ope
> +
> + reg:
> + maxItems: 5
> +
> + reg-names:
> + items:
> + - const: top
> + - const: bus_read
> + - const: bus_write
> + - const: pipeline
> + - const: qos
> +
> + clocks:
> + maxItems: 5
> +
> + clock-names:
> + items:
> + - const: axi
> + - const: core
> + - const: iface
> + - const: nrt
> + - const: top
> +
> + interrupts:
> + maxItems: 1
> +
> + interconnects:
> + maxItems: 2
> +
> + interconnect-names:
> + items:
> + - const: config
> + - const: data
> +
> + iommus:
> + maxItems: 2
These should be described.
> +
> + operating-points-v2: true
> +
> + opp-table:
> + type: object
> +
> + power-domains:
> + maxItems: 2
> +
> + power-domain-names:
> + items:
> + - const: camss
> + - const: cx
> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - clocks
> + - clock-names
> + - interrupts
> + - interconnects
> + - interconnect-names
> + - iommus
> + - power-domains
> + - power-domain-names
> +
> +additionalProperties: true
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 12:58 ` [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node Loic Poulain
2026-03-23 13:03 ` Bryan O'Donoghue
@ 2026-03-23 13:24 ` Konrad Dybcio
2026-03-23 13:33 ` Bryan O'Donoghue
2026-03-23 16:31 ` Loic Poulain
1 sibling, 2 replies; 47+ messages in thread
From: Konrad Dybcio @ 2026-03-23 13:24 UTC (permalink / raw)
To: Loic Poulain, bod, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 3/23/26 1:58 PM, Loic Poulain wrote:
> Add the Qualcomm CAMSS Offline Processing Engine (OPE) node for
> QCM2290. The OPE is a memory-to-memory image processing block used in
> offline imaging pipelines.
>
> The node includes register regions, clocks, interconnects, IOMMU
> mappings, power domains, interrupts, and an associated OPP table.
>
> At the moment we assign a fixed rate to GCC_CAMSS_AXI_CLK since this
> clock is shared across multiple CAMSS components and there is currently
> no support for dynamically scaling it.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> ---
> arch/arm64/boot/dts/qcom/agatti.dtsi | 72 ++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/agatti.dtsi b/arch/arm64/boot/dts/qcom/agatti.dtsi
> index f9b46cf1c646..358ebfc99552 100644
> --- a/arch/arm64/boot/dts/qcom/agatti.dtsi
> +++ b/arch/arm64/boot/dts/qcom/agatti.dtsi
> @@ -1935,6 +1935,78 @@ port@1 {
> };
> };
>
> + isp_ope: isp@5c42400 {
"camss_ope"? Label's don't need to be generic, but they need to be
meaningful - currently one could assume that there's a non-ISP OPE
as well (and I'm intentionally stretching it a bit to prove a point)
> + compatible = "qcom,qcm2290-camss-ope";
> +
> + reg = <0x0 0x5c42400 0x0 0x200>,
> + <0x0 0x5c46c00 0x0 0x190>,
> + <0x0 0x5c46d90 0x0 0xa00>,
> + <0x0 0x5c42800 0x0 0x4400>,
> + <0x0 0x5c42600 0x0 0x200>;
> + reg-names = "top",
> + "bus_read",
> + "bus_write",
> + "pipeline",
> + "qos";
This is a completely arbitrary choice, but I think it's easier to compare
against the docs if the reg entries are sorted by the 'reg' (which isn't
always easy to do since that can very between SoCs but this module is not
very common)
> +
> + clocks = <&gcc GCC_CAMSS_AXI_CLK>,
> + <&gcc GCC_CAMSS_OPE_CLK>,
> + <&gcc GCC_CAMSS_OPE_AHB_CLK>,
> + <&gcc GCC_CAMSS_NRT_AXI_CLK>,
> + <&gcc GCC_CAMSS_TOP_AHB_CLK>;
> + clock-names = "axi", "core", "iface", "nrt", "top";
Similarly, in the arbitrary choice of indices, I think putting "core"
first is "neat"
> + assigned-clocks = <&gcc GCC_CAMSS_AXI_CLK>;
> + assigned-clock-rates = <300000000>;
I really think we shouldn't be doing this here for a clock that covers
so much hw
[...]
> +
> + interrupts = <GIC_SPI 209 IRQ_TYPE_EDGE_RISING>;
> +
> + interconnects = <&bimc MASTER_APPSS_PROC RPM_ACTIVE_TAG
> + &config_noc SLAVE_CAMERA_CFG RPM_ACTIVE_TAG>,
> + <&mmnrt_virt MASTER_CAMNOC_SF RPM_ALWAYS_TAG
> + &bimc SLAVE_EBI1 RPM_ALWAYS_TAG>;
> + interconnect-names = "config",
> + "data";
> +
> + iommus = <&apps_smmu 0x820 0x0>,
> + <&apps_smmu 0x840 0x0>;
> +
> + operating-points-v2 = <&ope_opp_table>;
> + power-domains = <&gcc GCC_CAMSS_TOP_GDSC>,
Moving this under camss should let you remove the TOP_GDSC and TOP_AHB (and
perhaps some other) references
> + <&rpmpd QCM2290_VDDCX>;
> + power-domain-names = "camss",
> + "cx";> +
> + ope_opp_table: opp-table {
> + compatible = "operating-points-v2";
> +
> + opp-19200000 {
> + opp-hz = /bits/ 64 <19200000>;
> + required-opps = <&rpmpd_opp_min_svs>;
> + };
> +
> + opp-200000000 {
> + opp-hz = /bits/ 64 <200000000>;
> + required-opps = <&rpmpd_opp_svs>;
> + };
> +
> + opp-266600000 {
> + opp-hz = /bits/ 64 <266600000>;
> + required-opps = <&rpmpd_opp_svs_plus>;
> + };
> +
> + opp-465000000 {
> + opp-hz = /bits/ 64 <465000000>;
> + required-opps = <&rpmpd_opp_nom>;
> + };
> +
> + opp-580000000 {
> + opp-hz = /bits/ 64 <580000000>;
> + required-opps = <&rpmpd_opp_turbo>;
> + turbo-mode;
Are we going to act on this property? Otherwise I think it's just a naming
collision with Qualcomm's TURBO (which may? have previously??? had some
special implications)
Konrad
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 13:24 ` Konrad Dybcio
@ 2026-03-23 13:33 ` Bryan O'Donoghue
2026-03-23 16:15 ` Krzysztof Kozlowski
2026-03-23 16:31 ` Loic Poulain
1 sibling, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-23 13:33 UTC (permalink / raw)
To: Konrad Dybcio, Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 13:24, Konrad Dybcio wrote:
> + isp_ope: isp@5c42400 {
ope@5c42400 isp@ is already used.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-23 12:58 ` [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver Loic Poulain
@ 2026-03-23 13:43 ` Bryan O'Donoghue
2026-03-23 15:31 ` Loic Poulain
0 siblings, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-23 13:43 UTC (permalink / raw)
To: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 12:58, Loic Poulain wrote:
> Provide a initial implementation for the Qualcomm Offline Processing
> Engine (OPE). OPE is a memory-to-memory hardware block designed for
> image processing on a source frame. Typically, the input frame
> originates from the SoC CSI capture path, though not limited to.
>
> The hardware architecture consists of Fetch Engines and Write Engines,
> connected through intermediate pipeline modules:
> [FETCH ENGINES] => [Pipeline Modules] => [WRITE ENGINES]
>
> Current Configuration:
> Fetch Engine: One fetch engine is used for Bayer frame input.
> Write Engines: Two display write engines for Y and UV planes output.
>
> Enabled Pipeline Modules:
> CLC_WB: White balance (channel gain configuration)
> CLC_DEMO: Demosaic (Bayer to RGB conversion)
> CLC_CHROMA_ENHAN: RGB to YUV conversion
> CLC_DOWNSCALE*: Downscaling for UV and Y planes
>
> Default configuration values are based on public standards such as BT.601.
>
> Processing Model:
> OPE processes frames in stripes of up to 336 pixels. Therefore, frames must
> be split into stripes for processing. Each stripe is configured after the
> previous one has been acquired (double buffered registers). To minimize
> inter-stripe latency, stripe configurations are generated ahead of time.
A yavata command set showing usage would be appreciated.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> ---
> drivers/media/platform/qcom/camss/Makefile | 4 +
> drivers/media/platform/qcom/camss/camss-ope.c | 2058 +++++++++++++++++
> 2 files changed, 2062 insertions(+)
> create mode 100644 drivers/media/platform/qcom/camss/camss-ope.c
>
> diff --git a/drivers/media/platform/qcom/camss/Makefile b/drivers/media/platform/qcom/camss/Makefile
> index 5e349b491513..67f261ae0855 100644
> --- a/drivers/media/platform/qcom/camss/Makefile
> +++ b/drivers/media/platform/qcom/camss/Makefile
> @@ -29,3 +29,7 @@ qcom-camss-objs += \
> camss-format.o \
>
> obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss.o
> +
> +qcom-camss-ope-objs += camss-ope.o
> +
> +obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss-ope.o
Needs a Kconfig entry.
> diff --git a/drivers/media/platform/qcom/camss/camss-ope.c b/drivers/media/platform/qcom/camss/camss-ope.c
> new file mode 100644
> index 000000000000..f45a16437b6d
> --- /dev/null
> +++ b/drivers/media/platform/qcom/camss/camss-ope.c
> @@ -0,0 +1,2058 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * camss-ope.c
> + *
> + * Qualcomm MSM Camera Subsystem - Offline Processing Engine
> + *
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +/*
> + * This driver provides a minimal implementation for the Qualcomm Offline
> + * Processing Engine (OPE). OPE is a memory-to-memory hardware block
> + * designed for image processing on a source frame. Typically, the input
> + * frame originates from the SoC CSI capture path, though not limited to.
> + *
> + * The hardware architecture consists of Fetch Engines and Write Engines,
> + * connected through intermediate pipeline modules:
> + * [FETCH ENGINES] => [Pipeline Modules] => [WRITE ENGINES]
> + *
> + * Current Configuration:
> + * Fetch Engine: One fetch engine is used for Bayer frame input.
> + * Write Engines: Two display write engines for Y and UV planes output.
> + *
> + * Only a subset of the pipeline modules are enabled:
> + * CLC_WB: White balance for channel gain configuration
> + * CLC_DEMO: Demosaic for Bayer to RGB conversion
> + * CLC_CHROMA_ENHAN: for RGB to YUV conversion
> + * CLC_DOWNSCALE*: Downscaling for UV (YUV444 -> YUV422/YUV420) and YUV planes
> + *
> + * Default configuration values are based on public standards such as BT.601.
> + *
> + * Processing Model:
> + * OPE processes frames in stripes of up to 336 pixels. Therefore, frames must
> + * be split into stripes for processing. Each stripe is configured after the
> + * previous one has been acquired (double buffered registers). To minimize
> + * inter-stripe latency, the stripe configurations are generated ahead of time.
> + *
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/clk.h>
> +#include <linux/completion.h>
> +#include <linux/delay.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/interconnect.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_clock.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/regmap.h>
> +#include <linux/slab.h>
> +
> +#include <media/media-device.h>
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-event.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#define MEM2MEM_NAME "qcom-camss-ope"
> +
> +/* TOP registers */
> +#define OPE_TOP_HW_VERSION 0x000
> +#define OPE_TOP_HW_VERSION_STEP GENMASK(15, 0)
> +#define OPE_TOP_HW_VERSION_REV GENMASK(27, 16)
> +#define OPE_TOP_HW_VERSION_GEN GENMASK(31, 28)
> +#define OPE_TOP_RESET_CMD 0x004
> +#define OPE_TOP_RESET_CMD_HW BIT(0)
> +#define OPE_TOP_RESET_CMD_SW BIT(1)
> +#define OPE_TOP_CORE_CFG 0x010
> +#define OPE_TOP_IRQ_STATUS 0x014
> +#define OPE_TOP_IRQ_MASK 0x018
> +#define OPE_TOP_IRQ_STATUS_RST_DONE BIT(0)
> +#define OPE_TOP_IRQ_STATUS_WE BIT(1)
> +#define OPE_TOP_IRQ_STATUS_FE BIT(2)
> +#define OPE_TOP_IRQ_STATUS_VIOL BIT(3)
> +#define OPE_TOP_IRQ_STATUS_IDLE BIT(4)
> +#define OPE_TOP_IRQ_CLEAR 0x01c
> +#define OPE_TOP_IRQ_SET 0x020
> +#define OPE_TOP_IRQ_CMD 0x024
> +#define OPE_TOP_IRQ_CMD_CLEAR BIT(0)
> +#define OPE_TOP_IRQ_CMD_SET BIT(4)
> +#define OPE_TOP_VIOLATION_STATUS 0x028
> +#define OPE_TOP_DEBUG(i) (0x0a0 + (i) * 4)
> +#define OPE_TOP_DEBUG_CFG 0x0dc
> +
> +/* Fetch engines */
> +#define OPE_BUS_RD_INPUT_IF_IRQ_MASK 0x00c
> +#define OPE_BUS_RD_INPUT_IF_IRQ_CLEAR 0x010
> +#define OPE_BUS_RD_INPUT_IF_IRQ_CMD 0x014
> +#define OPE_BUS_RD_INPUT_IF_IRQ_CMD_CLEAR BIT(0)
> +#define OPE_BUS_RD_INPUT_IF_IRQ_CMD_SET BIT(4)
> +#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS 0x018
> +#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_RST_DONE BIT(0)
> +#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_RUP_DONE BIT(1)
> +#define OPE_BUS_RD_INPUT_IF_IRQ_STATUS_BUF_DONE BIT(2)
> +#define OPE_BUS_RD_INPUT_IF_CMD 0x01c
> +#define OPE_BUS_RD_INPUT_IF_CMD_GO_CMD BIT(0)
> +#define OPE_BUS_RD_CLIENT_0_CORE_CFG 0x050
> +#define OPE_BUS_RD_CLIENT_0_CORE_CFG_EN BIT(0)
> +#define OPE_BUS_RD_CLIENT_0_CCIF_META_DATA 0x054
> +#define OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN GENMASK(7, 2)
> +#define OPE_BUS_RD_CLIENT_0_ADDR_IMAGE 0x058
> +#define OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE 0x05c
> +#define OPE_BUS_RD_CLIENT_0_RD_STRIDE 0x060
> +#define OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0 0x064
> +
> +/* Write engines */
> +#define OPE_BUS_WR_INPUT_IF_IRQ_MASK_0 0x018
> +#define OPE_BUS_WR_INPUT_IF_IRQ_MASK_1 0x01c
> +#define OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_0 0x020
> +#define OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_1 0x024
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0 0x028
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE BIT(0)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_BUF_DONE BIT(8)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL BIT(28)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL BIT(30)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL BIT(31)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_1 0x02c
> +#define OPE_BUS_WR_INPUT_IF_IRQ_STATUS_1_CLIENT_DONE(c) BIT(0 + (c))
> +#define OPE_BUS_WR_INPUT_IF_IRQ_CMD 0x030
> +#define OPE_BUS_WR_INPUT_IF_IRQ_CMD_CLEAR BIT(0)
> +#define OPE_BUS_WR_INPUT_IF_IRQ_CMD_SET BIT(1)
> +#define OPE_BUS_WR_VIOLATION_STATUS 0x064
> +#define OPE_BUS_WR_IMAGE_SIZE_VIOLATION_STATUS 0x070
> +#define OPE_BUS_WR_CLIENT_CFG(c) (0x200 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_CFG_EN BIT(0)
> +#define OPE_BUS_WR_CLIENT_CFG_AUTORECOVER BIT(4)
> +#define OPE_BUS_WR_CLIENT_ADDR_IMAGE(c) (0x204 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_IMAGE_CFG_0(c) (0x20c + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_IMAGE_CFG_1(c) (0x210 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_IMAGE_CFG_2(c) (0x214 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_PACKER_CFG(c) (0x218 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_ADDR_FRAME_HEADER(c) (0x220 + (c) * 0x100)
> +#define OPE_BUS_WR_CLIENT_MAX 8
> +
> +/* Pipeline modules */
> +#define OPE_PP_CLC_WB_GAIN_MODULE_CFG (0x200 + 0x60)
> +#define OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN BIT(0)
> +#define OPE_PP_CLC_WB_GAIN_WB_CFG(ch) (0x200 + 0x68 + 4 * (ch))
> +
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_C_PRE_BASE 0x1c00
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_Y_DISP_BASE 0x3000
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_C_DISP_BASE 0x3200
> +#define OPE_PP_CLC_DOWNSCALE_MN_CFG(ds) ((ds) + 0x60)
> +#define OPE_PP_CLC_DOWNSCALE_MN_CFG_EN BIT(0)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(ds) ((ds) + 0x64)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN BIT(9)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN BIT(10)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(ds) ((ds) + 0x68)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(ds) ((ds) + 0x6c)
> +#define OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(ds) ((ds) + 0x74)
> +
> +#define OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG (0x1200 + 0x60)
> +#define OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG_EN BIT(0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0 (0x1200 + 0x68)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V0 GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V1 GENMASK(27, 16)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1 (0x1200 + 0x6c)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1_K GENMASK(31, 23)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2 (0x1200 + 0x70)
> +#define OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2_V2 GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG (0x1200 + 0x74)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AP GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AM GENMASK(27, 16)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG (0x1200 + 0x78)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BP GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BM GENMASK(27, 16)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG (0x1200 + 0x7C)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CP GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CM GENMASK(27, 16)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG (0x1200 + 0x80)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DP GENMASK(11, 0)
> +#define OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DM GENMASK(27, 16)
> +#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0 (0x1200 + 0x84)
> +#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0_KCB GENMASK(31, 21)
> +#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1 (0x1200 + 0x88)
> +#define OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1_KCR GENMASK(31, 21)
> +
> +#define OPE_STRIPE_MAX_W 336
> +#define OPE_STRIPE_MAX_H 8192
> +#define OPE_STRIPE_MIN_W 16
> +#define OPE_STRIPE_MIN_H OPE_STRIPE_MIN_W
> +#define OPE_MAX_STRIPE 16
> +#define OPE_ALIGN_H 1
> +#define OPE_ALIGN_W 1
> +#define OPE_MIN_W 24
> +#define OPE_MIN_H 16
> +#define OPE_MAX_W (OPE_STRIPE_MAX_W * OPE_MAX_STRIPE)
> +#define OPE_MAX_H OPE_STRIPE_MAX_H
> +
> +#define MEM2MEM_CAPTURE BIT(0)
> +#define MEM2MEM_OUTPUT BIT(1)
> +
> +#define OPE_RESET_TIMEOUT_MS 100
> +
> +/* Expected framerate for power scaling */
> +#define DEFAULT_FRAMERATE 60
> +
> +/* Downscaler helpers */
> +#define Q21(v) (((uint64_t)(v)) << 21)
> +#define DS_Q21(n, d) ((uint32_t)(((uint64_t)(n) << 21) / (d)))
u64 and u32 here.
> +#define DS_RESOLUTION(input, output) \
> + (((output) * 128 <= (input)) ? 0x0 : \
> + ((output) * 16 <= (input)) ? 0x1 : \
> + ((output) * 8 <= (input)) ? 0x2 : 0x3)
> +#define DS_OUTPUT_PIX(input, phase_init, phase_step) \
> + ((Q21(input) - (phase_init)) / (phase_step))
> +
> +enum ope_downscaler {
> + OPE_DS_C_PRE,
> + OPE_DS_C_DISP,
> + OPE_DS_Y_DISP,
> + OPE_DS_MAX,
> +};
> +
> +static const u32 ope_ds_base[OPE_DS_MAX] = { OPE_PP_CLC_DOWNSCALE_MN_DS_C_PRE_BASE,
> + OPE_PP_CLC_DOWNSCALE_MN_DS_C_DISP_BASE,
> + OPE_PP_CLC_DOWNSCALE_MN_DS_Y_DISP_BASE };
> +
> +enum ope_wr_client {
> + OPE_WR_CLIENT_VID_Y,
> + OPE_WR_CLIENT_VID_C,
> + OPE_WR_CLIENT_DISP_Y,
> + OPE_WR_CLIENT_DISP_C,
> + OPE_WR_CLIENT_ARGB,
> + OPE_WR_CLIENT_MAX
> +};
> +
> +enum ope_pixel_pattern {
> + OPE_PIXEL_PATTERN_RGRGRG,
> + OPE_PIXEL_PATTERN_GRGRGR,
> + OPE_PIXEL_PATTERN_BGBGBG,
> + OPE_PIXEL_PATTERN_GBGBGB,
> + OPE_PIXEL_PATTERN_YCBYCR,
> + OPE_PIXEL_PATTERN_YCRYCB,
> + OPE_PIXEL_PATTERN_CBYCRY,
> + OPE_PIXEL_PATTERN_CRYCBY
> +};
> +
> +enum ope_stripe_location {
> + OPE_STRIPE_LOCATION_FULL,
> + OPE_STRIPE_LOCATION_LEFT,
> + OPE_STRIPE_LOCATION_RIGHT,
> + OPE_STRIPE_LOCATION_MIDDLE
> +};
> +
> +enum ope_unpacker_format {
> + OPE_UNPACKER_FMT_PLAIN_128,
> + OPE_UNPACKER_FMT_PLAIN_8,
> + OPE_UNPACKER_FMT_PLAIN_16_10BPP,
> + OPE_UNPACKER_FMT_PLAIN_16_12BPP,
> + OPE_UNPACKER_FMT_PLAIN_16_14BPP,
> + OPE_UNPACKER_FMT_PLAIN_32_20BPP,
> + OPE_UNPACKER_FMT_ARGB_16_10BPP,
> + OPE_UNPACKER_FMT_ARGB_16_12BPP,
> + OPE_UNPACKER_FMT_ARGB_16_14BPP,
> + OPE_UNPACKER_FMT_PLAIN_32,
> + OPE_UNPACKER_FMT_PLAIN_64,
> + OPE_UNPACKER_FMT_TP_10,
> + OPE_UNPACKER_FMT_MIPI_8,
> + OPE_UNPACKER_FMT_MIPI_10,
> + OPE_UNPACKER_FMT_MIPI_12,
> + OPE_UNPACKER_FMT_MIPI_14,
> + OPE_UNPACKER_FMT_PLAIN_16_16BPP,
> + OPE_UNPACKER_FMT_PLAIN_128_ODD_EVEN,
> + OPE_UNPACKER_FMT_PLAIN_8_ODD_EVEN
> +};
> +
> +enum ope_packer_format {
> + OPE_PACKER_FMT_PLAIN_128,
> + OPE_PACKER_FMT_PLAIN_8,
> + OPE_PACKER_FMT_PLAIN_8_ODD_EVEN,
> + OPE_PACKER_FMT_PLAIN_8_10BPP,
> + OPE_PACKER_FMT_PLAIN_8_10BPP_ODD_EVEN,
> + OPE_PACKER_FMT_PLAIN_16_10BPP,
> + OPE_PACKER_FMT_PLAIN_16_12BPP,
> + OPE_PACKER_FMT_PLAIN_16_14BPP,
> + OPE_PACKER_FMT_PLAIN_16_16BPP,
> + OPE_PACKER_FMT_PLAIN_32,
> + OPE_PACKER_FMT_PLAIN_64,
> + OPE_PACKER_FMT_TP_10,
> + OPE_PACKER_FMT_MIPI_10,
> + OPE_PACKER_FMT_MIPI_12
> +};
> +
> +struct ope_fmt {
> + u32 fourcc;
> + unsigned int types;
> + enum ope_pixel_pattern pattern;
> + enum ope_unpacker_format unpacker_format;
> + enum ope_packer_format packer_format;
> + unsigned int depth;
> + unsigned int align; /* pix alignment = 2^align */
> +};
> +
> +static const struct ope_fmt formats[] = { /* TODO: add multi-planes formats */
> + /* Output - Bayer MIPI 10 */
> + { V4L2_PIX_FMT_SBGGR10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_BGBGBG,
> + OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
> + { V4L2_PIX_FMT_SGBRG10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GBGBGB,
> + OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
> + { V4L2_PIX_FMT_SGRBG10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GRGRGR,
> + OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
> + { V4L2_PIX_FMT_SRGGB10P, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_RGRGRG,
> + OPE_UNPACKER_FMT_MIPI_10, OPE_PACKER_FMT_MIPI_10, 10, 2 },
> + /* Output - Bayer MIPI/Plain 8 */
> + { V4L2_PIX_FMT_SRGGB8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_RGRGRG,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
> + { V4L2_PIX_FMT_SBGGR8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_BGBGBG,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
> + { V4L2_PIX_FMT_SGBRG8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GBGBGB,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
> + { V4L2_PIX_FMT_SGRBG8, MEM2MEM_OUTPUT, OPE_PIXEL_PATTERN_GRGRGR,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
> + /* Capture - YUV 8-bit per component */
> + { V4L2_PIX_FMT_NV24, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_YCBYCR,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 24, 0 },
> + { V4L2_PIX_FMT_NV42, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_YCRYCB,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 24, 0 },
> + { V4L2_PIX_FMT_NV16, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 16, 1 },
> + { V4L2_PIX_FMT_NV61, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 16, 1 },
> + { V4L2_PIX_FMT_NV12, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 12, 1 },
> + { V4L2_PIX_FMT_NV21, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_CBYCRY,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8_ODD_EVEN, 12, 1 },
> + /* Capture - Greyscale 8-bit */
> + { V4L2_PIX_FMT_GREY, MEM2MEM_CAPTURE, OPE_PIXEL_PATTERN_RGRGRG,
> + OPE_UNPACKER_FMT_PLAIN_8, OPE_PACKER_FMT_PLAIN_8, 8, 0 },
> +};
> +
> +#define OPE_NUM_FORMATS ARRAY_SIZE(formats)
> +
> +#define OPE_WB(n, d) (((n) << 10) / (d))
> +
> +/* Per-queue, driver-specific private data */
> +struct ope_q_data {
> + unsigned int width;
> + unsigned int height;
> + unsigned int bytesperline;
> + unsigned int sizeimage;
> + const struct ope_fmt *fmt;
> + enum v4l2_ycbcr_encoding ycbcr_enc;
> + enum v4l2_quantization quant;
> + unsigned int sequence;
> +};
> +
> +struct ope_dev {
> + struct device *dev;
> + struct v4l2_device v4l2_dev;
> + struct video_device vfd;
> + struct media_device mdev;
> + struct v4l2_m2m_dev *m2m_dev;
> +
> + void __iomem *base;
> + void __iomem *base_rd;
> + void __iomem *base_wr;
> + void __iomem *base_pp;
> +
> + struct completion reset_complete;
> +
> + struct icc_path *icc_data;
> + struct icc_path *icc_config;
> +
> + struct mutex mutex;
> + struct list_head ctx_list;
> + void *context;
> +};
> +
> +struct ope_dsc_config {
> + u32 input_width;
> + u32 input_height;
> + u32 output_width;
> + u32 output_height;
> + u32 phase_step_h;
> + u32 phase_step_v;
> +};
> +
> +struct ope_stripe {
> + struct {
> + dma_addr_t addr;
> + u32 width;
> + u32 height;
> + u32 stride;
> + enum ope_stripe_location location;
> + enum ope_pixel_pattern pattern;
> + enum ope_unpacker_format format;
> + } src;
> + struct {
> + dma_addr_t addr;
> + u32 width;
> + u32 height;
> + u32 stride;
> + u32 x_init;
> + enum ope_packer_format format;
> + bool enabled;
> + } dst[OPE_WR_CLIENT_MAX];
> + struct ope_dsc_config dsc[OPE_DS_MAX];
> +};
> +
> +struct ope_ctx {
> + struct v4l2_fh fh;
> + struct ope_dev *ope;
> +
> + /* Processing mode */
> + int mode;
> + u8 alpha_component;
> + u8 rotation;
> + unsigned int framerate;
> +
> + enum v4l2_colorspace colorspace;
> + enum v4l2_xfer_func xfer_func;
> +
> + /* Source and destination queue data */
> + struct ope_q_data q_data_src;
> + struct ope_q_data q_data_dst;
> +
> + u8 current_stripe;
> + struct ope_stripe stripe[OPE_MAX_STRIPE];
> +
> + bool started;
> +
> + struct list_head list;
> +};
> +
> +struct ope_freq_tbl {
> + unsigned int load;
> + unsigned long freq;
> +};
> +
> +static inline char *print_fourcc(u32 fmt)
> +{
> + static char code[5];
> +
> + code[0] = (unsigned char)(fmt & 0xff);
> + code[1] = (unsigned char)((fmt >> 8) & 0xff);
> + code[2] = (unsigned char)((fmt >> 16) & 0xff);
> + code[3] = (unsigned char)((fmt >> 24) & 0xff);
> + code[4] = '\0';
> +
> + return code;
> +}
This is a bug
> +
> +static inline enum ope_stripe_location ope_stripe_location(unsigned int index,
> + unsigned int count)
> +{
> + if (count == 1)
> + return OPE_STRIPE_LOCATION_FULL;
> + if (index == 0)
> + return OPE_STRIPE_LOCATION_LEFT;
> + if (index == (count - 1))
> + return OPE_STRIPE_LOCATION_RIGHT;
> +
> + return OPE_STRIPE_LOCATION_MIDDLE;
> +}
> +
> +static inline bool ope_stripe_is_last(struct ope_stripe *stripe)
> +{
> + if (!stripe)
> + return false;
> +
> + if (stripe->src.location == OPE_STRIPE_LOCATION_RIGHT ||
> + stripe->src.location == OPE_STRIPE_LOCATION_FULL)
> + return true;
> +
> + return false;
> +}
> +
> +static inline struct ope_stripe *ope_get_stripe(struct ope_ctx *ctx, unsigned int index)
> +{
> + return &ctx->stripe[index];
> +}
> +
> +static inline struct ope_stripe *ope_current_stripe(struct ope_ctx *ctx)
> +{
> + if (!ctx)
> + return NULL;
> +
> + if (ctx->current_stripe >= OPE_MAX_STRIPE)
> + return NULL;
> +
> + return ope_get_stripe(ctx, ctx->current_stripe);
> +}
> +
> +static inline unsigned int ope_stripe_index(struct ope_ctx *ctx, struct ope_stripe *stripe)
> +{
> + return stripe - &ctx->stripe[0];
> +}
> +
> +static inline struct ope_stripe *ope_prev_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> +{
> + unsigned int index = ope_stripe_index(ctx, stripe);
> +
> + return index ? ope_get_stripe(ctx, index - 1) : NULL;
> +}
> +
> +static inline struct ope_q_data *get_q_data(struct ope_ctx *ctx, enum v4l2_buf_type type)
> +{
> + if (V4L2_TYPE_IS_OUTPUT(type))
> + return &ctx->q_data_src;
> + else
> + return &ctx->q_data_dst;
> +}
> +
> +static inline unsigned long __q_data_pixclk(struct ope_q_data *q, unsigned int fps)
> +{
> + return (unsigned long)q->width * q->height * fps;
> +}
> +
> +static inline unsigned int __q_data_load_avg(struct ope_q_data *q, unsigned int fps)
> +{
> + /* Data load in kBps, calculated from pixel clock and bits per pixel */
> + return mult_frac(__q_data_pixclk(q, fps), q->fmt->depth, 1000) / 8;
> +}
> +
> +static inline unsigned int __q_data_load_peak(struct ope_q_data *q, unsigned int fps)
> +{
> + return __q_data_load_avg(q, fps) * 2;
> +}
> +
> +static inline unsigned int __q_data_load_config(struct ope_q_data *q, unsigned int fps)
> +{
> + unsigned int stripe_count = q->width / OPE_STRIPE_MAX_W + 1;
> + unsigned int stripe_load = 50 * 4 * fps; /* about 50 x 32-bit registers to configure */
> +
> + /* Return config load in kBps */
> + return mult_frac(stripe_count, stripe_load, 1000);
> +}
> +
> +static inline struct ope_ctx *file2ctx(struct file *file)
> +{
> + return container_of(file->private_data, struct ope_ctx, fh);
> +}
> +
> +static inline u32 ope_read(struct ope_dev *ope, u32 reg)
> +{
> + return readl(ope->base + reg);
> +}
> +
> +static inline void ope_write(struct ope_dev *ope, u32 reg, u32 value)
> +{
> + writel(value, ope->base + reg);
> +}
> +
> +static inline u32 ope_read_wr(struct ope_dev *ope, u32 reg)
> +{
> + return readl_relaxed(ope->base_wr + reg);
> +}
> +
> +static inline void ope_write_wr(struct ope_dev *ope, u32 reg, u32 value)
> +{
> + writel_relaxed(value, ope->base_wr + reg);
> +}
> +
> +static inline u32 ope_read_rd(struct ope_dev *ope, u32 reg)
> +{
> + return readl_relaxed(ope->base_rd + reg);
> +}
> +
> +static inline void ope_write_rd(struct ope_dev *ope, u32 reg, u32 value)
> +{
> + writel_relaxed(value, ope->base_rd + reg);
> +}
> +
> +static inline u32 ope_read_pp(struct ope_dev *ope, u32 reg)
> +{
> + return readl_relaxed(ope->base_pp + reg);
> +}
> +
> +static inline void ope_write_pp(struct ope_dev *ope, u32 reg, u32 value)
> +{
> + writel_relaxed(value, ope->base_pp + reg);
> +}
> +
> +static inline void ope_start(struct ope_dev *ope)
> +{
> + wmb(); /* Ensure the next write occurs only after all prior normal memory accesses */
> + ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_CMD, OPE_BUS_RD_INPUT_IF_CMD_GO_CMD);
> +}
> +
> +static bool ope_pix_fmt_is_yuv(u32 fourcc)
> +{
> + switch (fourcc) {
> + case V4L2_PIX_FMT_NV16:
> + case V4L2_PIX_FMT_NV12:
> + case V4L2_PIX_FMT_NV24:
> + case V4L2_PIX_FMT_NV61:
> + case V4L2_PIX_FMT_NV21:
> + case V4L2_PIX_FMT_NV42:
> + case V4L2_PIX_FMT_GREY:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static const struct ope_fmt *find_format(unsigned int pixelformat)
> +{
> + const struct ope_fmt *fmt;
> + unsigned int i;
> +
> + for (i = 0; i < OPE_NUM_FORMATS; i++) {
> + fmt = &formats[i];
> + if (fmt->fourcc == pixelformat)
> + break;
> + }
> +
> + if (i == OPE_NUM_FORMATS)
> + return NULL;
> +
> + return &formats[i];
> +}
> +
> +static inline void ope_dbg_print_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> +{
> + struct ope_dev *ope = ctx->ope;
> + int i;
> +
> + dev_dbg(ope->dev, "S%u/FE0: addr=%pad;W=%ub;H=%u;stride=%u;loc=%u;pattern=%u;fmt=%u\n",
> + ope_stripe_index(ctx, stripe), &stripe->src.addr, stripe->src.width,
> + stripe->src.height, stripe->src.stride, stripe->src.location, stripe->src.pattern,
> + stripe->src.format);
> +
> + for (i = 0; i < OPE_DS_MAX; i++) {
> + struct ope_dsc_config *c = &stripe->dsc[i];
> +
> + dev_dbg(ope->dev, "S%u/DSC%d: %ux%u => %ux%u\n",
> + ope_stripe_index(ctx, stripe), i, c->input_width, c->input_height,
> + c->output_width, c->output_height);
> + }
> +
> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> + if (!stripe->dst[i].enabled)
> + continue;
> +
> + dev_dbg(ope->dev,
> + "S%u/WE%d: addr=%pad;X=%u;W=%upx;H=%u;stride=%u;fmt=%u\n",
> + ope_stripe_index(ctx, stripe), i, &stripe->dst[i].addr,
> + stripe->dst[i].x_init, stripe->dst[i].width, stripe->dst[i].height,
> + stripe->dst[i].stride, stripe->dst[i].format);
> + }
> +}
> +
> +static void ope_gen_stripe_argb_dst(struct ope_ctx *ctx, struct ope_stripe *stripe, dma_addr_t dst)
> +{
> + unsigned int index = ope_stripe_index(ctx, stripe);
> + unsigned int img_width = ctx->q_data_dst.width;
> + unsigned int width, height;
> + dma_addr_t addr;
> +
> + /* This is GBRA64 format (le16)G (le16)B (le16)R (le16)A */
> +
> + stripe->dst[OPE_WR_CLIENT_ARGB].enabled = true;
> +
> + width = stripe->src.width;
> + height = stripe->src.height;
> +
> + if (!index) {
> + addr = dst;
> + } else {
> + struct ope_stripe *prev = ope_get_stripe(ctx, index - 1);
> +
> + addr = prev->dst[OPE_WR_CLIENT_ARGB].addr + prev->dst[OPE_WR_CLIENT_ARGB].width * 8;
> + }
> +
> + stripe->dst[OPE_WR_CLIENT_ARGB].addr = addr;
> + stripe->dst[OPE_WR_CLIENT_ARGB].x_init = 0;
> + stripe->dst[OPE_WR_CLIENT_ARGB].width = width;
> + stripe->dst[OPE_WR_CLIENT_ARGB].height = height;
> + stripe->dst[OPE_WR_CLIENT_ARGB].stride = img_width * 8;
> + stripe->dst[OPE_WR_CLIENT_ARGB].format = OPE_PACKER_FMT_PLAIN_64;
> +}
> +
> +static void ope_gen_stripe_yuv_dst(struct ope_ctx *ctx, struct ope_stripe *stripe, dma_addr_t dst)
> +{
> + struct ope_stripe *prev = ope_prev_stripe(ctx, stripe);
> + unsigned int img_width = ctx->q_data_dst.width;
> + unsigned int img_height = ctx->q_data_dst.height;
> + unsigned int width, height;
> + u32 x_init = 0;
> +
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].enabled = true;
> + stripe->dst[OPE_WR_CLIENT_DISP_C].enabled = true;
> +
> + /* Y */
> + width = stripe->dsc[OPE_DS_Y_DISP].output_width;
> + height = stripe->dsc[OPE_DS_Y_DISP].output_height;
> +
> + if (prev)
> + x_init = prev->dst[OPE_WR_CLIENT_DISP_Y].x_init +
> + prev->dst[OPE_WR_CLIENT_DISP_Y].width;
> +
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].addr = dst;
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].x_init = x_init;
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].width = width;
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].height = height;
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].stride = img_width;
> + stripe->dst[OPE_WR_CLIENT_DISP_Y].format = OPE_PACKER_FMT_PLAIN_8;
> +
> + /* UV */
> + width = stripe->dsc[OPE_DS_C_DISP].output_width;
> + height = stripe->dsc[OPE_DS_C_DISP].output_height;
> +
> + if (prev)
> + x_init = prev->dst[OPE_WR_CLIENT_DISP_C].x_init +
> + prev->dst[OPE_WR_CLIENT_DISP_C].width;
> +
> + stripe->dst[OPE_WR_CLIENT_DISP_C].addr = dst + img_width * img_height;
> + stripe->dst[OPE_WR_CLIENT_DISP_C].x_init = x_init;
> + stripe->dst[OPE_WR_CLIENT_DISP_C].format = ctx->q_data_dst.fmt->packer_format;
> + stripe->dst[OPE_WR_CLIENT_DISP_C].width = width * 2;
> + stripe->dst[OPE_WR_CLIENT_DISP_C].height = height;
> +
> + switch (ctx->q_data_dst.fmt->fourcc) {
> + case V4L2_PIX_FMT_NV42:
> + case V4L2_PIX_FMT_NV24: /* YUV 4:4:4 */
> + stripe->dst[OPE_WR_CLIENT_DISP_C].stride = img_width * 2;
> + break;
> + case V4L2_PIX_FMT_GREY: /* No UV */
> + stripe->dst[OPE_WR_CLIENT_DISP_C].enabled = false;
> + break;
> + default:
> + stripe->dst[OPE_WR_CLIENT_DISP_C].stride = img_width;
> + }
> +}
> +
> +static void ope_gen_stripe_dsc(struct ope_ctx *ctx, struct ope_stripe *stripe,
> + unsigned int h_scale, unsigned int v_scale)
> +{
> + struct ope_dsc_config *dsc_c, *dsc_y;
> +
> + dsc_c = &stripe->dsc[OPE_DS_C_DISP];
> + dsc_y = &stripe->dsc[OPE_DS_Y_DISP];
> +
> + dsc_c->phase_step_h = dsc_y->phase_step_h = h_scale;
> + dsc_c->phase_step_v = dsc_y->phase_step_v = v_scale;
> +
> + dsc_c->input_width = stripe->dsc[OPE_DS_C_PRE].output_width;
> + dsc_c->input_height = stripe->dsc[OPE_DS_C_PRE].output_height;
> +
> + dsc_y->input_width = stripe->src.width;
> + dsc_y->input_height = stripe->src.height;
> +
> + dsc_c->output_width = DS_OUTPUT_PIX(dsc_c->input_width, 0, h_scale);
> + dsc_c->output_height = DS_OUTPUT_PIX(dsc_c->input_height, 0, v_scale);
> +
> + dsc_y->output_width = DS_OUTPUT_PIX(dsc_y->input_width, 0, h_scale);
> + dsc_y->output_height = DS_OUTPUT_PIX(dsc_y->input_height, 0, v_scale);
> +
> + /* Adjust initial phase ? */
> +}
> +
> +static void ope_gen_stripe_chroma_dsc(struct ope_ctx *ctx, struct ope_stripe *stripe)
> +{
> + struct ope_dsc_config *dsc;
> +
> + dsc = &stripe->dsc[OPE_DS_C_PRE];
> +
> + dsc->input_width = stripe->src.width;
> + dsc->input_height = stripe->src.height;
> +
> + switch (ctx->q_data_dst.fmt->fourcc) {
> + case V4L2_PIX_FMT_NV61:
> + case V4L2_PIX_FMT_NV16:
> + dsc->output_width = dsc->input_width / 2;
> + dsc->output_height = dsc->input_height;
> + break;
> + case V4L2_PIX_FMT_NV12:
> + case V4L2_PIX_FMT_NV21:
> + dsc->output_width = dsc->input_width / 2;
> + dsc->output_height = dsc->input_height / 2;
> + break;
> + default:
> + dsc->output_width = dsc->input_width;
> + dsc->output_height = dsc->input_height;
> + }
> +
> + dsc->phase_step_h = DS_Q21(dsc->input_width, dsc->output_width);
> + dsc->phase_step_v = DS_Q21(dsc->input_height, dsc->output_height);
> +}
> +
> +static void ope_gen_stripes(struct ope_ctx *ctx, dma_addr_t src, dma_addr_t dst)
> +{
> + const struct ope_fmt *src_fmt = ctx->q_data_src.fmt;
> + const struct ope_fmt *dst_fmt = ctx->q_data_dst.fmt;
> + unsigned int num_stripes, width, i;
> + unsigned int h_scale, v_scale;
> +
> + width = ctx->q_data_src.width;
> + num_stripes = DIV_ROUND_UP(ctx->q_data_src.width, OPE_STRIPE_MAX_W);
> + h_scale = DS_Q21(ctx->q_data_src.width, ctx->q_data_dst.width);
> + v_scale = DS_Q21(ctx->q_data_src.height, ctx->q_data_dst.height);
> +
> + for (i = 0; i < num_stripes; i++) {
> + struct ope_stripe *stripe = &ctx->stripe[i];
> +
> + /* Clear config */
> + memset(stripe, 0, sizeof(*stripe));
> +
> + /* Fetch Engine */
> + stripe->src.addr = src;
> + stripe->src.width = width;
> + stripe->src.height = ctx->q_data_src.height;
> + stripe->src.stride = ctx->q_data_src.bytesperline;
> + stripe->src.location = ope_stripe_location(i, num_stripes);
> + stripe->src.pattern = src_fmt->pattern;
> + stripe->src.format = src_fmt->unpacker_format;
> +
> + /* Ensure the last stripe will be large enough */
> + if (width > OPE_STRIPE_MAX_W && width < (OPE_STRIPE_MAX_W + OPE_STRIPE_MIN_W))
> + stripe->src.width -= OPE_STRIPE_MIN_W * 2;
> +
> + v4l_bound_align_image(&stripe->src.width, src_fmt->align,
> + OPE_STRIPE_MAX_W, src_fmt->align,
> + &stripe->src.height, OPE_STRIPE_MIN_H, OPE_STRIPE_MAX_H,
> + OPE_ALIGN_H, 0);
> +
> + width -= stripe->src.width;
> + src += stripe->src.width * src_fmt->depth / 8;
> +
> + if (ope_pix_fmt_is_yuv(dst_fmt->fourcc)) {
> + /* YUV Chroma downscaling */
> + ope_gen_stripe_chroma_dsc(ctx, stripe);
> +
> + /* YUV downscaling */
> + ope_gen_stripe_dsc(ctx, stripe, h_scale, v_scale);
> +
> + /* Write Engines */
> + ope_gen_stripe_yuv_dst(ctx, stripe, dst);
> + } else {
> + ope_gen_stripe_argb_dst(ctx, stripe, dst);
> + }
> +
> + /* Source width is in byte unit, not pixel */
> + stripe->src.width = stripe->src.width * src_fmt->depth / 8;
> +
> + ope_dbg_print_stripe(ctx, stripe);
> + }
> +}
> +
> +static void ope_prog_rgb2yuv(struct ope_dev *ope)
> +{
> + /* Default RGB to YUV - no special effect - CF BT.601 */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V0, 0x04d) | /* R to Y 0.299 12sQ8 */
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_0_V1, 0x096)); /* G to Y 0.587 12sQ8 */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_2_V2, 0x01d)); /* B to Y 0.114 12sQ8 */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_LUMA_CFG_1_K, 0)); /* Y offset 0 9sQ0 */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AP, 0x0e6) |
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_A_CFG_AM, 0x0e6)); /* 0.886 12sQ8 (Cb) */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BP, 0xfb3) |
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_B_CFG_BM, 0xfb3)); /* -0.338 12sQ8 (Cb) */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CP, 0xb3) |
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_C_CFG_CM, 0xb3)); /* 0.701 12sQ8 (Cr) */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DP, 0xfe3) |
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_COEFF_D_CFG_DM, 0xfe3)); /* -0.114 12sQ8 (Cr) */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_1_KCR, 128)); /* KCR 128 11s */
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0,
> + FIELD_PREP(OPE_PP_CLC_CHROMA_ENHAN_CHROMA_CFG_0_KCB, 128)); /* KCB 128 11s */
> +
> + ope_write_pp(ope, OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG,
> + OPE_PP_CLC_CHROMA_ENHAN_MODULE_CFG_EN);
> +}
> +
> +static void ope_prog_bayer2rgb(struct ope_dev *ope)
> +{
> + /* Fixed Settings */
> + ope_write_pp(ope, 0x860, 0x4001);
> + ope_write_pp(ope, 0x868, 128);
> + ope_write_pp(ope, 0x86c, 128 << 20);
> + ope_write_pp(ope, 0x870, 102);
What are the magic numbers about ? Please define bit-fields and offsets.
Parameters passed in from user-space/libcamera and then translated to
registers etc.
> +}
> +
> +static void ope_prog_wb(struct ope_dev *ope)
> +{
> + /* Default white balance config */
> + u32 g_gain = OPE_WB(1, 1);
> + u32 b_gain = OPE_WB(3, 2);
> + u32 r_gain = OPE_WB(3, 2);
> +
> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(0), g_gain);
> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(1), b_gain);
> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(2), r_gain);
> +
> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_MODULE_CFG, OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN);
> +}
Fixed gains will have to come from real data.
> +
> +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> +{
> + struct ope_dev *ope = ctx->ope;
> + int i;
> +
> + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
> +
> + /* Fetch Engine */
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
> + (stripe->src.width << 16) + stripe->src.height);
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
> + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
> +
> + /* Write Engines */
> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> + if (!stripe->dst[i].enabled) {
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
> + continue;
> + }
> +
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
> + (stripe->dst[i].height << 16) + stripe->dst[i].width);
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
> + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
> + }
> +
> + /* Downscalers */
> + for (i = 0; i < OPE_DS_MAX; i++) {
> + struct ope_dsc_config *dsc = &stripe->dsc[i];
> + u32 base = ope_ds_base[i];
> + u32 cfg = 0;
> +
> + if (dsc->input_width != dsc->output_width) {
> + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
> + dsc->output_width) << 30;
> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
> + }
> +
> + if (dsc->input_height != dsc->output_height) {
> + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
> + dsc->output_height) << 30;
> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
> + }
> +
> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
> + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
> + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
> + }
> +}
So - this is where the CDM should be used - so that you don't have to do
all of these MMIO writes inside of your ISR.
Is that and additional step after the RFC ?
> +
> +/*
> + * mem2mem callbacks
> + */
> +static void ope_device_run(void *priv)
> +{
> + struct vb2_v4l2_buffer *src_buf, *dst_buf;
> + struct ope_ctx *ctx = priv;
> + struct ope_dev *ope = ctx->ope;
> + dma_addr_t src, dst;
> +
> + dev_dbg(ope->dev, "Start context %p", ctx);
> +
> + src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> + if (!src_buf)
> + return;
> +
> + dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> + if (!dst_buf)
> + return;
> +
> + src = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
> + dst = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> + /* Generate stripes from full frame */
> + ope_gen_stripes(ctx, src, dst);
> +
> + if (priv != ope->context) {
> + /* If context changed, reprogram the submodules */
> + ope_prog_wb(ope);
> + ope_prog_bayer2rgb(ope);
> + ope_prog_rgb2yuv(ope);
> + ope->context = priv;
> + }
> +
> + /* Program the first stripe */
> + ope_prog_stripe(ctx, &ctx->stripe[0]);
> +
> + /* Go! */
> + ope_start(ope);
> +}
> +
> +static void ope_job_done(struct ope_ctx *ctx, enum vb2_buffer_state vbstate)
> +{
> + struct vb2_v4l2_buffer *src, *dst;
> +
> + if (!ctx)
> + return;
> +
> + src = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + dst = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> + if (dst && src)
> + dst->vb2_buf.timestamp = src->vb2_buf.timestamp;
> +
> + if (src)
> + v4l2_m2m_buf_done(src, vbstate);
> + if (dst)
> + v4l2_m2m_buf_done(dst, vbstate);
> +
> + v4l2_m2m_job_finish(ctx->ope->m2m_dev, ctx->fh.m2m_ctx);
> +}
> +
> +static void ope_buf_done(struct ope_ctx *ctx)
> +{
> + struct ope_stripe *stripe = ope_current_stripe(ctx);
> +
> + if (!ctx)
> + return;
> +
> + dev_dbg(ctx->ope->dev, "Context %p Stripe %u done\n",
> + ctx, ope_stripe_index(ctx, stripe));
> +
> + if (ope_stripe_is_last(stripe)) {
> + ctx->current_stripe = 0;
> + ope_job_done(ctx, VB2_BUF_STATE_DONE);
> + } else {
> + ctx->current_stripe++;
> + ope_start(ctx->ope);
> + }
> +}
> +
> +static void ope_job_abort(void *priv)
> +{
> + struct ope_ctx *ctx = priv;
> +
> + /* reset to abort */
> + ope_write(ctx->ope, OPE_TOP_RESET_CMD, OPE_TOP_RESET_CMD_SW);
> +}
Shoudln't this wait for ope_job_done() ?
> +
> +static void ope_rup_done(struct ope_ctx *ctx)
> +{
> + struct ope_stripe *stripe = ope_current_stripe(ctx);
> +
> + /* We can program next stripe (double buffered registers) */
> + if (!ope_stripe_is_last(stripe))
> + ope_prog_stripe(ctx, ++stripe);
> +}
> +
> +/*
> + * interrupt handler
> + */
> +static void ope_fe_irq(struct ope_dev *ope)
> +{
> + u32 status = ope_read_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_STATUS);
> +
> + ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_CLEAR, status);
> + ope_write_rd(ope, OPE_BUS_RD_INPUT_IF_IRQ_CMD, OPE_BUS_RD_INPUT_IF_IRQ_CMD_CLEAR);
> +
> + /* Nothing to do */
> +}
> +
> +static void ope_we_irq(struct ope_ctx *ctx)
> +{
> + struct ope_dev *ope;
> + u32 status0;
> +
> + if (!ctx) {
> + pr_err("Instance released before the end of transaction\n");
> + return;
> + }
> +
> + ope = ctx->ope;
> +
> + status0 = ope_read_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0);
> + ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_CLEAR_0, status0);
> + ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_CMD, OPE_BUS_WR_INPUT_IF_IRQ_CMD_CLEAR);
> +
> + if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL) {
> + dev_err_ratelimited(ope->dev, "Write Engine configuration violates constrains\n");
> + ope_job_abort(ctx);
> + }
> +
> + if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL) {
> + u32 status = ope_read_wr(ope, OPE_BUS_WR_IMAGE_SIZE_VIOLATION_STATUS);
> + int i;
> +
> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> + if (BIT(i) & status)
> + dev_err_ratelimited(ope->dev,
> + "Write Engine (WE%d) image size violation\n", i);
> + }
> +
> + ope_job_abort(ctx);
> + }
> +
> + if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL) {
> + dev_err_ratelimited(ope->dev, "Write Engine fatal violation\n");
> + ope_job_abort(ctx);
> + }
> +
> + if (status0 & OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE)
> + ope_rup_done(ctx);
> +}
> +
> +static irqreturn_t ope_irq(int irq, void *dev_id)
> +{
> + struct ope_dev *ope = dev_id;
> + struct ope_ctx *ctx = ope->m2m_dev ? v4l2_m2m_get_curr_priv(ope->m2m_dev) : NULL;
You have a mutex for this pointer but it doesn't seem to be in-use here
Should this be a threadded IRQ with reference to that mutex then ?
> + u32 status;
> +
> + status = ope_read(ope, OPE_TOP_IRQ_STATUS);
> + ope_write(ope, OPE_TOP_IRQ_CLEAR, status);
> + ope_write(ope, OPE_TOP_IRQ_CMD, OPE_TOP_IRQ_CMD_CLEAR);
> +
> + if (status & OPE_TOP_IRQ_STATUS_RST_DONE) {
> + ope_job_done(ctx, VB2_BUF_STATE_ERROR);
> + complete(&ope->reset_complete);
> + }
> +
> + if (status & OPE_TOP_IRQ_STATUS_VIOL) {
> + u32 violation = ope_read(ope, OPE_TOP_VIOLATION_STATUS);
> +
> + dev_warn(ope->dev, "OPE Violation: %u", violation);
> + }
> +
> + if (status & OPE_TOP_IRQ_STATUS_FE)
> + ope_fe_irq(ope);
> +
> + if (status & OPE_TOP_IRQ_STATUS_WE)
> + ope_we_irq(ctx);
> +
> + if (status & OPE_TOP_IRQ_STATUS_IDLE)
> + ope_buf_done(ctx);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static void ope_irq_init(struct ope_dev *ope)
> +{
> + ope_write(ope, OPE_TOP_IRQ_MASK,
> + OPE_TOP_IRQ_STATUS_RST_DONE | OPE_TOP_IRQ_STATUS_WE |
> + OPE_TOP_IRQ_STATUS_VIOL | OPE_TOP_IRQ_STATUS_IDLE);
> +
> + ope_write_wr(ope, OPE_BUS_WR_INPUT_IF_IRQ_MASK_0,
> + OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_RUP_DONE |
> + OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_CONS_VIOL |
> + OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_VIOL |
> + OPE_BUS_WR_INPUT_IF_IRQ_STATUS_0_IMG_SZ_VIOL);
> +}
> +
> +/*
> + * video ioctls
> + */
> +static int ope_querycap(struct file *file, void *priv, struct v4l2_capability *cap)
> +{
> + strscpy(cap->driver, MEM2MEM_NAME, sizeof(cap->driver));
> + strscpy(cap->card, "Qualcomm Offline Processing Engine", sizeof(cap->card));
> + return 0;
> +}
> +
> +static int ope_enum_fmt(struct v4l2_fmtdesc *f, u32 type)
> +{
> + const struct ope_fmt *fmt;
> + int i, num = 0;
> +
> + for (i = 0; i < OPE_NUM_FORMATS; ++i) {
> + if (formats[i].types & type) {
> + if (num == f->index)
> + break;
> + ++num;
> + }
> + }
> +
> + if (i < OPE_NUM_FORMATS) {
> + fmt = &formats[i];
> + f->pixelformat = fmt->fourcc;
> + return 0;
> + }
> +
> + return -EINVAL;
> +}
> +
> +static int ope_enum_fmt_vid_cap(struct file *file, void *priv,
> + struct v4l2_fmtdesc *f)
> +{
> + f->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> +
> + return ope_enum_fmt(f, MEM2MEM_CAPTURE);
> +}
> +
> +static int ope_enum_fmt_vid_out(struct file *file, void *priv,
> + struct v4l2_fmtdesc *f)
> +{
> + f->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> +
> + return ope_enum_fmt(f, MEM2MEM_OUTPUT);
> +}
> +
> +static int ope_g_fmt(struct ope_ctx *ctx, struct v4l2_format *f)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct ope_q_data *q_data;
> + struct vb2_queue *vq;
> +
> + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> + if (!vq)
> + return -EINVAL;
> +
> + q_data = get_q_data(ctx, f->type);
> +
> + pix_mp->width = q_data->width;
> + pix_mp->height = q_data->height;
> + pix_mp->pixelformat = q_data->fmt->fourcc;
> + pix_mp->num_planes = 1;
> + pix_mp->field = V4L2_FIELD_NONE;
> + pix_mp->colorspace = ctx->colorspace;
> + pix_mp->xfer_func = ctx->xfer_func;
> + pix_mp->ycbcr_enc = q_data->ycbcr_enc;
> + pix_mp->quantization = q_data->quant;
> + pix_mp->plane_fmt[0].bytesperline = q_data->bytesperline;
> + pix_mp->plane_fmt[0].sizeimage = q_data->sizeimage;
> +
> + return 0;
> +}
> +
> +static int ope_g_fmt_vid_out(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + return ope_g_fmt(file2ctx(file), f);
> +}
> +
> +static int ope_g_fmt_vid_cap(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + return ope_g_fmt(file2ctx(file), f);
> +}
> +
> +static int ope_try_fmt(struct v4l2_format *f, const struct ope_fmt *fmt)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + unsigned int stride = pix_mp->plane_fmt[0].bytesperline;
> + unsigned int size;
> +
> + pix_mp->num_planes = 1;
> + pix_mp->field = V4L2_FIELD_NONE;
> +
> + v4l_bound_align_image(&pix_mp->width, OPE_MIN_W, OPE_MAX_W, fmt->align,
> + &pix_mp->height, OPE_MIN_H, OPE_MAX_H, OPE_ALIGN_H, 0);
> +
> + if (ope_pix_fmt_is_yuv(pix_mp->pixelformat)) {
> + stride = MAX(pix_mp->width, stride);
> + size = fmt->depth * pix_mp->width / 8 * pix_mp->height;
> + } else {
> + stride = MAX(pix_mp->width * fmt->depth / 8, stride);
> + size = stride * pix_mp->height;
> + }
> +
> + pix_mp->plane_fmt[0].bytesperline = stride;
> + pix_mp->plane_fmt[0].sizeimage = size;
> +
> + return 0;
> +}
> +
> +static int ope_try_fmt_vid_cap(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct ope_ctx *ctx = file2ctx(file);
> + const struct ope_fmt *fmt;
> + int ret;
> +
> + dev_dbg(ctx->ope->dev, "Try capture format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
> + pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
> + pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
> + pix_mp->plane_fmt[0].sizeimage);
> +
> + fmt = find_format(pix_mp->pixelformat);
> + if (!fmt) {
> + pix_mp->pixelformat = ctx->q_data_dst.fmt->fourcc;
> + fmt = ctx->q_data_dst.fmt;
> + }
> +
> + if (!(fmt->types & MEM2MEM_CAPTURE) && (fmt != ctx->q_data_src.fmt))
> + return -EINVAL;
> +
> + if (pix_mp->width > ctx->q_data_src.width ||
> + pix_mp->height > ctx->q_data_src.height) {
> + pix_mp->width = ctx->q_data_src.width;
> + pix_mp->height = ctx->q_data_src.height;
> + }
> +
> + pix_mp->colorspace = ope_pix_fmt_is_yuv(pix_mp->pixelformat) ?
> + ctx->colorspace : V4L2_COLORSPACE_RAW;
> + pix_mp->xfer_func = ctx->xfer_func;
> + pix_mp->ycbcr_enc = V4L2_MAP_YCBCR_ENC_DEFAULT(pix_mp->colorspace);
> + pix_mp->quantization = V4L2_MAP_QUANTIZATION_DEFAULT(true,
> + pix_mp->colorspace, pix_mp->ycbcr_enc);
> +
> + ret = ope_try_fmt(f, fmt);
> +
> + dev_dbg(ctx->ope->dev, "Fixed capture format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
> + pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
> + pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
> + pix_mp->plane_fmt[0].sizeimage);
> +
> + return ret;
> +}
> +
> +static int ope_try_fmt_vid_out(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + const struct ope_fmt *fmt;
> + struct ope_ctx *ctx = file2ctx(file);
> + int ret;
> +
> + dev_dbg(ctx->ope->dev, "Try output format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
> + pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
> + pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
> + pix_mp->plane_fmt[0].sizeimage);
> +
> + fmt = find_format(pix_mp->pixelformat);
> + if (!fmt) {
> + pix_mp->pixelformat = ctx->q_data_src.fmt->fourcc;
> + fmt = ctx->q_data_src.fmt;
> + }
> + if (!(fmt->types & MEM2MEM_OUTPUT))
> + return -EINVAL;
> +
> + if (!pix_mp->colorspace)
> + pix_mp->colorspace = V4L2_COLORSPACE_SRGB;
> +
> + pix_mp->ycbcr_enc = V4L2_MAP_YCBCR_ENC_DEFAULT(pix_mp->colorspace);
> + pix_mp->quantization = V4L2_MAP_QUANTIZATION_DEFAULT(true,
> + pix_mp->colorspace, pix_mp->ycbcr_enc);
> +
> + ret = ope_try_fmt(f, fmt);
> +
> + dev_dbg(ctx->ope->dev, "Fixed output format: %ux%u-%s (planes:%u bpl:%u size:%u)\n",
> + pix_mp->width, pix_mp->height, print_fourcc(pix_mp->pixelformat),
> + pix_mp->num_planes, pix_mp->plane_fmt[0].bytesperline,
> + pix_mp->plane_fmt[0].sizeimage);
> +
> + return ret;
> +}
> +
> +static int ope_s_fmt(struct ope_ctx *ctx, struct v4l2_format *f)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct ope_q_data *q_data;
> + struct vb2_queue *vq;
> +
> + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> + if (!vq)
> + return -EINVAL;
> +
> + q_data = get_q_data(ctx, f->type);
> + if (!q_data)
> + return -EINVAL;
> +
> + if (vb2_is_busy(vq)) {
> + v4l2_err(&ctx->ope->v4l2_dev, "%s queue busy\n", __func__);
> + return -EBUSY;
> + }
> +
> + q_data->fmt = find_format(pix_mp->pixelformat);
> + if (!q_data->fmt)
> + return -EINVAL;
> + q_data->width = pix_mp->width;
> + q_data->height = pix_mp->height;
> + q_data->bytesperline = pix_mp->plane_fmt[0].bytesperline;
> + q_data->sizeimage = pix_mp->plane_fmt[0].sizeimage;
> +
> + dev_dbg(ctx->ope->dev, "Set %s format: %ux%u %s (%u bytes)\n",
> + V4L2_TYPE_IS_OUTPUT(f->type) ? "output" : "capture",
> + q_data->width, q_data->height, print_fourcc(q_data->fmt->fourcc),
> + q_data->sizeimage);
> +
> + return 0;
> +}
> +
> +static int ope_s_fmt_vid_cap(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct ope_ctx *ctx = file2ctx(file);
> + int ret;
> +
> + ret = ope_try_fmt_vid_cap(file, priv, f);
> + if (ret)
> + return ret;
> +
> + ret = ope_s_fmt(file2ctx(file), f);
> + if (ret)
> + return ret;
> +
> + ctx->q_data_dst.ycbcr_enc = f->fmt.pix_mp.ycbcr_enc;
> + ctx->q_data_dst.quant = f->fmt.pix_mp.quantization;
> +
> + return 0;
> +}
> +
> +static int ope_s_fmt_vid_out(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct ope_ctx *ctx = file2ctx(file);
> + int ret;
> +
> + ret = ope_try_fmt_vid_out(file, priv, f);
> + if (ret)
> + return ret;
> +
> + ret = ope_s_fmt(file2ctx(file), f);
> + if (ret)
> + return ret;
> +
> + ctx->colorspace = f->fmt.pix_mp.colorspace;
> + ctx->xfer_func = f->fmt.pix_mp.xfer_func;
> + ctx->q_data_src.ycbcr_enc = f->fmt.pix_mp.ycbcr_enc;
> + ctx->q_data_src.quant = f->fmt.pix_mp.quantization;
> +
> + return 0;
> +}
> +
> +static int ope_enum_framesizes(struct file *file, void *fh,
> + struct v4l2_frmsizeenum *fsize)
> +{
> + if (fsize->index > 0)
> + return -EINVAL;
> +
> + if (!find_format(fsize->pixel_format))
> + return -EINVAL;
> +
> + fsize->type = V4L2_FRMSIZE_TYPE_STEPWISE;
> + fsize->stepwise.min_width = OPE_MIN_W;
> + fsize->stepwise.max_width = OPE_MAX_W;
> + fsize->stepwise.step_width = 1 << OPE_ALIGN_W;
> + fsize->stepwise.min_height = OPE_MIN_H;
> + fsize->stepwise.max_height = OPE_MAX_H;
> + fsize->stepwise.step_height = 1 << OPE_ALIGN_H;
> +
> + return 0;
> +}
> +
> +static int ope_enum_frameintervals(struct file *file, void *fh,
> + struct v4l2_frmivalenum *fival)
> +{
> + fival->type = V4L2_FRMIVAL_TYPE_STEPWISE;
> + fival->stepwise.min.numerator = 1;
> + fival->stepwise.min.denominator = 120;
> + fival->stepwise.max.numerator = 1;
> + fival->stepwise.max.denominator = 1;
> + fival->stepwise.step.numerator = 1;
> + fival->stepwise.step.denominator = 1;
fival->index should return -EINVAL for index > 0
should also valid width and height and pixel format
> + return 0;
> +}
> +
> +static int ope_s_ctrl(struct v4l2_ctrl *ctrl)
> +{
> + return -EINVAL;
> +}
> +
> +static const struct v4l2_ctrl_ops ope_ctrl_ops = {
> + .s_ctrl = ope_s_ctrl,
> +};
Eh - I think you can drop this ..
> +
> +static const struct v4l2_ioctl_ops ope_ioctl_ops = {
> + .vidioc_querycap = ope_querycap,
> +
> + .vidioc_enum_fmt_vid_cap = ope_enum_fmt_vid_cap,
> + .vidioc_g_fmt_vid_cap_mplane = ope_g_fmt_vid_cap,
> + .vidioc_try_fmt_vid_cap_mplane = ope_try_fmt_vid_cap,
> + .vidioc_s_fmt_vid_cap_mplane = ope_s_fmt_vid_cap,
> +
> + .vidioc_enum_fmt_vid_out = ope_enum_fmt_vid_out,
> + .vidioc_g_fmt_vid_out_mplane = ope_g_fmt_vid_out,
> + .vidioc_try_fmt_vid_out_mplane = ope_try_fmt_vid_out,
> + .vidioc_s_fmt_vid_out_mplane = ope_s_fmt_vid_out,
> +
> + .vidioc_enum_framesizes = ope_enum_framesizes,
> + .vidioc_enum_frameintervals = ope_enum_frameintervals,
> +
> + .vidioc_reqbufs = v4l2_m2m_ioctl_reqbufs,
> + .vidioc_querybuf = v4l2_m2m_ioctl_querybuf,
> + .vidioc_qbuf = v4l2_m2m_ioctl_qbuf,
> + .vidioc_dqbuf = v4l2_m2m_ioctl_dqbuf,
> + .vidioc_prepare_buf = v4l2_m2m_ioctl_prepare_buf,
> + .vidioc_create_bufs = v4l2_m2m_ioctl_create_bufs,
> + .vidioc_expbuf = v4l2_m2m_ioctl_expbuf,
> +
> + .vidioc_streamon = v4l2_m2m_ioctl_streamon,
> + .vidioc_streamoff = v4l2_m2m_ioctl_streamoff,
> +
> + .vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
> + .vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int ope_queue_setup(struct vb2_queue *vq,
> + unsigned int *nbuffers, unsigned int *nplanes,
> + unsigned int sizes[], struct device *alloc_devs[])
> +{
> + struct ope_ctx *ctx = vb2_get_drv_priv(vq);
> + struct ope_q_data *q_data = get_q_data(ctx, vq->type);
> + unsigned int size = q_data->sizeimage;
> +
> + if (*nplanes) {
> + if (*nplanes != 1)
> + return -EINVAL;
> + } else {
> + *nplanes = 1;
> + }
> +
> + if (sizes[0]) {
> + if (sizes[0] < size)
> + return -EINVAL;
> + } else {
> + sizes[0] = size;
> + }
> +
> + dev_dbg(ctx->ope->dev, "get %d buffer(s) of size %d each.\n", *nbuffers, size);
> +
> + return 0;
> +}
> +
> +static int ope_buf_prepare(struct vb2_buffer *vb)
> +{
> + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> + struct ope_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> + struct ope_dev *ope = ctx->ope;
> + struct ope_q_data *q_data;
> +
> + q_data = get_q_data(ctx, vb->vb2_queue->type);
> + if (V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)) {
> + if (vbuf->field == V4L2_FIELD_ANY)
> + vbuf->field = V4L2_FIELD_NONE;
> + if (vbuf->field != V4L2_FIELD_NONE) {
> + v4l2_err(&ope->v4l2_dev, "Field isn't supported\n");
> + return -EINVAL;
> + }
> + }
> +
> + if (vb2_plane_size(vb, 0) < q_data->sizeimage) {
> + v4l2_err(&ope->v4l2_dev, "Data will not fit into plane (%lu < %lu)\n",
> + vb2_plane_size(vb, 0), (long)q_data->sizeimage);
> + return -EINVAL;
> + }
> +
> + if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type))
> + vb2_set_plane_payload(vb, 0, q_data->sizeimage);
> +
> + vbuf->sequence = q_data->sequence++;
> +
> + return 0;
> +}
> +
> +static void ope_adjust_power(struct ope_dev *ope)
> +{
> + int ret;
> + unsigned long pixclk = 0;
> + unsigned int loadavg = 0;
> + unsigned int loadpeak = 0;
> + unsigned int loadconfig = 0;
> + struct ope_ctx *ctx;
> +
> + lockdep_assert_held(&ope->mutex);
> +
> + list_for_each_entry(ctx, &ope->ctx_list, list) {
> + if (!ctx->started)
> + continue;
> +
> + if (!ctx->framerate)
> + ctx->framerate = DEFAULT_FRAMERATE;
> +
> + pixclk += __q_data_pixclk(&ctx->q_data_src, ctx->framerate);
> + loadavg += __q_data_load_avg(&ctx->q_data_src, ctx->framerate);
> + loadavg += __q_data_load_avg(&ctx->q_data_dst, ctx->framerate);
> + loadpeak += __q_data_load_peak(&ctx->q_data_src, ctx->framerate);
> + loadpeak += __q_data_load_peak(&ctx->q_data_dst, ctx->framerate);
> + loadconfig += __q_data_load_config(&ctx->q_data_src, ctx->framerate);
> + }
> +
> + /* 30% margin for overhead */
> + pixclk = mult_frac(pixclk, 13, 10);
> +
> + dev_dbg(ope->dev, "Adjusting clock:%luHz avg:%uKBps peak:%uKBps config:%uKBps\n",
> + pixclk, loadavg, loadpeak, loadconfig);
> +
> + ret = dev_pm_opp_set_rate(ope->dev, pixclk);
> + if (ret)
> + dev_warn(ope->dev, "Failed to adjust OPP rate: %d\n", ret);
> +
> + ret = icc_set_bw(ope->icc_data, loadavg, loadpeak);
> + if (ret)
> + dev_warn(ope->dev, "Failed to set data path bandwidth: %d\n", ret);
> +
> + ret = icc_set_bw(ope->icc_config, loadconfig, loadconfig * 5);
> + if (ret)
> + dev_warn(ope->dev, "Failed to set config path bandwidth: %d\n", ret);
> +}
> +
> +static void ope_buf_queue(struct vb2_buffer *vb)
> +{
> + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> + struct ope_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
> +}
> +
> +static int ope_start_streaming(struct vb2_queue *q, unsigned int count)
> +{
> + struct ope_ctx *ctx = vb2_get_drv_priv(q);
> + struct ope_dev *ope = ctx->ope;
> + struct ope_q_data *q_data;
> + int ret;
> +
> + dev_dbg(ope->dev, "Start streaming (ctx %p/%u)\n", ctx, q->type);
> +
> + lockdep_assert_held(&ope->mutex);
> +
> + q_data = get_q_data(ctx, q->type);
> + q_data->sequence = 0;
> +
> + if (V4L2_TYPE_IS_OUTPUT(q->type)) {
> + ctx->started = true;
> + ope_adjust_power(ctx->ope);
> + }
> +
> + ret = pm_runtime_resume_and_get(ctx->ope->dev);
> + if (ret) {
> + dev_err(ope->dev, "Could not resume\n");
> + return ret;
> + }
> +
> + ope_irq_init(ope);
> +
> + return 0;
> +}
> +
> +static void ope_stop_streaming(struct vb2_queue *q)
> +{
> + struct ope_ctx *ctx = vb2_get_drv_priv(q);
> + struct ope_dev *ope = ctx->ope;
> + struct vb2_v4l2_buffer *vbuf;
> +
> + dev_dbg(ctx->ope->dev, "Stop streaming (ctx %p/%u)\n", ctx, q->type);
> +
> + lockdep_assert_held(&ope->mutex);
> +
> + if (ope->context == ctx)
> + ope->context = NULL;
> +
> + if (V4L2_TYPE_IS_OUTPUT(q->type)) {
> + ctx->started = false;
> + ope_adjust_power(ctx->ope);
> + }
> +
> + pm_runtime_put(ctx->ope->dev);
> +
> + for (;;) {
> + if (V4L2_TYPE_IS_OUTPUT(q->type))
> + vbuf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + else
> + vbuf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> + if (vbuf == NULL)
> + return;
> +
> + v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR);
> + }
> +}
> +
> +static const struct vb2_ops ope_qops = {
> + .queue_setup = ope_queue_setup,
> + .buf_prepare = ope_buf_prepare,
> + .buf_queue = ope_buf_queue,
> + .start_streaming = ope_start_streaming,
> + .stop_streaming = ope_stop_streaming,
> +};
> +
> +static int queue_init(void *priv, struct vb2_queue *src_vq,
> + struct vb2_queue *dst_vq)
> +{
> + struct ope_ctx *ctx = priv;
> + int ret;
> +
> + src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> + src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> + src_vq->drv_priv = ctx;
> + src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> + src_vq->ops = &ope_qops;
> + src_vq->mem_ops = &vb2_dma_contig_memops;
> + src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> + src_vq->lock = &ctx->ope->mutex;
> + src_vq->dev = ctx->ope->v4l2_dev.dev;
> +
> + ret = vb2_queue_init(src_vq);
> + if (ret)
> + return ret;
> +
> + dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> + dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> + dst_vq->drv_priv = ctx;
> + dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> + dst_vq->ops = &ope_qops;
> + dst_vq->mem_ops = &vb2_dma_contig_memops;
> + dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> + dst_vq->lock = &ctx->ope->mutex;
> + dst_vq->dev = ctx->ope->v4l2_dev.dev;
> +
> + return vb2_queue_init(dst_vq);
> +}
> +
> +/*
> + * File operations
> + */
> +static int ope_open(struct file *file)
> +{
> + struct ope_dev *ope = video_drvdata(file);
> + struct ope_ctx *ctx = NULL;
> + int rc = 0;
> +
> + if (mutex_lock_interruptible(&ope->mutex))
> + return -ERESTARTSYS;
> +
> + ctx = kvzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx) {
> + rc = -ENOMEM;
> + goto open_unlock;
> + }
> +
> + v4l2_fh_init(&ctx->fh, video_devdata(file));
> + file->private_data = &ctx->fh;
> + ctx->ope = ope;
> + ctx->colorspace = V4L2_COLORSPACE_RAW;
> +
> + ctx->q_data_src.fmt = find_format(V4L2_PIX_FMT_SRGGB8);
> + ctx->q_data_src.width = 640;
> + ctx->q_data_src.height = 480;
> + ctx->q_data_src.bytesperline = 640;
> + ctx->q_data_src.sizeimage = 640 * 480;
> +
> + ctx->q_data_dst.fmt = find_format(V4L2_PIX_FMT_NV12);
> + ctx->q_data_dst.width = 640;
> + ctx->q_data_dst.height = 480;
> + ctx->q_data_dst.bytesperline = 640;
> + ctx->q_data_dst.sizeimage = 640 * 480 * 3 / 2;
> +
> + ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(ope->m2m_dev, ctx, &queue_init);
> + if (IS_ERR(ctx->fh.m2m_ctx)) {
> + rc = PTR_ERR(ctx->fh.m2m_ctx);
> + v4l2_fh_exit(&ctx->fh);
> + kvfree(ctx);
> + goto open_unlock;
> + }
> +
> + v4l2_fh_add(&ctx->fh, file);
> +
> + list_add(&ctx->list, &ope->ctx_list);
> +
> + dev_dbg(ope->dev, "Created ctx %p\n", ctx);
> +
> +open_unlock:
> + mutex_unlock(&ope->mutex);
> + return rc;
> +}
> +
> +static int ope_release(struct file *file)
> +{
> + struct ope_dev *ope = video_drvdata(file);
> + struct ope_ctx *ctx = file2ctx(file);
> +
> + dev_dbg(ope->dev, "Releasing ctx %p\n", ctx);
> +
> + guard(mutex)(&ope->mutex);
> +
> + if (ope->context == ctx)
> + ope->context = NULL;
> +
> + list_del(&ctx->list);
> + v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> + v4l2_fh_del(&ctx->fh, file);
> + v4l2_fh_exit(&ctx->fh);
> + kvfree(ctx);
> +
> + return 0;
> +}
> +
> +static const struct v4l2_file_operations ope_fops = {
> + .owner = THIS_MODULE,
> + .open = ope_open,
> + .release = ope_release,
> + .poll = v4l2_m2m_fop_poll,
> + .unlocked_ioctl = video_ioctl2,
> + .mmap = v4l2_m2m_fop_mmap,
> +};
> +
> +static const struct video_device ope_videodev = {
> + .name = MEM2MEM_NAME,
> + .vfl_dir = VFL_DIR_M2M,
> + .fops = &ope_fops,
> + .device_caps = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_M2M_MPLANE,
> + .ioctl_ops = &ope_ioctl_ops,
> + .minor = -1,
> + .release = video_device_release_empty,
> +};
> +
> +static const struct v4l2_m2m_ops m2m_ops = {
> + .device_run = ope_device_run,
> + .job_abort = ope_job_abort,
> +};
> +
> +static int ope_soft_reset(struct ope_dev *ope)
> +{
> + u32 version;
> + int ret = 0;
> +
> + ret = pm_runtime_resume_and_get(ope->dev);
> + if (ret) {
> + dev_err(ope->dev, "Could not resume\n");
> + return ret;
> + }
> +
> + version = ope_read(ope, OPE_TOP_HW_VERSION);
> +
> + dev_dbg(ope->dev, "HW Version = %u.%u.%u\n",
> + (u32)FIELD_GET(OPE_TOP_HW_VERSION_GEN, version),
> + (u32)FIELD_GET(OPE_TOP_HW_VERSION_REV, version),
> + (u32)FIELD_GET(OPE_TOP_HW_VERSION_STEP, version));
> +
> + reinit_completion(&ope->reset_complete);
> +
> + ope_write(ope, OPE_TOP_RESET_CMD, OPE_TOP_RESET_CMD_SW);
> +
> + if (!wait_for_completion_timeout(&ope->reset_complete,
> + msecs_to_jiffies(OPE_RESET_TIMEOUT_MS))) {
> + dev_err(ope->dev, "Reset timeout\n");
> + ret = -ETIMEDOUT;
> + }
> +
> + pm_runtime_put(ope->dev);
> +
> + return ret;
> +}
> +
> +static int ope_init_power(struct ope_dev *ope)
> +{
> + struct dev_pm_domain_list *pmdomains;
> + struct device *dev = ope->dev;
> + int ret;
> +
> + ope->icc_data = devm_of_icc_get(dev, "data");
> + if (IS_ERR(ope->icc_data))
> + return dev_err_probe(dev, PTR_ERR(ope->icc_data),
> + "failed to get interconnect data path\n");
> +
> + ope->icc_config = devm_of_icc_get(dev, "config");
> + if (IS_ERR(ope->icc_config))
> + return dev_err_probe(dev, PTR_ERR(ope->icc_config),
> + "failed to get interconnect config path\n");
> +
> + /* Devices with multiple PM domains must be attached separately */
> + devm_pm_domain_attach_list(dev, NULL, &pmdomains);
> +
> + /* core clock is scaled as part of operating points */
> + ret = devm_pm_opp_set_clkname(dev, "core");
> + if (ret)
> + return ret;
> +
> + ret = devm_pm_opp_of_add_table(dev);
> + if (ret && ret != -ENODEV)
> + return dev_err_probe(dev, ret, "invalid OPP table\n");
> +
> + ret = devm_pm_runtime_enable(dev);
> + if (ret)
> + return ret;
> +
> + ret = devm_pm_clk_create(dev);
> + if (ret)
> + return ret;
> +
> + ret = of_pm_clk_add_clks(dev);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int ope_init_mmio(struct ope_dev *ope)
> +{
> + struct platform_device *pdev = to_platform_device(ope->dev);
> +
> + ope->base = devm_platform_ioremap_resource_byname(pdev, "top");
> + if (IS_ERR(ope->base))
> + return PTR_ERR(ope->base);
> +
> + ope->base_rd = devm_platform_ioremap_resource_byname(pdev, "bus_read");
> + if (IS_ERR(ope->base_rd))
> + return PTR_ERR(ope->base_rd);
> +
> + ope->base_wr = devm_platform_ioremap_resource_byname(pdev, "bus_write");
> + if (IS_ERR(ope->base_wr))
> + return PTR_ERR(ope->base_wr);
> +
> + ope->base_pp = devm_platform_ioremap_resource_byname(pdev, "pipeline");
> + if (IS_ERR(ope->base_pp))
> + return PTR_ERR(ope->base_pp);
> +
> + return 0;
> +}
> +
> +static int ope_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct video_device *vfd;
> + struct ope_dev *ope;
> + int ret, irq;
> +
> + ope = devm_kzalloc(&pdev->dev, sizeof(*ope), GFP_KERNEL);
> + if (!ope)
> + return -ENOMEM;
> +
> + ope->dev = dev;
> + init_completion(&ope->reset_complete);
> +
> + ret = ope_init_power(ope);
> + if (ret)
> + return dev_err_probe(dev, ret, "Power init failed\n");
> +
> + ret = ope_init_mmio(ope);
> + if (ret)
> + return dev_err_probe(dev, ret, "MMIO init failed\n");
> +
> + irq = platform_get_irq(pdev, 0);
> + if (irq < 0)
> + return dev_err_probe(dev, irq, "Unable to get IRQ\n");
> +
> + ret = devm_request_irq(dev, irq, ope_irq, IRQF_TRIGGER_RISING, "ope", ope);
> + if (ret < 0)
> + return dev_err_probe(dev, ret, "Requesting IRQ failed\n");
> +
> + ret = ope_soft_reset(ope);
> + if (ret < 0)
> + return ret;
> +
> + ret = v4l2_device_register(&pdev->dev, &ope->v4l2_dev);
> + if (ret)
> + return dev_err_probe(dev, ret, "Registering V4L2 device failed\n");
> +
> + mutex_init(&ope->mutex);
> + INIT_LIST_HEAD(&ope->ctx_list);
> +
> + ope->vfd = ope_videodev;
> + vfd = &ope->vfd;
> + vfd->lock = &ope->mutex;
> + vfd->v4l2_dev = &ope->v4l2_dev;
> + video_set_drvdata(vfd, ope);
> + snprintf(vfd->name, sizeof(vfd->name), "%s", ope_videodev.name);
> +
> + platform_set_drvdata(pdev, ope);
> +
> + ope->m2m_dev = v4l2_m2m_init(&m2m_ops);
> + if (IS_ERR(ope->m2m_dev)) {
> + ret = dev_err_probe(dev, PTR_ERR(ope->m2m_dev), "Failed to init mem2mem device\n");
> + goto err_unregister_v4l2;
> + }
> +
> + ret = video_register_device(vfd, VFL_TYPE_VIDEO, 0);
> + if (ret) {
> + dev_err(dev, "Failed to refgister video device\n");
> + goto err_release_m2m;
> + }
> +
> + /* TODO: Add stat device and link it to media */
> + ope->mdev.dev = dev;
> + strscpy(ope->mdev.model, MEM2MEM_NAME, sizeof(ope->mdev.model));
> + media_device_init(&ope->mdev);
> + ope->v4l2_dev.mdev = &ope->mdev;
> +
> + ret = v4l2_m2m_register_media_controller(ope->m2m_dev, vfd,
> + MEDIA_ENT_F_PROC_VIDEO_PIXEL_FORMATTER);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to register m2m media controller\n");
> + goto err_unregister_video;
> + }
> +
> + ret = media_device_register(&ope->mdev);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to register media device\n");
> + goto err_unregister_m2m_mc;
> + }
> +
> + return 0;
> +
> +err_unregister_m2m_mc:
> + v4l2_m2m_unregister_media_controller(ope->m2m_dev);
> +err_unregister_video:
> + video_unregister_device(&ope->vfd);
> +err_release_m2m:
> + v4l2_m2m_release(ope->m2m_dev);
> +err_unregister_v4l2:
> + v4l2_device_unregister(&ope->v4l2_dev);
> +
> + return ret;
> +}
> +
> +static void ope_remove(struct platform_device *pdev)
> +{
> + struct ope_dev *ope = platform_get_drvdata(pdev);
> +
> + media_device_unregister(&ope->mdev);
> + v4l2_m2m_unregister_media_controller(ope->m2m_dev);
> + video_unregister_device(&ope->vfd);
> + v4l2_m2m_release(ope->m2m_dev);
> + v4l2_device_unregister(&ope->v4l2_dev);
> +}
> +
> +static const struct of_device_id ope_dt_ids[] = {
> + { .compatible = "qcom,qcm2290-camss-ope"},
> + { },
> +};
> +MODULE_DEVICE_TABLE(of, ope_dt_ids);
> +
> +static const struct dev_pm_ops ope_pm_ops = {
> + SET_RUNTIME_PM_OPS(pm_clk_suspend, pm_clk_resume, NULL)
> +};
> +
> +static struct platform_driver ope_driver = {
> + .probe = ope_probe,
> + .remove = ope_remove,
> + .driver = {
> + .name = MEM2MEM_NAME,
> + .of_match_table = ope_dt_ids,
> + .pm = &ope_pm_ops,
> + },
> +};
> +
> +module_platform_driver(ope_driver);
> +
> +MODULE_DESCRIPTION("CAMSS Offline Processing Engine");
> +MODULE_AUTHOR("Loic Poulain <loic.poulain@oss.qualcomm.com>");
> +MODULE_LICENSE("GPL");
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-23 13:43 ` Bryan O'Donoghue
@ 2026-03-23 15:31 ` Loic Poulain
2026-03-24 11:00 ` Bryan O'Donoghue
0 siblings, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 15:31 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
Hi Bryan,
On Mon, Mar 23, 2026 at 2:43 PM Bryan O'Donoghue <bod@kernel.org> wrote:
>
> On 23/03/2026 12:58, Loic Poulain wrote:
> > Provide a initial implementation for the Qualcomm Offline Processing
> > Engine (OPE). OPE is a memory-to-memory hardware block designed for
> > image processing on a source frame. Typically, the input frame
> > originates from the SoC CSI capture path, though not limited to.
> >
> > The hardware architecture consists of Fetch Engines and Write Engines,
> > connected through intermediate pipeline modules:
> > [FETCH ENGINES] => [Pipeline Modules] => [WRITE ENGINES]
> >
> > Current Configuration:
> > Fetch Engine: One fetch engine is used for Bayer frame input.
> > Write Engines: Two display write engines for Y and UV planes output.
> >
> > Enabled Pipeline Modules:
> > CLC_WB: White balance (channel gain configuration)
> > CLC_DEMO: Demosaic (Bayer to RGB conversion)
> > CLC_CHROMA_ENHAN: RGB to YUV conversion
> > CLC_DOWNSCALE*: Downscaling for UV and Y planes
> >
> > Default configuration values are based on public standards such as BT.601.
> >
> > Processing Model:
> > OPE processes frames in stripes of up to 336 pixels. Therefore, frames must
> > be split into stripes for processing. Each stripe is configured after the
> > previous one has been acquired (double buffered registers). To minimize
> > inter-stripe latency, stripe configurations are generated ahead of time.
>
> A yavata command set showing usage would be appreciated.
AFAIK, yavta does not (yet) support M2M devices, but I can probably
use an other tool.
>
> >
> > Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> > ---
> > drivers/media/platform/qcom/camss/Makefile | 4 +
> > drivers/media/platform/qcom/camss/camss-ope.c | 2058 +++++++++++++++++
> > 2 files changed, 2062 insertions(+)
> > create mode 100644 drivers/media/platform/qcom/camss/camss-ope.c
> >
> > diff --git a/drivers/media/platform/qcom/camss/Makefile b/drivers/media/platform/qcom/camss/Makefile
> > index 5e349b491513..67f261ae0855 100644
> > --- a/drivers/media/platform/qcom/camss/Makefile
> > +++ b/drivers/media/platform/qcom/camss/Makefile
> > @@ -29,3 +29,7 @@ qcom-camss-objs += \
> > camss-format.o \
> >
> > obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss.o
> > +
> > +qcom-camss-ope-objs += camss-ope.o
> > +
> > +obj-$(CONFIG_VIDEO_QCOM_CAMSS) += qcom-camss-ope.o
>
> Needs a Kconfig entry.
ack.
> > +
> > +#define OPE_RESET_TIMEOUT_MS 100
> > +
> > +/* Expected framerate for power scaling */
> > +#define DEFAULT_FRAMERATE 60
> > +
> > +/* Downscaler helpers */
> > +#define Q21(v) (((uint64_t)(v)) << 21)
> > +#define DS_Q21(n, d) ((uint32_t)(((uint64_t)(n) << 21) / (d)))
>
> u64 and u32 here.
ok.
> > +
> > +static inline char *print_fourcc(u32 fmt)
> > +{
> > + static char code[5];
> > +
> > + code[0] = (unsigned char)(fmt & 0xff);
> > + code[1] = (unsigned char)((fmt >> 8) & 0xff);
> > + code[2] = (unsigned char)((fmt >> 16) & 0xff);
> > + code[3] = (unsigned char)((fmt >> 24) & 0xff);
> > + code[4] = '\0';
> > +
> > + return code;
> > +}
>
> This is a bug
Indeed, I will use %p4cc as you recommended in a similar series.
> > +
> > +static void ope_prog_bayer2rgb(struct ope_dev *ope)
> > +{
> > + /* Fixed Settings */
> > + ope_write_pp(ope, 0x860, 0x4001);
> > + ope_write_pp(ope, 0x868, 128);
> > + ope_write_pp(ope, 0x86c, 128 << 20);
> > + ope_write_pp(ope, 0x870, 102);
>
> What are the magic numbers about ? Please define bit-fields and offsets.
There are some registers I can't disclose today, which have to be
configured with working values,
Similarly to some sensor configuration in media/i2c.
> Parameters passed in from user-space/libcamera and then translated to
> registers etc.
The above fixed settings will not be part of the initial parameters.
>
> > +}
> > +
> > +static void ope_prog_wb(struct ope_dev *ope)
> > +{
> > + /* Default white balance config */
> > + u32 g_gain = OPE_WB(1, 1);
> > + u32 b_gain = OPE_WB(3, 2);
> > + u32 r_gain = OPE_WB(3, 2);
> > +
> > + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(0), g_gain);
> > + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(1), b_gain);
> > + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(2), r_gain);
> > +
> > + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_MODULE_CFG, OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN);
> > +}
>
> Fixed gains will have to come from real data.
These gains will indeed need to be configurable, most likely via ISP
parameters, here, they have been adjusted based on colorbar test
pattern from imx219 sensors but also tested with real capture.
>
> > +
> > +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> > +{
> > + struct ope_dev *ope = ctx->ope;
> > + int i;
> > +
> > + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
> > +
> > + /* Fetch Engine */
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
> > + (stripe->src.width << 16) + stripe->src.height);
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
> > + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
> > + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
> > +
> > + /* Write Engines */
> > + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> > + if (!stripe->dst[i].enabled) {
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
> > + continue;
> > + }
> > +
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
> > + (stripe->dst[i].height << 16) + stripe->dst[i].width);
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
> > + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
> > + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
> > + }
> > +
> > + /* Downscalers */
> > + for (i = 0; i < OPE_DS_MAX; i++) {
> > + struct ope_dsc_config *dsc = &stripe->dsc[i];
> > + u32 base = ope_ds_base[i];
> > + u32 cfg = 0;
> > +
> > + if (dsc->input_width != dsc->output_width) {
> > + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
> > + dsc->output_width) << 30;
> > + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
> > + }
> > +
> > + if (dsc->input_height != dsc->output_height) {
> > + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
> > + dsc->output_height) << 30;
> > + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
> > + }
> > +
> > + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
> > + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
> > + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
> > + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
> > + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
> > + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
> > + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
> > + }
> > +}
>
> So - this is where the CDM should be used - so that you don't have to do
> all of these MMIO writes inside of your ISR.
Indeed, and that also the reason stripes are computed ahead of time,
so that they can be further 'queued' in a CDM.
>
> Is that and additional step after the RFC ?
The current implementation (without CDM) already provides good results
and performance, so CDM can be viewed as a future enhancement.
As far as I understand, CDM could also be implemented in a generic way
within CAMSS, since other CAMSS blocks make use of CDM as well.
This is something we should discuss further.
>
> > +
> > +/*
> > + * mem2mem callbacks
> > + */
> > +static void ope_device_run(void *priv)
> > +{
> > + struct vb2_v4l2_buffer *src_buf, *dst_buf;
> > + struct ope_ctx *ctx = priv;
> > + struct ope_dev *ope = ctx->ope;
> > + dma_addr_t src, dst;
> > +
> > + dev_dbg(ope->dev, "Start context %p", ctx);
> > +
> > + src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> > + if (!src_buf)
> > + return;
> > +
> > + dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> > + if (!dst_buf)
> > + return;
> > +
> > + src = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
> > + dst = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> > +
> > + /* Generate stripes from full frame */
> > + ope_gen_stripes(ctx, src, dst);
> > +
> > + if (priv != ope->context) {
> > + /* If context changed, reprogram the submodules */
> > + ope_prog_wb(ope);
> > + ope_prog_bayer2rgb(ope);
> > + ope_prog_rgb2yuv(ope);
> > + ope->context = priv;
> > + }
> > +
> > + /* Program the first stripe */
> > + ope_prog_stripe(ctx, &ctx->stripe[0]);
> > +
> > + /* Go! */
> > + ope_start(ope);
> > +}
> > +
> > +static void ope_job_done(struct ope_ctx *ctx, enum vb2_buffer_state vbstate)
> > +{
> > + struct vb2_v4l2_buffer *src, *dst;
> > +
> > + if (!ctx)
> > + return;
> > +
> > + src = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> > + dst = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> > +
> > + if (dst && src)
> > + dst->vb2_buf.timestamp = src->vb2_buf.timestamp;
> > +
> > + if (src)
> > + v4l2_m2m_buf_done(src, vbstate);
> > + if (dst)
> > + v4l2_m2m_buf_done(dst, vbstate);
> > +
> > + v4l2_m2m_job_finish(ctx->ope->m2m_dev, ctx->fh.m2m_ctx);
> > +}
> > +
> > +static void ope_buf_done(struct ope_ctx *ctx)
> > +{
> > + struct ope_stripe *stripe = ope_current_stripe(ctx);
> > +
> > + if (!ctx)
> > + return;
> > +
> > + dev_dbg(ctx->ope->dev, "Context %p Stripe %u done\n",
> > + ctx, ope_stripe_index(ctx, stripe));
> > +
> > + if (ope_stripe_is_last(stripe)) {
> > + ctx->current_stripe = 0;
> > + ope_job_done(ctx, VB2_BUF_STATE_DONE);
> > + } else {
> > + ctx->current_stripe++;
> > + ope_start(ctx->ope);
> > + }
> > +}
> > +
> > +static void ope_job_abort(void *priv)
> > +{
> > + struct ope_ctx *ctx = priv;
> > +
> > + /* reset to abort */
> > + ope_write(ctx->ope, OPE_TOP_RESET_CMD, OPE_TOP_RESET_CMD_SW);
> > +}
>
> Shoudln't this wait for ope_job_done() ?
No, according to v4l2-mem2mem.h:
Informs the driver that it has to abort the currently
running transaction as soon as possible
[...]
This function does not have to (and will usually not) wait
until the device enters a state when it can be stopped.
> > +static irqreturn_t ope_irq(int irq, void *dev_id)
> > +{
> > + struct ope_dev *ope = dev_id;
> > + struct ope_ctx *ctx = ope->m2m_dev ? v4l2_m2m_get_curr_priv(ope->m2m_dev) : NULL;
>
> You have a mutex for this pointer but it doesn't seem to be in-use here
>
> Should this be a threadded IRQ with reference to that mutex then ?
We currently rely on the mem2mem framework to manage context
concurrency, and in particular to ensure that a context cannot be
released while an ope_job_done callback is still pending. This avoids
blocking on the global OPE mutex, which may be held for unrelated
operations such as creating another context.
However, there may still be unsafe paths, so an additional per-context
lock might be worth introducing.
> > +static int ope_enum_frameintervals(struct file *file, void *fh,
> > + struct v4l2_frmivalenum *fival)
> > +{
> > + fival->type = V4L2_FRMIVAL_TYPE_STEPWISE;
> > + fival->stepwise.min.numerator = 1;
> > + fival->stepwise.min.denominator = 120;
> > + fival->stepwise.max.numerator = 1;
> > + fival->stepwise.max.denominator = 1;
> > + fival->stepwise.step.numerator = 1;
> > + fival->stepwise.step.denominator = 1;
>
> fival->index should return -EINVAL for index > 0
>
> should also valid width and height and pixel format
Thanks, I will add that.
>
> > + return 0;
> > +}
> > +
> > +static int ope_s_ctrl(struct v4l2_ctrl *ctrl)
> > +{
> > + return -EINVAL;
> > +}
> > +
> > +static const struct v4l2_ctrl_ops ope_ctrl_ops = {
> > + .s_ctrl = ope_s_ctrl,
> > +};
>
> Eh - I think you can drop this ..
Indeed.
>
[...]
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
2026-03-23 13:03 ` Krzysztof Kozlowski
@ 2026-03-23 16:03 ` Loic Poulain
2026-03-23 16:10 ` Krzysztof Kozlowski
0 siblings, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 16:03 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
Hi Krzysztof,
On Mon, Mar 23, 2026 at 2:04 PM Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> On 23/03/2026 13:58, Loic Poulain wrote:
> > Add Devicetree binding documentation for the Qualcomm Camera Subsystem
> > Offline Processing Engine (OPE) found on platforms such as Agatti.
> > The OPE is a memory-to-memory image processing block which operates
> > on frames read from and written back to system memory.
> >
> > Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
>
> I don't see explanation in cover letter why this is RFC, so I assume
> this is not ready, thus not a full review but just few nits to spare you
> resubmits later when this becomes reviewable.
>
> > ---
> > .../bindings/media/qcom,camss-ope.yaml | 86 +++++++++++++++++++
> > 1 file changed, 86 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> > new file mode 100644
> > index 000000000000..509b4e89a88a
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>
> Filename must match compatible.
Some bindings (for example clock/qcom,mmcc.yaml) do not strictly
follow this rule and instead use a more generic filename that groups
multiple device-specific compatibles. I mention this because my
intention with a generic filename was to allow the binding to cover
additional compatibles in the future.
As I understand it, in the current state I should either:
- rename the file so that it matches the specific compatible, e.g.
qcom,qcm2290-camss-ope.yaml, or
- keep the generic filename (qcom,camss-ope.yaml) and add a top-level
const: qcom,camss-ope compatible to justify the generic naming.
Any preferred/valid direction?
>
> > @@ -0,0 +1,86 @@
> > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> > +%YAML 1.2
>
> ...
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - reg-names
> > + - clocks
> > + - clock-names
> > + - interrupts
> > + - interconnects
> > + - interconnect-names
> > + - iommus
> > + - power-domains
> > + - power-domain-names
> > +
> > +additionalProperties: true
>
> There are no bindings like that. You cannot have here true.
ok.
>
> Also, lack of example is a no-go.
Ouch, yes. Would it make sense to have dt_binding_check catch this
kind of issue?
>
> BTW, also remember about proper versioning of your patchset. b4 would do
> that for you, but since you did not use it, you must handle it.
ack.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
2026-03-23 16:03 ` Loic Poulain
@ 2026-03-23 16:10 ` Krzysztof Kozlowski
0 siblings, 0 replies; 47+ messages in thread
From: Krzysztof Kozlowski @ 2026-03-23 16:10 UTC (permalink / raw)
To: Loic Poulain
Cc: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On 23/03/2026 17:03, Loic Poulain wrote:
> Hi Krzysztof,
>
> On Mon, Mar 23, 2026 at 2:04 PM Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>
>> On 23/03/2026 13:58, Loic Poulain wrote:
>>> Add Devicetree binding documentation for the Qualcomm Camera Subsystem
>>> Offline Processing Engine (OPE) found on platforms such as Agatti.
>>> The OPE is a memory-to-memory image processing block which operates
>>> on frames read from and written back to system memory.
>>>
>>> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
>>
>> I don't see explanation in cover letter why this is RFC, so I assume
>> this is not ready, thus not a full review but just few nits to spare you
>> resubmits later when this becomes reviewable.
>>
>>> ---
>>> .../bindings/media/qcom,camss-ope.yaml | 86 +++++++++++++++++++
>>> 1 file changed, 86 insertions(+)
>>> create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>>>
>>> diff --git a/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>>> new file mode 100644
>>> index 000000000000..509b4e89a88a
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
>>
>> Filename must match compatible.
>
> Some bindings (for example clock/qcom,mmcc.yaml) do not strictly
> follow this rule and instead use a more generic filename that groups
> multiple device-specific compatibles. I mention this because my
> intention with a generic filename was to allow the binding to cover
> additional compatibles in the future.
>
> As I understand it, in the current state I should either:
> - rename the file so that it matches the specific compatible, e.g.
> qcom,qcm2290-camss-ope.yaml, or
This one.
> - keep the generic filename (qcom,camss-ope.yaml) and add a top-level
> const: qcom,camss-ope compatible to justify the generic naming.
Because this would be a reverse logic... Name of file is never an
argument/reason to add a compatible.
>
> Any preferred/valid direction?
>
>>
>>> @@ -0,0 +1,86 @@
>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>> +%YAML 1.2
>>
>> ...
>>> +
>>> +required:
>>> + - compatible
>>> + - reg
>>> + - reg-names
>>> + - clocks
>>> + - clock-names
>>> + - interrupts
>>> + - interconnects
>>> + - interconnect-names
>>> + - iommus
>>> + - power-domains
>>> + - power-domain-names
>>> +
>>> +additionalProperties: true
>>
>> There are no bindings like that. You cannot have here true.
>
> ok.
>
>>
>> Also, lack of example is a no-go.
>
> Ouch, yes. Would it make sense to have dt_binding_check catch this
> kind of issue?
Not sure if worth implementing. Every new binding is a copy of existing
one and 99% of them have examples, so how new binding could be created
without one? This is highly unlikely and most likely there are other
issues as well, because process is broken, so dtschema won't help.
And with LLM you can write whatever will pass dtschema but still make
not sense.
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 13:33 ` Bryan O'Donoghue
@ 2026-03-23 16:15 ` Krzysztof Kozlowski
2026-03-24 10:30 ` Bryan O'Donoghue
0 siblings, 1 reply; 47+ messages in thread
From: Krzysztof Kozlowski @ 2026-03-23 16:15 UTC (permalink / raw)
To: Bryan O'Donoghue, Konrad Dybcio, Loic Poulain,
vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 14:33, Bryan O'Donoghue wrote:
> On 23/03/2026 13:24, Konrad Dybcio wrote:
>> + isp_ope: isp@5c42400 {
>
> ope@5c42400 isp@ is already used.
Where? dtc would warn you on that.
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 13:24 ` Konrad Dybcio
2026-03-23 13:33 ` Bryan O'Donoghue
@ 2026-03-23 16:31 ` Loic Poulain
2026-03-24 10:43 ` Konrad Dybcio
1 sibling, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-23 16:31 UTC (permalink / raw)
To: Konrad Dybcio
Cc: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
Hi Konrad,
On Mon, Mar 23, 2026 at 2:24 PM Konrad Dybcio
<konrad.dybcio@oss.qualcomm.com> wrote:
>
> On 3/23/26 1:58 PM, Loic Poulain wrote:
> > Add the Qualcomm CAMSS Offline Processing Engine (OPE) node for
> > QCM2290. The OPE is a memory-to-memory image processing block used in
> > offline imaging pipelines.
> >
> > The node includes register regions, clocks, interconnects, IOMMU
> > mappings, power domains, interrupts, and an associated OPP table.
> >
> > At the moment we assign a fixed rate to GCC_CAMSS_AXI_CLK since this
> > clock is shared across multiple CAMSS components and there is currently
> > no support for dynamically scaling it.
> >
> > Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> > ---
> > arch/arm64/boot/dts/qcom/agatti.dtsi | 72 ++++++++++++++++++++++++++++
> > 1 file changed, 72 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/agatti.dtsi b/arch/arm64/boot/dts/qcom/agatti.dtsi
> > index f9b46cf1c646..358ebfc99552 100644
> > --- a/arch/arm64/boot/dts/qcom/agatti.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/agatti.dtsi
> > @@ -1935,6 +1935,78 @@ port@1 {
> > };
> > };
> >
> > + isp_ope: isp@5c42400 {
>
> "camss_ope"? Label's don't need to be generic, but they need to be
> meaningful - currently one could assume that there's a non-ISP OPE
> as well (and I'm intentionally stretching it a bit to prove a point)
fair enough.
>
>
>
> > + compatible = "qcom,qcm2290-camss-ope";
> > +
> > + reg = <0x0 0x5c42400 0x0 0x200>,
> > + <0x0 0x5c46c00 0x0 0x190>,
> > + <0x0 0x5c46d90 0x0 0xa00>,
> > + <0x0 0x5c42800 0x0 0x4400>,
> > + <0x0 0x5c42600 0x0 0x200>;
> > + reg-names = "top",
> > + "bus_read",
> > + "bus_write",
> > + "pipeline",
> > + "qos";
>
> This is a completely arbitrary choice, but I think it's easier to compare
> against the docs if the reg entries are sorted by the 'reg' (which isn't
> always easy to do since that can very between SoCs but this module is not
> very common)
>
>
> > +
> > + clocks = <&gcc GCC_CAMSS_AXI_CLK>,
> > + <&gcc GCC_CAMSS_OPE_CLK>,
> > + <&gcc GCC_CAMSS_OPE_AHB_CLK>,
> > + <&gcc GCC_CAMSS_NRT_AXI_CLK>,
> > + <&gcc GCC_CAMSS_TOP_AHB_CLK>;
> > + clock-names = "axi", "core", "iface", "nrt", "top";
>
> Similarly, in the arbitrary choice of indices, I think putting "core"
> first is "neat"
Ok, I thought alphabetical ordering was preferred?
>
> > + assigned-clocks = <&gcc GCC_CAMSS_AXI_CLK>;
> > + assigned-clock-rates = <300000000>;
>
> I really think we shouldn't be doing this here for a clock that covers
> so much hw
Yes, so we probably need some camss framework to scale this, or move
this assigned value to camss main node for now.
>
> [...]
>
>
> > +
> > + interrupts = <GIC_SPI 209 IRQ_TYPE_EDGE_RISING>;
> > +
> > + interconnects = <&bimc MASTER_APPSS_PROC RPM_ACTIVE_TAG
> > + &config_noc SLAVE_CAMERA_CFG RPM_ACTIVE_TAG>,
> > + <&mmnrt_virt MASTER_CAMNOC_SF RPM_ALWAYS_TAG
> > + &bimc SLAVE_EBI1 RPM_ALWAYS_TAG>;
> > + interconnect-names = "config",
> > + "data";
> > +
> > + iommus = <&apps_smmu 0x820 0x0>,
> > + <&apps_smmu 0x840 0x0>;
> > +
> > + operating-points-v2 = <&ope_opp_table>;
> > + power-domains = <&gcc GCC_CAMSS_TOP_GDSC>,
>
> Moving this under camss should let you remove the TOP_GDSC and TOP_AHB (and
> perhaps some other) references
Yes, will move it and remove what we don't need anymore.
>
> > + <&rpmpd QCM2290_VDDCX>;
> > + power-domain-names = "camss",
> > + "cx";> +
> > + ope_opp_table: opp-table {
> > + compatible = "operating-points-v2";
> > +
> > + opp-19200000 {
> > + opp-hz = /bits/ 64 <19200000>;
> > + required-opps = <&rpmpd_opp_min_svs>;
> > + };
> > +
> > + opp-200000000 {
> > + opp-hz = /bits/ 64 <200000000>;
> > + required-opps = <&rpmpd_opp_svs>;
> > + };
> > +
> > + opp-266600000 {
> > + opp-hz = /bits/ 64 <266600000>;
> > + required-opps = <&rpmpd_opp_svs_plus>;
> > + };
> > +
> > + opp-465000000 {
> > + opp-hz = /bits/ 64 <465000000>;
> > + required-opps = <&rpmpd_opp_nom>;
> > + };
> > +
> > + opp-580000000 {
> > + opp-hz = /bits/ 64 <580000000>;
> > + required-opps = <&rpmpd_opp_turbo>;
> > + turbo-mode;
>
> Are we going to act on this property? Otherwise I think it's just a naming
> collision with Qualcomm's TURBO (which may? have previously??? had some
> special implications)
588 MHz is categorized as the "Max Turbo" frequency for the OPE core clock.
At some point we may want to enable this only under specific conditions.
For now, the OPE driver does not make use of this property.
Regards,
Loic
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 16:15 ` Krzysztof Kozlowski
@ 2026-03-24 10:30 ` Bryan O'Donoghue
0 siblings, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-24 10:30 UTC (permalink / raw)
To: Krzysztof Kozlowski, Konrad Dybcio, Loic Poulain,
vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 16:15, Krzysztof Kozlowski wrote:
>> ope@5c42400 isp@ is already used.
> Where? dtc would warn you on that.
>
> Best regards,
> Krzysztof
Oh.
It _should_ be isp@ its camss@ apparently.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node
2026-03-23 16:31 ` Loic Poulain
@ 2026-03-24 10:43 ` Konrad Dybcio
0 siblings, 0 replies; 47+ messages in thread
From: Konrad Dybcio @ 2026-03-24 10:43 UTC (permalink / raw)
To: Loic Poulain
Cc: bod, vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On 3/23/26 5:31 PM, Loic Poulain wrote:
> Hi Konrad,
>
> On Mon, Mar 23, 2026 at 2:24 PM Konrad Dybcio
> <konrad.dybcio@oss.qualcomm.com> wrote:
>>
>> On 3/23/26 1:58 PM, Loic Poulain wrote:
>>> Add the Qualcomm CAMSS Offline Processing Engine (OPE) node for
>>> QCM2290. The OPE is a memory-to-memory image processing block used in
>>> offline imaging pipelines.
>>>
>>> The node includes register regions, clocks, interconnects, IOMMU
>>> mappings, power domains, interrupts, and an associated OPP table.
>>>
>>> At the moment we assign a fixed rate to GCC_CAMSS_AXI_CLK since this
>>> clock is shared across multiple CAMSS components and there is currently
>>> no support for dynamically scaling it.
>>>
>>> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
>>> ---
[...]
>> Similarly, in the arbitrary choice of indices, I think putting "core"
>> first is "neat"
>
> Ok, I thought alphabetical ordering was preferred?
I believe that was Vladimir's misinterpretation of the DTS coding style
(which is admittedly convoluted so I don't really blame him)
>>> + assigned-clocks = <&gcc GCC_CAMSS_AXI_CLK>;
>>> + assigned-clock-rates = <300000000>;
>>
>> I really think we shouldn't be doing this here for a clock that covers
>> so much hw
>
> Yes, so we probably need some camss framework to scale this, or move
> this assigned value to camss main node for now.
We do need some sort of a backfeeding mechanism to let camss aggregate
various requests coming from the clients if we want to prevent having to
run things at TURBO all the time, so resolving that early would be a good
idea, even if a little inconvenient..
[...]
>>> + opp-580000000 {
>>> + opp-hz = /bits/ 64 <580000000>;
>>> + required-opps = <&rpmpd_opp_turbo>;
>>> + turbo-mode;
>>
>> Are we going to act on this property? Otherwise I think it's just a naming
>> collision with Qualcomm's TURBO (which may? have previously??? had some
>> special implications)
>
> 588 MHz is categorized as the "Max Turbo" frequency for the OPE core clock.
> At some point we may want to enable this only under specific conditions.
> For now, the OPE driver does not make use of this property.
Fair, we can always get rid of it later if it turns out unnecessary
Konrad
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-23 15:31 ` Loic Poulain
@ 2026-03-24 11:00 ` Bryan O'Donoghue
2026-03-24 15:57 ` Loic Poulain
` (3 more replies)
0 siblings, 4 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-24 11:00 UTC (permalink / raw)
To: Loic Poulain
Cc: vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On 23/03/2026 15:31, Loic Poulain wrote:
>>> +
>>> +static void ope_prog_bayer2rgb(struct ope_dev *ope)
>>> +{
>>> + /* Fixed Settings */
>>> + ope_write_pp(ope, 0x860, 0x4001);
>>> + ope_write_pp(ope, 0x868, 128);
>>> + ope_write_pp(ope, 0x86c, 128 << 20);
>>> + ope_write_pp(ope, 0x870, 102);
>> What are the magic numbers about ? Please define bit-fields and offsets.
> There are some registers I can't disclose today, which have to be
> configured with working values,
> Similarly to some sensor configuration in media/i2c.
Not really the same thing, all of the offsets in upstream CAMSS and its
CLC are documented. Sensor values are typically upstreamed by people who
don't control the documentation, that is not the case with Qcom
submitting this code upstream now.
Are you guys doing an upstream implementation or not ?
>> Parameters passed in from user-space/libcamera and then translated to
>> registers etc.
> The above fixed settings will not be part of the initial parameters.
>
>>> +}
>>> +
>>> +static void ope_prog_wb(struct ope_dev *ope)
>>> +{
>>> + /* Default white balance config */
>>> + u32 g_gain = OPE_WB(1, 1);
>>> + u32 b_gain = OPE_WB(3, 2);
>>> + u32 r_gain = OPE_WB(3, 2);
>>> +
>>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(0), g_gain);
>>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(1), b_gain);
>>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(2), r_gain);
>>> +
>>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_MODULE_CFG, OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN);
>>> +}
>> Fixed gains will have to come from real data.
> These gains will indeed need to be configurable, most likely via ISP
> parameters, here, they have been adjusted based on colorbar test
> pattern from imx219 sensors but also tested with real capture.
>
>>> +
>>> +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
>>> +{
>>> + struct ope_dev *ope = ctx->ope;
>>> + int i;
>>> +
>>> + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
>>> +
>>> + /* Fetch Engine */
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
>>> + (stripe->src.width << 16) + stripe->src.height);
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
>>> + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
>>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
>>> +
>>> + /* Write Engines */
>>> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
>>> + if (!stripe->dst[i].enabled) {
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
>>> + continue;
>>> + }
>>> +
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
>>> + (stripe->dst[i].height << 16) + stripe->dst[i].width);
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
>>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
>>> + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
>>> + }
>>> +
>>> + /* Downscalers */
>>> + for (i = 0; i < OPE_DS_MAX; i++) {
>>> + struct ope_dsc_config *dsc = &stripe->dsc[i];
>>> + u32 base = ope_ds_base[i];
>>> + u32 cfg = 0;
>>> +
>>> + if (dsc->input_width != dsc->output_width) {
>>> + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
>>> + dsc->output_width) << 30;
>>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
>>> + }
>>> +
>>> + if (dsc->input_height != dsc->output_height) {
>>> + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
>>> + dsc->output_height) << 30;
>>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
>>> + }
>>> +
>>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
>>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
>>> + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
>>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
>>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
>>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
>>> + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
>>> + }
>>> +}
>> So - this is where the CDM should be used - so that you don't have to do
>> all of these MMIO writes inside of your ISR.
> Indeed, and that also the reason stripes are computed ahead of time,
> so that they can be further 'queued' in a CDM.
>
>> Is that and additional step after the RFC ?
> The current implementation (without CDM) already provides good results
> and performance, so CDM can be viewed as a future enhancement.
That's true but then the number of MMIO writes per ISR is pretty small
right now. You have about 50 writes here.
> As far as I understand, CDM could also be implemented in a generic way
> within CAMSS, since other CAMSS blocks make use of CDM as well.
> This is something we should discuss further.
My concern is even conservatively if each module adds another 10 ?
writes by the time we get to denoising, sharpening, lens shade
correction, those writes could easily look more like 100.
What user-space should submit is well documented data-structures which
then get translated into CDM buffers by the OPE and IFE for the various
bits of the pipeline.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
` (2 preceding siblings ...)
2026-03-23 12:58 ` [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node Loic Poulain
@ 2026-03-24 12:54 ` Bryan O'Donoghue
2026-03-24 16:16 ` Loic Poulain
2026-04-05 19:57 ` Laurent Pinchart
4 siblings, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-24 12:54 UTC (permalink / raw)
To: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio
Cc: linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 23/03/2026 12:58, Loic Poulain wrote:
> This first version is intentionally minimalistic. It provides a working
> configuration using a fixed set of static processing parameters, mainly
> to achieve correct and good-quality debayering.
You need the other 50% of the kernel side - the generation of bayer
statistics in the IFE, as well as generation of parameters to feed back
into the OPE - which requires a user-space implementation too, so a lot
of work there too.
I'd also say when we have an ICP we should be using it via the HFI
protocol, thus burying all of the IPE/OPE BPS and CDM complexity in the
firmware.
Understood Agatti has no ICP so you're limited to direct OPE/IFE
register access here. For HFI capable platforms - the majority - HFI is
the way to go.
I'll publish an RFC for Hamoa for that soonish so we can make sure both
coexist.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-24 11:00 ` Bryan O'Donoghue
@ 2026-03-24 15:57 ` Loic Poulain
2026-03-24 21:27 ` Dmitry Baryshkov
` (2 subsequent siblings)
3 siblings, 0 replies; 47+ messages in thread
From: Loic Poulain @ 2026-03-24 15:57 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
Hi Bryan,
On Tue, Mar 24, 2026 at 12:00 PM Bryan O'Donoghue <bod@kernel.org> wrote:
>
> On 23/03/2026 15:31, Loic Poulain wrote:
> >>> +
> >>> +static void ope_prog_bayer2rgb(struct ope_dev *ope)
> >>> +{
> >>> + /* Fixed Settings */
> >>> + ope_write_pp(ope, 0x860, 0x4001);
> >>> + ope_write_pp(ope, 0x868, 128);
> >>> + ope_write_pp(ope, 0x86c, 128 << 20);
> >>> + ope_write_pp(ope, 0x870, 102);
> >> What are the magic numbers about ? Please define bit-fields and offsets.
> > There are some registers I can't disclose today, which have to be
> > configured with working values,
> > Similarly to some sensor configuration in media/i2c.
>
> Not really the same thing, all of the offsets in upstream CAMSS and its
> CLC are documented. Sensor values are typically upstreamed by people who
> don't control the documentation, that is not the case with Qcom
> submitting this code upstream now.
>
> Are you guys doing an upstream implementation or not ?
Yes, but some configuration will be static and non-parametrable, I
will check if we can at least document the layout.
>
> >> Parameters passed in from user-space/libcamera and then translated to
> >> registers etc.
> > The above fixed settings will not be part of the initial parameters.
> >
> >>> +}
> >>> +
> >>> +static void ope_prog_wb(struct ope_dev *ope)
> >>> +{
> >>> + /* Default white balance config */
> >>> + u32 g_gain = OPE_WB(1, 1);
> >>> + u32 b_gain = OPE_WB(3, 2);
> >>> + u32 r_gain = OPE_WB(3, 2);
> >>> +
> >>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(0), g_gain);
> >>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(1), b_gain);
> >>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_WB_CFG(2), r_gain);
> >>> +
> >>> + ope_write_pp(ope, OPE_PP_CLC_WB_GAIN_MODULE_CFG, OPE_PP_CLC_WB_GAIN_MODULE_CFG_EN);
> >>> +}
> >> Fixed gains will have to come from real data.
> > These gains will indeed need to be configurable, most likely via ISP
> > parameters, here, they have been adjusted based on colorbar test
> > pattern from imx219 sensors but also tested with real capture.
> >
> >>> +
> >>> +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> >>> +{
> >>> + struct ope_dev *ope = ctx->ope;
> >>> + int i;
> >>> +
> >>> + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
> >>> +
> >>> + /* Fetch Engine */
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
> >>> + (stripe->src.width << 16) + stripe->src.height);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
> >>> + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
> >>> +
> >>> + /* Write Engines */
> >>> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> >>> + if (!stripe->dst[i].enabled) {
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
> >>> + continue;
> >>> + }
> >>> +
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
> >>> + (stripe->dst[i].height << 16) + stripe->dst[i].width);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
> >>> + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
> >>> + }
> >>> +
> >>> + /* Downscalers */
> >>> + for (i = 0; i < OPE_DS_MAX; i++) {
> >>> + struct ope_dsc_config *dsc = &stripe->dsc[i];
> >>> + u32 base = ope_ds_base[i];
> >>> + u32 cfg = 0;
> >>> +
> >>> + if (dsc->input_width != dsc->output_width) {
> >>> + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
> >>> + dsc->output_width) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
> >>> + }
> >>> +
> >>> + if (dsc->input_height != dsc->output_height) {
> >>> + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
> >>> + dsc->output_height) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
> >>> + }
> >>> +
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
> >>> + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
> >>> + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
> >>> + }
> >>> +}
> >> So - this is where the CDM should be used - so that you don't have to do
> >> all of these MMIO writes inside of your ISR.
> > Indeed, and that also the reason stripes are computed ahead of time,
> > so that they can be further 'queued' in a CDM.
> >
> >> Is that and additional step after the RFC ?
> > The current implementation (without CDM) already provides good results
> > and performance, so CDM can be viewed as a future enhancement.
>
> That's true but then the number of MMIO writes per ISR is pretty small
> right now. You have about 50 writes here.
Right, it will increase significantly. The idea was to start with a
version that omits CDM so that we can focus on the other functional
aspects of the ISP for now.
>
> > As far as I understand, CDM could also be implemented in a generic way
> > within CAMSS, since other CAMSS blocks make use of CDM as well.
> > This is something we should discuss further.
> My concern is even conservatively if each module adds another 10 ?
> writes by the time we get to denoising, sharpening, lens shade
> correction, those writes could easily look more like 100.
>
> What user-space should submit is well documented data-structures which
> then get translated into CDM buffers by the OPE and IFE for the various
> bits of the pipeline.
Yes it will.
Regards,
Loic
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-03-24 12:54 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Bryan O'Donoghue
@ 2026-03-24 16:16 ` Loic Poulain
2026-04-05 19:48 ` Laurent Pinchart
0 siblings, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-24 16:16 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On Tue, Mar 24, 2026 at 1:54 PM Bryan O'Donoghue <bod@kernel.org> wrote:
>
> On 23/03/2026 12:58, Loic Poulain wrote:
> > This first version is intentionally minimalistic. It provides a working
> > configuration using a fixed set of static processing parameters, mainly
> > to achieve correct and good-quality debayering.
>
> You need the other 50% of the kernel side - the generation of bayer
> statistics in the IFE, as well as generation of parameters to feed back
> into the OPE - which requires a user-space implementation too, so a lot
> of work there too.
>
> I'd also say when we have an ICP we should be using it via the HFI
> protocol, thus burying all of the IPE/OPE BPS and CDM complexity in the
> firmware.
>
> Understood Agatti has no ICP so you're limited to direct OPE/IFE
> register access here. For HFI capable platforms - the majority - HFI is
> the way to go.
Fully agree, this is exactly the point where we should sync and work
together on a proper solution.
As a follow‑up to this RFC, I already have several ongoing pieces that
aim to generalize the CAMSS ISP support, and I’d very much like to
discuss them with you:
- camss-isp-m2m: Generic M2M scheduling framework handling job dispatch
based on buffer readiness and enabled endpoints (frame input, output,
statistics, parameters).
- camss-isp-pipeline: Helper layer to construct complex media/ISP graphs
from a structural description (endpoints, links, etc.).
- camss-isp-params: Generic helper for handling ISP parameter buffers
(using v4l2-isp-params).
- camss-isp-stats: Generic helper framework for CAMSS statistics devices.
- camss-(isp-)ope: OPE‑specific logic only (register configuration, IRQ
handling, parameter‑to‑register translation).
This approach should significantly reduce the amount of
platform‑specific code required for future ISP blocks. It should also
allow you to integrate a camss-isp-hamoa (or similar) backend, or even
a camss-isp-hfi implementation for the M2M functions, without
duplicating the infrastructure.
So yes, let’s sync and agree on a shared/open development model and an
overall direction, possibly even a common tree, to ensure we stay
aligned and can collaborate effectively.
>
> I'll publish an RFC for Hamoa for that soonish so we can make sure both
> coexist.
Ack.
Regards,
Loic
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-24 11:00 ` Bryan O'Donoghue
2026-03-24 15:57 ` Loic Poulain
@ 2026-03-24 21:27 ` Dmitry Baryshkov
2026-03-26 12:06 ` johannes.goede
2026-03-25 9:30 ` Konrad Dybcio
2026-04-05 20:11 ` Laurent Pinchart
3 siblings, 1 reply; 47+ messages in thread
From: Dmitry Baryshkov @ 2026-03-24 21:27 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio,
linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
> On 23/03/2026 15:31, Loic Poulain wrote:
> > > > +
> > > > +static void ope_prog_bayer2rgb(struct ope_dev *ope)
> > > > +{
> > > > + /* Fixed Settings */
> > > > + ope_write_pp(ope, 0x860, 0x4001);
> > > > + ope_write_pp(ope, 0x868, 128);
> > > > + ope_write_pp(ope, 0x86c, 128 << 20);
> > > > + ope_write_pp(ope, 0x870, 102);
> > > What are the magic numbers about ? Please define bit-fields and offsets.
> > There are some registers I can't disclose today, which have to be
> > configured with working values,
> > Similarly to some sensor configuration in media/i2c.
>
> Not really the same thing, all of the offsets in upstream CAMSS and its CLC
> are documented. Sensor values are typically upstreamed by people who don't
> control the documentation, that is not the case with Qcom submitting this
> code upstream now.
>
> Are you guys doing an upstream implementation or not ?
And there are enough upstream implementations, even coming from the
vendors, without (or with the minimal) register specifications.
>
> > As far as I understand, CDM could also be implemented in a generic way
> > within CAMSS, since other CAMSS blocks make use of CDM as well.
> > This is something we should discuss further.
> My concern is even conservatively if each module adds another 10 ? writes by
> the time we get to denoising, sharpening, lens shade correction, those
> writes could easily look more like 100.
>
> What user-space should submit is well documented data-structures which then
> get translated into CDM buffers by the OPE and IFE for the various bits of
> the pipeline.
I hope here you have accent on the well-documented (ideally some kind of
the vendor-independent ABI).
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-24 11:00 ` Bryan O'Donoghue
2026-03-24 15:57 ` Loic Poulain
2026-03-24 21:27 ` Dmitry Baryshkov
@ 2026-03-25 9:30 ` Konrad Dybcio
2026-04-05 20:11 ` Laurent Pinchart
3 siblings, 0 replies; 47+ messages in thread
From: Konrad Dybcio @ 2026-03-25 9:30 UTC (permalink / raw)
To: Bryan O'Donoghue, Loic Poulain
Cc: vladimir.zapolskiy, laurent.pinchart, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On 3/24/26 12:00 PM, Bryan O'Donoghue wrote:
> On 23/03/2026 15:31, Loic Poulain wrote:
[...]
>>> So - this is where the CDM should be used - so that you don't have to do
>>> all of these MMIO writes inside of your ISR.
>> Indeed, and that also the reason stripes are computed ahead of time,
>> so that they can be further 'queued' in a CDM.
>>
>>> Is that and additional step after the RFC ?
>> The current implementation (without CDM) already provides good results
>> and performance, so CDM can be viewed as a future enhancement.
>
> That's true but then the number of MMIO writes per ISR is pretty small right now. You have about 50 writes here.
>
>> As far as I understand, CDM could also be implemented in a generic way
>> within CAMSS, since other CAMSS blocks make use of CDM as well.
>> This is something we should discuss further.
> My concern is even conservatively if each module adds another 10 ? writes by the time we get to denoising, sharpening, lens shade correction, those writes could easily look more like 100.
>
> What user-space should submit is well documented data-structures which then get translated into CDM buffers by the OPE and IFE for the various bits of the pipeline.
Would simply switching to a threaded irq handler resolve this?
Konrad
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-24 21:27 ` Dmitry Baryshkov
@ 2026-03-26 12:06 ` johannes.goede
2026-03-30 11:37 ` Dmitry Baryshkov
0 siblings, 1 reply; 47+ messages in thread
From: johannes.goede @ 2026-03-26 12:06 UTC (permalink / raw)
To: Dmitry Baryshkov, Bryan O'Donoghue
Cc: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio,
linux-media, linux-arm-msm, devicetree, linux-kernel, mchehab
Hi Dmitry,
On 24-Mar-26 22:27, Dmitry Baryshkov wrote:
> On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
>> On 23/03/2026 15:31, Loic Poulain wrote:
<snip>
>>> As far as I understand, CDM could also be implemented in a generic way
>>> within CAMSS, since other CAMSS blocks make use of CDM as well.
>>> This is something we should discuss further.
>> My concern is even conservatively if each module adds another 10 ? writes by
>> the time we get to denoising, sharpening, lens shade correction, those
>> writes could easily look more like 100.
>>
>> What user-space should submit is well documented data-structures which then
>> get translated into CDM buffers by the OPE and IFE for the various bits of
>> the pipeline.
>
> I hope here you have accent on the well-documented (ideally some kind of
> the vendor-independent ABI).
The plan is to use the new extensible generic v4l2 ISP parameters
API for this:
https://docs.kernel.org/6.19/driver-api/media/v4l2-isp.html
What this does is basically divide the parameter buffer (which
is just a mmap-able bunch of bytes) into variable sized packets/
blocks with each block having a small header, with a type field.
And then we can have say CCMv1 type for the CCM on the OPE and
if with some future hardware the format of the CCM (say different
fixpoint format) ever changes we can simply define a new CCMv2
and then the parameter buffer can be filled with different
versions of different parameter blocks depending on the hw.
And on the kernel side there are helpers to parse this, you
simply pass a list of the types the current hw supports
+ per type data-callback functions.
And then your CCMv1 or CCMv2 helper will get called with
the matching parameter-data.
So this way we can easily add new hw support without needing
to change the existing API, we can simply extend the list
of parameter types as needed.
Regards,
Hans
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-26 12:06 ` johannes.goede
@ 2026-03-30 11:37 ` Dmitry Baryshkov
2026-03-30 13:46 ` johannes.goede
0 siblings, 1 reply; 47+ messages in thread
From: Dmitry Baryshkov @ 2026-03-30 11:37 UTC (permalink / raw)
To: johannes.goede
Cc: Bryan O'Donoghue, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Thu, Mar 26, 2026 at 01:06:59PM +0100, johannes.goede@oss.qualcomm.com wrote:
> Hi Dmitry,
>
> On 24-Mar-26 22:27, Dmitry Baryshkov wrote:
> > On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
> >> On 23/03/2026 15:31, Loic Poulain wrote:
>
> <snip>
>
> >>> As far as I understand, CDM could also be implemented in a generic way
> >>> within CAMSS, since other CAMSS blocks make use of CDM as well.
> >>> This is something we should discuss further.
> >> My concern is even conservatively if each module adds another 10 ? writes by
> >> the time we get to denoising, sharpening, lens shade correction, those
> >> writes could easily look more like 100.
> >>
> >> What user-space should submit is well documented data-structures which then
> >> get translated into CDM buffers by the OPE and IFE for the various bits of
> >> the pipeline.
> >
> > I hope here you have accent on the well-documented (ideally some kind of
> > the vendor-independent ABI).
>
> The plan is to use the new extensible generic v4l2 ISP parameters
> API for this:
>
> https://docs.kernel.org/6.19/driver-api/media/v4l2-isp.html
>
> What this does is basically divide the parameter buffer (which
> is just a mmap-able bunch of bytes) into variable sized packets/
> blocks with each block having a small header, with a type field.
>
> And then we can have say CCMv1 type for the CCM on the OPE and
> if with some future hardware the format of the CCM (say different
> fixpoint format) ever changes we can simply define a new CCMv2
> and then the parameter buffer can be filled with different
> versions of different parameter blocks depending on the hw.
>
> And on the kernel side there are helpers to parse this, you
> simply pass a list of the types the current hw supports
> + per type data-callback functions.
>
> And then your CCMv1 or CCMv2 helper will get called with
> the matching parameter-data.
This leads to userspace having to know exact format for each hardware
version, which is not nice. At the very least it should be possible to
accept CCMv1 buffers and covert them to CCMv2 when required.
>
> So this way we can easily add new hw support without needing
> to change the existing API, we can simply extend the list
> of parameter types as needed.
>
> Regards,
>
> Hans
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 11:37 ` Dmitry Baryshkov
@ 2026-03-30 13:46 ` johannes.goede
2026-03-30 14:11 ` Bryan O'Donoghue
0 siblings, 1 reply; 47+ messages in thread
From: johannes.goede @ 2026-03-30 13:46 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Bryan O'Donoghue, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
Hi,
On 30-Mar-26 13:37, Dmitry Baryshkov wrote:
> On Thu, Mar 26, 2026 at 01:06:59PM +0100, johannes.goede@oss.qualcomm.com wrote:
>> Hi Dmitry,
>>
>> On 24-Mar-26 22:27, Dmitry Baryshkov wrote:
>>> On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
>>>> On 23/03/2026 15:31, Loic Poulain wrote:
>>
>> <snip>
>>
>>>>> As far as I understand, CDM could also be implemented in a generic way
>>>>> within CAMSS, since other CAMSS blocks make use of CDM as well.
>>>>> This is something we should discuss further.
>>>> My concern is even conservatively if each module adds another 10 ? writes by
>>>> the time we get to denoising, sharpening, lens shade correction, those
>>>> writes could easily look more like 100.
>>>>
>>>> What user-space should submit is well documented data-structures which then
>>>> get translated into CDM buffers by the OPE and IFE for the various bits of
>>>> the pipeline.
>>>
>>> I hope here you have accent on the well-documented (ideally some kind of
>>> the vendor-independent ABI).
>>
>> The plan is to use the new extensible generic v4l2 ISP parameters
>> API for this:
>>
>> https://docs.kernel.org/6.19/driver-api/media/v4l2-isp.html
>>
>> What this does is basically divide the parameter buffer (which
>> is just a mmap-able bunch of bytes) into variable sized packets/
>> blocks with each block having a small header, with a type field.
>>
>> And then we can have say CCMv1 type for the CCM on the OPE and
>> if with some future hardware the format of the CCM (say different
>> fixpoint format) ever changes we can simply define a new CCMv2
>> and then the parameter buffer can be filled with different
>> versions of different parameter blocks depending on the hw.
>>
>> And on the kernel side there are helpers to parse this, you
>> simply pass a list of the types the current hw supports
>> + per type data-callback functions.
>>
>> And then your CCMv1 or CCMv2 helper will get called with
>> the matching parameter-data.
>
> This leads to userspace having to know exact format for each hardware
> version, which is not nice. At the very least it should be possible to
> accept CCMv1 buffers and covert them to CCMv2 when required.
Yes, but a new ISP may also have a different pipeline altogether
with e.g. more then one preview/viewfinder output vs one viewfinder
output for current hw, etc.
Or the raw-bayer hw-statistics format may change, which would also
require libcamera updates.
Generally speaking the development model for MIPI cameras with
hardware ISP with libcamera is that enabling new hardware will
require both kernel updates as well as libcamera updates.
ISPs are simply so complex that it has been decided that having
a unified API where old userspace will "just work" with newer
hw as long as the kernel has support for the new hw is simply
not realistically doable. There are too many possible topologies
tweakable parameters, etc.
And even if such a thing were possible (1) it would lead to an API
mostly limited to supporting some sort of shared lowest common
denominator feature set which is not what we want.
The purpose of the extensible generic v4l2 ISP parameters API is
to allow having a single kernel driver + API for multiple generations
of hardware-ISP, without breaking the userspace API for the older
generations. As well as having a single set of support code
for that supporting multiple generations on the libcamera side.
But once the kernel driver grows support for a new ISP generation
then it is expected for the libcamera counter part to also need
at least some updates to support the new generation.
Regards,
Hans
1) It might be possible if you throw a whole lot of manpower at
it ...
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 13:46 ` johannes.goede
@ 2026-03-30 14:11 ` Bryan O'Donoghue
2026-03-30 14:27 ` johannes.goede
2026-03-30 18:55 ` Dmitry Baryshkov
0 siblings, 2 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-30 14:11 UTC (permalink / raw)
To: johannes.goede, Dmitry Baryshkov
Cc: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio,
linux-media, linux-arm-msm, devicetree, linux-kernel, mchehab
On 30/03/2026 14:46, johannes.goede@oss.qualcomm.com wrote:
>>> And then your CCMv1 or CCMv2 helper will get called with
>>> the matching parameter-data.
>> This leads to userspace having to know exact format for each hardware
>> version, which is not nice. At the very least it should be possible to
>> accept CCMv1 buffers and covert them to CCMv2 when required.
> Yes, but a new ISP may also have a different pipeline altogether
> with e.g. more then one preview/viewfinder output vs one viewfinder
> output for current hw, etc.
My scoping on HFI shows that the IQ structures between Kona and later
versions have pretty stable data-structures.
It might be worthwhile for the non-HFI version to implement those
structures.
I keep mentioning CDM. Its also possible to construct the buffer in the
format the CDM would require and hand that from user-space into the kernel.
That would save alot of overhead translating from one format to another.
That's another reason I bring up CDM again and again. We probably don't
want to fix to the wrong format for OPE, introduce the CDM and then find
we have to map from one format to another for large and complex data
over and over again for each frame or every N frames.
TBH I think the CDM should happen for this system and in that vein is
there any reason not to pack the data in the order the CDM will want ?
So probably in fact IQ structs are not the right thing for OPE+IFE.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 14:11 ` Bryan O'Donoghue
@ 2026-03-30 14:27 ` johannes.goede
2026-03-30 14:32 ` Bryan O'Donoghue
2026-03-30 18:55 ` Dmitry Baryshkov
1 sibling, 1 reply; 47+ messages in thread
From: johannes.goede @ 2026-03-30 14:27 UTC (permalink / raw)
To: Bryan O'Donoghue, Dmitry Baryshkov
Cc: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio,
linux-media, linux-arm-msm, devicetree, linux-kernel, mchehab
Hi,
On 30-Mar-26 16:11, Bryan O'Donoghue wrote:
> On 30/03/2026 14:46, johannes.goede@oss.qualcomm.com wrote:
>>>> And then your CCMv1 or CCMv2 helper will get called with
>>>> the matching parameter-data.
>>> This leads to userspace having to know exact format for each hardware
>>> version, which is not nice. At the very least it should be possible to
>>> accept CCMv1 buffers and covert them to CCMv2 when required.
>> Yes, but a new ISP may also have a different pipeline altogether
>> with e.g. more then one preview/viewfinder output vs one viewfinder
>> output for current hw, etc.
>
> My scoping on HFI shows that the IQ structures between Kona and later versions have pretty stable data-structures.
>
> It might be worthwhile for the non-HFI version to implement those structures.
Maybe, it depends on if they are really 100% the same
various IQ parameters are in various different fixed-point
formats. I don't think we want to be converting from
one precision fixed-point to another precision fixed-point
in the kernel.
> I keep mentioning CDM. Its also possible to construct the buffer in the format the CDM would require and hand that from user-space into the kernel.
I believe the CDM take register addresses + values to setup
the OPE for the next stripe to process ?
Directly exporting a format which takes register addresses
+ values to userspace does not sound like a good idea.
If you look at the current structure of the OPE driver
it already keeps tracks if per stripe settings, only atm
it programs those directly on the stripe completion IRQ
rather then setting up the CDM. Generating the CDM settings
from that data should be straight forward.
I really do not believe that such low-level details belong
in the userspace API in any way.
If anything whether we are using the CDM or directly doing
the next stripe programming from the IRQ handler should
be completely transparent to userspace.
>
> That would save alot of overhead translating from one format to another.
>
> That's another reason I bring up CDM again and again. We probably don't want to fix to the wrong format for OPE, introduce the CDM and then find we have to map from one format to another for large and complex data over and over again for each frame or every N frames.
CDM is a much lower-level API then what is expected from
a media-controller centric V4L2 driver. Basically the OPE
driver will export:
* media-controller node
* bunch of subdevs + routing between them
* /dev/video# videobuffer queue for raw input frames
* /dev/video# parameter queue for extensible generic v4l2 ISP parameters buffers (with qcom specific contents)
* /dev/video# videobuffer "video" output queue for processed frames
* /dev/video# videobuffer "viewfinder" output queue for "extra" downscaled processed frames
No statistics since these come from the CSI2 bits (VFE PIX)
on Agetti.
This is is basically the current consensus what a modern
hardware camera ISP driver should look like to userspace.
Anything lower level then this should be abstracted by
the kernel.
Note both output nodes can probably downscale, but
the viewfinder one can do an extra downscaling step
on top in case userspace wants 2 streams one higher res
to record and a lower-res to show on screen.
Regards,
Hans
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 14:27 ` johannes.goede
@ 2026-03-30 14:32 ` Bryan O'Donoghue
2026-03-30 18:59 ` Dmitry Baryshkov
2026-03-30 19:07 ` Loic Poulain
0 siblings, 2 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-30 14:32 UTC (permalink / raw)
To: johannes.goede, Dmitry Baryshkov
Cc: Loic Poulain, vladimir.zapolskiy, laurent.pinchart,
kieran.bingham, robh, krzk+dt, andersson, konradybcio,
linux-media, linux-arm-msm, devicetree, linux-kernel, mchehab
On 30/03/2026 15:27, johannes.goede@oss.qualcomm.com wrote:
>> That's another reason I bring up CDM again and again. We probably don't want to fix to the wrong format for OPE, introduce the CDM and then find we have to map from one format to another for large and complex data over and over again for each frame or every N frames.
> CDM is a much lower-level API then what is expected from
> a media-controller centric V4L2 driver. Basically the OPE
> driver will export:
My concern is about wrappering one thing inside of another thing and
then stuffing it again back into CDM and doing the same on the way out.
There are already 50 MMIO writes in the OPE ISR, I don't believe it is
sustainable to keep adding MMIO into that.
I'm aware of a project in qcom that did something with making the CDM
format in libcamera and handed that off to kernel, recommend looking
into that.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 14:11 ` Bryan O'Donoghue
2026-03-30 14:27 ` johannes.goede
@ 2026-03-30 18:55 ` Dmitry Baryshkov
2026-03-30 22:51 ` Bryan O'Donoghue
2026-04-05 20:14 ` Laurent Pinchart
1 sibling, 2 replies; 47+ messages in thread
From: Dmitry Baryshkov @ 2026-03-30 18:55 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: johannes.goede, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Mon, Mar 30, 2026 at 03:11:58PM +0100, Bryan O'Donoghue wrote:
> On 30/03/2026 14:46, johannes.goede@oss.qualcomm.com wrote:
> > > > And then your CCMv1 or CCMv2 helper will get called with
> > > > the matching parameter-data.
> > > This leads to userspace having to know exact format for each hardware
> > > version, which is not nice. At the very least it should be possible to
> > > accept CCMv1 buffers and covert them to CCMv2 when required.
> > Yes, but a new ISP may also have a different pipeline altogether
> > with e.g. more then one preview/viewfinder output vs one viewfinder
> > output for current hw, etc.
>
> My scoping on HFI shows that the IQ structures between Kona and later
> versions have pretty stable data-structures.
>
> It might be worthwhile for the non-HFI version to implement those
> structures.
>
> I keep mentioning CDM. Its also possible to construct the buffer in the
> format the CDM would require and hand that from user-space into the kernel.
>
> That would save alot of overhead translating from one format to another.
>
> That's another reason I bring up CDM again and again. We probably don't want
> to fix to the wrong format for OPE, introduce the CDM and then find we have
> to map from one format to another for large and complex data over and over
> again for each frame or every N frames.
>
> TBH I think the CDM should happen for this system and in that vein is there
> any reason not to pack the data in the order the CDM will want ?
This sounds like the most horrible idea: letting userspace directly
program any registers in a way that is not visible to the kernel.
>
> So probably in fact IQ structs are not the right thing for OPE+IFE.
>
> ---
> bod
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 14:32 ` Bryan O'Donoghue
@ 2026-03-30 18:59 ` Dmitry Baryshkov
2026-03-30 19:07 ` Loic Poulain
1 sibling, 0 replies; 47+ messages in thread
From: Dmitry Baryshkov @ 2026-03-30 18:59 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: johannes.goede, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Mon, Mar 30, 2026 at 03:32:55PM +0100, Bryan O'Donoghue wrote:
> On 30/03/2026 15:27, johannes.goede@oss.qualcomm.com wrote:
> > > That's another reason I bring up CDM again and again. We probably don't want to fix to the wrong format for OPE, introduce the CDM and then find we have to map from one format to another for large and complex data over and over again for each frame or every N frames.
> > CDM is a much lower-level API then what is expected from
> > a media-controller centric V4L2 driver. Basically the OPE
> > driver will export:
>
> My concern is about wrappering one thing inside of another thing and then
> stuffing it again back into CDM and doing the same on the way out.
>
> There are already 50 MMIO writes in the OPE ISR, I don't believe it is
> sustainable to keep adding MMIO into that.
That's why I asked about the ABI. If we have a format for OPE
programming, we can reuse it for CDM. If we don't, we have to open the
wormhole. That is unless we make OPE driver utilize CDM instead of
writing registers through MMIO (and instead of userspace directly
programming the CDM).
>
> I'm aware of a project in qcom that did something with making the CDM format
> in libcamera and handed that off to kernel, recommend looking into that.
>
> ---
> bod
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 14:32 ` Bryan O'Donoghue
2026-03-30 18:59 ` Dmitry Baryshkov
@ 2026-03-30 19:07 ` Loic Poulain
2026-04-05 20:23 ` Laurent Pinchart
1 sibling, 1 reply; 47+ messages in thread
From: Loic Poulain @ 2026-03-30 19:07 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: johannes.goede, Dmitry Baryshkov, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Mon, Mar 30, 2026 at 4:33 PM Bryan O'Donoghue <bod@kernel.org> wrote:
>
> On 30/03/2026 15:27, johannes.goede@oss.qualcomm.com wrote:
> >> That's another reason I bring up CDM again and again. We probably don't want to fix to the wrong format for OPE, introduce the CDM and then find we have to map from one format to another for large and complex data over and over again for each frame or every N frames.
> > CDM is a much lower-level API then what is expected from
> > a media-controller centric V4L2 driver. Basically the OPE
> > driver will export:
>
> My concern is about wrappering one thing inside of another thing and
> then stuffing it again back into CDM and doing the same on the way out.
I think there will always be some level of copying involved. That
said, we can pre‑build the CDM sequence in the drivers and only update
the variable values, which should avoid significant overhead.
If we start handling CDM formats directly on the user side, it would
require exposing a lot of low‑level knowledge there (such as register
layouts and offsets), and that would diverge from how other ISP
implementations are structured. I’m concerned this would increase
complexity and reduce portability.
> There are already 50 MMIO writes in the OPE ISR, I don't believe it is
> sustainable to keep adding MMIO into that.
Yes, I understand the concern. From our testing so far, however, this
has not shown to be an issue. In addition, a full reconfiguration
would only happen in specific cases, such as on explicit full
configuration changes or during context switching. We can certainly
look at implementing CDM, but at this stage it didn't seem to bring
significant benefits, so I prefered to focus on other functional
aspects, and revisit CDM once there is a clearer need, measurable
gain, or if it becomes part of the uAPI as discussed here.
> I'm aware of a project in qcom that did something with making the CDM
> format in libcamera and handed that off to kernel, recommend looking
> into that.
I will, thanks, I'm however, concerned about how acceptable this
approach would be to the wider community and to the maintainers.
Regards,
Loic
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 18:55 ` Dmitry Baryshkov
@ 2026-03-30 22:51 ` Bryan O'Donoghue
2026-03-31 8:11 ` Konrad Dybcio
2026-04-05 20:14 ` Laurent Pinchart
1 sibling, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-03-30 22:51 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: johannes.goede, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On 30/03/2026 19:55, Dmitry Baryshkov wrote:
> This sounds like the most horrible idea: letting userspace directly
> program any registers in a way that is not visible to the kernel.
No I'm wondering if there is a way to construct the basic format in
user-space so it doesn't need to be re-interpreted stuffed/unstuffed.
As mentioned I believe there is a defunct qcom project which did/does
just that, not sure why that hasn't been investigated/developed.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 22:51 ` Bryan O'Donoghue
@ 2026-03-31 8:11 ` Konrad Dybcio
0 siblings, 0 replies; 47+ messages in thread
From: Konrad Dybcio @ 2026-03-31 8:11 UTC (permalink / raw)
To: Bryan O'Donoghue, Dmitry Baryshkov
Cc: johannes.goede, Loic Poulain, vladimir.zapolskiy,
laurent.pinchart, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On 3/31/26 12:51 AM, Bryan O'Donoghue wrote:
> On 30/03/2026 19:55, Dmitry Baryshkov wrote:
>> This sounds like the most horrible idea: letting userspace directly
>> program any registers in a way that is not visible to the kernel.
>
> No I'm wondering if there is a way to construct the basic format in user-space so it doesn't need to be re-interpreted stuffed/unstuffed.
>
> As mentioned I believe there is a defunct qcom project which did/does just that, not sure why that hasn't been investigated/developed.
I believe this isn't a great idea since the format will at one point be
platform-dependent (I think it may be already) and one will have to teach
_all_ of the userspace implementations about all of these specifics
Unless I'm missing the bigger picture, we're not talking about super
large amounts of data that would need to be slightly shuffled around
Konrad
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-03-24 16:16 ` Loic Poulain
@ 2026-04-05 19:48 ` Laurent Pinchart
2026-04-05 19:55 ` Bryan O'Donoghue
2026-04-06 13:22 ` Loic Poulain
0 siblings, 2 replies; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 19:48 UTC (permalink / raw)
To: Loic Poulain
Cc: Bryan O'Donoghue, vladimir.zapolskiy, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
On Tue, Mar 24, 2026 at 05:16:21PM +0100, Loic Poulain wrote:
> On Tue, Mar 24, 2026 at 1:54 PM Bryan O'Donoghue wrote:
> > On 23/03/2026 12:58, Loic Poulain wrote:
> > > This first version is intentionally minimalistic. It provides a working
> > > configuration using a fixed set of static processing parameters, mainly
> > > to achieve correct and good-quality debayering.
> >
> > You need the other 50% of the kernel side - the generation of bayer
> > statistics in the IFE, as well as generation of parameters to feed back
> > into the OPE - which requires a user-space implementation too, so a lot
> > of work there too.
> >
> > I'd also say when we have an ICP we should be using it via the HFI
> > protocol, thus burying all of the IPE/OPE BPS and CDM complexity in the
> > firmware.
> >
> > Understood Agatti has no ICP so you're limited to direct OPE/IFE
> > register access here. For HFI capable platforms - the majority - HFI is
> > the way to go.
>
> Fully agree, this is exactly the point where we should sync and work
> together on a proper solution.
I don't necessarily agree with that. There are pros and cons for using
HFI on platforms that have an ICP. If correctly written, a firmware can
improve the throughput in multi-camera use cases by reprogramming the
time-multiplexed OPE faster. On the other hand, in use cases that don't
require pushing the platform to its limits, dealing with a closed-source
firmware often causes lots of issues.
We should aim at supporting both direct ISP access and HFI with the same
userspace API, even on a single platform. Which option to start with is
an open question that we should discuss.
> As a follow‑up to this RFC, I already have several ongoing pieces that
> aim to generalize the CAMSS ISP support, and I’d very much like to
> discuss them with you:
>
> - camss-isp-m2m: Generic M2M scheduling framework handling job dispatch
> based on buffer readiness and enabled endpoints (frame input, output,
> statistics, parameters).
This should be generic, not limited to camss. v4l2-isp is a good
candidate.
> - camss-isp-pipeline: Helper layer to construct complex media/ISP graphs
> from a structural description (endpoints, links, etc.).
That also doesn't seem specific to camss.
> - camss-isp-params: Generic helper for handling ISP parameter buffers
> (using v4l2-isp-params).
I'm curious to know what camss-specific helpers you envision there.
> - camss-isp-stats: Generic helper framework for CAMSS statistics devices.
Same.
> - camss-(isp-)ope: OPE‑specific logic only (register configuration, IRQ
> handling, parameter‑to‑register translation).
>
> This approach should significantly reduce the amount of
> platform‑specific code required for future ISP blocks. It should also
> allow you to integrate a camss-isp-hamoa (or similar) backend, or even
> a camss-isp-hfi implementation for the M2M functions, without
> duplicating the infrastructure.
>
> So yes, let’s sync and agree on a shared/open development model and an
> overall direction, possibly even a common tree, to ensure we stay
> aligned and can collaborate effectively.
Let's schedule a call to kickstart those discussions. Many people are on
Easter vacation this week, next week could be a good candidate.
> > I'll publish an RFC for Hamoa for that soonish so we can make sure both
> > coexist.
>
> Ack.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-04-05 19:48 ` Laurent Pinchart
@ 2026-04-05 19:55 ` Bryan O'Donoghue
2026-04-05 20:47 ` Laurent Pinchart
2026-04-06 13:22 ` Loic Poulain
1 sibling, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-04-05 19:55 UTC (permalink / raw)
To: Laurent Pinchart, Loic Poulain
Cc: vladimir.zapolskiy, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
On 05/04/2026 20:48, Laurent Pinchart wrote:
> I don't necessarily agree with that. There are pros and cons for using
> HFI on platforms that have an ICP. If correctly written, a firmware can
> improve the throughput in multi-camera use cases by reprogramming the
> time-multiplexed OPE faster. On the other hand, in use cases that don't
> require pushing the platform to its limits, dealing with a closed-source
> firmware often causes lots of issues.
>
> We should aim at supporting both direct ISP access and HFI with the same
> userspace API, even on a single platform. Which option to start with is
> an open question that we should discuss.
I think - for IPE and BPS ICP/HFI is the way to go.
However thinking about it - inline pixel processing IPP inside of the
IFE is superior to BPS/IPE for virtually every scenario i.e. why deliver
a frame to user-space and then submit it directly to BPS via CDM or via
a firmware interface HFI, if you can do the same processing in the IFE -
which on the majority of qcom platforms, you can.
Agatti is an outlier in that sense.
So actually I've shifted my focus on Hamoa to IFE/IPP.
You still BTW do want HFI for BPS/IPE - but to get 3a going on the vast
majority of qcom platforms - you want the PIX/IPP path in the IFE.
OTOH if you want to do offline bayer processing - taking say a saved
file from the filesystem - then BPS/IPE is the way to do it and IMO HFI
is the way to do that.
But ICP/BPS/IPE is a nice to have.
I realise that's a word-soup of TLAs but yeah, TL;DR IFE/IPP is the way
to go on !Agatti and once we get a nice 3a loop going there a fun
side-project would be offline bayer processing via HFI.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
` (3 preceding siblings ...)
2026-03-24 12:54 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Bryan O'Donoghue
@ 2026-04-05 19:57 ` Laurent Pinchart
4 siblings, 0 replies; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 19:57 UTC (permalink / raw)
To: Loic Poulain
Cc: bod, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
johannes.goede, mchehab
Hi Loic,
I'm really happy to see this on the list :-)
On Mon, Mar 23, 2026 at 01:58:21PM +0100, Loic Poulain wrote:
> This RFC series introduces initial support for the Qualcomm CAMSS
> Offline Processing Engine (OPE), as found on Agatti-based platforms.
> Boards such as Arduino UNO-Q use this SoC family and will benefit
> from hardware-assisted image processing enabled by this work.
>
> This represents the first step toward enabling image processing beyond
> raw capture on Qualcomm platforms by using hardware blocks for
> operations such as debayering, 3A, and scaling.
I assume you mean colour gains instead of 3A, based on what I can see in
the driver. I'm looking forward to hardware support for the rest of the
3A :-)
> The OPE sits outside the live capture pipeline. It operates on frames
> fetched from system memory and writes processed results back to memory.
> Because of this design, the OPE is not tied to any specific capture
> interface: frames may come from CAMSS RDI or PIX paths, or from any
> other producer capable of providing memory-backed buffers.
>
> The hardware can sustain up to 580 megapixels per second, which is
> sufficient to process a 10MPix stream at 60 fps or to handle four
> parallel 2MPix (HD) streams at 60 fps.
Isn't 10 MPix/frame * 60 fps = 600 MPix/s, higher than 580 MPix/s ?
> The initial driver implementation relies on the V4L2 m2m framework
> to keep the design simple while already enabling practical offline
> processing workflows. This model also provides time-sharing across
> multiple contexts through its built-in scheduling.
I understand this decision, but that will need to change. In order to
enable support for more ISP processing blocks, we will need to introduce
parameter buffers. The rkisp1 and mali-c55 drivers are two examples of
how it can be done. If you need any help, please don't hesitate to reach
out.
> This first version is intentionally minimalistic. It provides a working
> configuration using a fixed set of static processing parameters, mainly
> to achieve correct and good-quality debayering.
>
> Support for more advanced use-cases (dynamic parameters, statistics
> outputs, additional data endpoints) will require evolving the driver
> model beyond a pure m2m design. This may involve either moving away
> from m2m, as other ISP drivers do, or extending it to support auxiliary
> endpoints for parameters and statistics.
Ah, I should have read this before writing the above :-) Let's align the
userspace API of driver with the other ISP drivers.
> This series includes:
> - dt-binding schema for CAMSS OPE
> - initial CAMSS OPE driver
> - QCM2290 device tree node describing the hardware block.
>
> Feedback on the architecture and expected uAPI direction is especially
> welcome.
>
> Loic Poulain (3):
> dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE)
> media: qcom: camss: Add CAMSS Offline Processing Engine driver
> arm64: dts: qcom: qcm2290: Add CAMSS OPE node
>
> .../bindings/media/qcom,camss-ope.yaml | 87 +
> arch/arm64/boot/dts/qcom/agatti.dtsi | 72 +
> drivers/media/platform/qcom/camss/Makefile | 4 +
> drivers/media/platform/qcom/camss/camss-ope.c | 2058 +++++++++++++++++
> 4 files changed, 2221 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/media/qcom,camss-ope.yaml
> create mode 100644 drivers/media/platform/qcom/camss/camss-ope.c
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-24 11:00 ` Bryan O'Donoghue
` (2 preceding siblings ...)
2026-03-25 9:30 ` Konrad Dybcio
@ 2026-04-05 20:11 ` Laurent Pinchart
2026-04-05 20:15 ` Bryan O'Donoghue
3 siblings, 1 reply; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 20:11 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
> On 23/03/2026 15:31, Loic Poulain wrote:
[snip]
> >>> +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> >>> +{
> >>> + struct ope_dev *ope = ctx->ope;
> >>> + int i;
> >>> +
> >>> + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
> >>> +
> >>> + /* Fetch Engine */
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
> >>> + (stripe->src.width << 16) + stripe->src.height);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
> >>> + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
> >>> +
> >>> + /* Write Engines */
> >>> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> >>> + if (!stripe->dst[i].enabled) {
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
> >>> + continue;
> >>> + }
> >>> +
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
> >>> + (stripe->dst[i].height << 16) + stripe->dst[i].width);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
> >>> + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
> >>> + }
> >>> +
> >>> + /* Downscalers */
> >>> + for (i = 0; i < OPE_DS_MAX; i++) {
> >>> + struct ope_dsc_config *dsc = &stripe->dsc[i];
> >>> + u32 base = ope_ds_base[i];
> >>> + u32 cfg = 0;
> >>> +
> >>> + if (dsc->input_width != dsc->output_width) {
> >>> + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
> >>> + dsc->output_width) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
> >>> + }
> >>> +
> >>> + if (dsc->input_height != dsc->output_height) {
> >>> + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
> >>> + dsc->output_height) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
> >>> + }
> >>> +
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
> >>> + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
> >>> + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
> >>> + }
> >>> +}
> >>
> >> So - this is where the CDM should be used - so that you don't have to do
> >> all of these MMIO writes inside of your ISR.
> >
> > Indeed, and that also the reason stripes are computed ahead of time,
> > so that they can be further 'queued' in a CDM.
> >
> >> Is that and additional step after the RFC ?
> > The current implementation (without CDM) already provides good results
> > and performance, so CDM can be viewed as a future enhancement.
>
> That's true but then the number of MMIO writes per ISR is pretty small
> right now. You have about 50 writes here.
>
> > As far as I understand, CDM could also be implemented in a generic way
> > within CAMSS, since other CAMSS blocks make use of CDM as well.
> > This is something we should discuss further.
>
> My concern is even conservatively if each module adds another 10 ?
> writes by the time we get to denoising, sharpening, lens shade
> correction, those writes could easily look more like 100.
>
> What user-space should submit is well documented data-structures which
> then get translated into CDM buffers by the OPE and IFE for the various
> bits of the pipeline.
The mali-c55 driver does this, it translates the ISP parameters buffers
to a list of register values in userspace context, when the buffer is
queued. In the IRQ handler, it then either copies those values to
registers with MMIO writes, or use a DMA engine, depending on the
platform. The rppx1 driver does something similar, with a different
format for the buffer containing the register values.
I think this architecture could be replicated here. This translation in
userspace context ensures that work at IRQ time is limited. The driver
can use whatever DMA engine is available depending on the platform, and
we can also force usage of MMIO for debugging or development purpose.
That way, development of ISP features is decoupled from development of
CDM support, enabling parallel development if desired, and faster
plaform enablement that allows starting the userspace side of the work
quicker.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 18:55 ` Dmitry Baryshkov
2026-03-30 22:51 ` Bryan O'Donoghue
@ 2026-04-05 20:14 ` Laurent Pinchart
1 sibling, 0 replies; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 20:14 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Bryan O'Donoghue, johannes.goede, Loic Poulain,
vladimir.zapolskiy, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Mon, Mar 30, 2026 at 09:55:23PM +0300, Dmitry Baryshkov wrote:
> On Mon, Mar 30, 2026 at 03:11:58PM +0100, Bryan O'Donoghue wrote:
> > On 30/03/2026 14:46, johannes.goede@oss.qualcomm.com wrote:
> > > > > And then your CCMv1 or CCMv2 helper will get called with
> > > > > the matching parameter-data.
> > > >
> > > > This leads to userspace having to know exact format for each hardware
> > > > version, which is not nice. At the very least it should be possible to
> > > > accept CCMv1 buffers and covert them to CCMv2 when required.
> > >
> > > Yes, but a new ISP may also have a different pipeline altogether
> > > with e.g. more then one preview/viewfinder output vs one viewfinder
> > > output for current hw, etc.
> >
> > My scoping on HFI shows that the IQ structures between Kona and later
> > versions have pretty stable data-structures.
> >
> > It might be worthwhile for the non-HFI version to implement those
> > structures.
> >
> > I keep mentioning CDM. Its also possible to construct the buffer in the
> > format the CDM would require and hand that from user-space into the kernel.
> >
> > That would save alot of overhead translating from one format to another.
> >
> > That's another reason I bring up CDM again and again. We probably don't want
> > to fix to the wrong format for OPE, introduce the CDM and then find we have
> > to map from one format to another for large and complex data over and over
> > again for each frame or every N frames.
> >
> > TBH I think the CDM should happen for this system and in that vein is there
> > any reason not to pack the data in the order the CDM will want ?
>
> This sounds like the most horrible idea: letting userspace directly
> program any registers in a way that is not visible to the kernel.
ISP hardware is typically not designed to make this safe, so I would be
really, really careful about going in that direction. It also seems a
dangerous idea to me.
> > So probably in fact IQ structs are not the right thing for OPE+IFE.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-04-05 20:11 ` Laurent Pinchart
@ 2026-04-05 20:15 ` Bryan O'Donoghue
2026-04-05 20:24 ` Laurent Pinchart
0 siblings, 1 reply; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-04-05 20:15 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On 05/04/2026 21:11, Laurent Pinchart wrote:
> The mali-c55 driver does this, it translates the ISP parameters buffers
> to a list of register values in userspace context, when the buffer is
> queued. In the IRQ handler, it then either copies those values to
> registers with MMIO writes, or use a DMA engine, depending on the
> platform. The rppx1 driver does something similar, with a different
> format for the buffer containing the register values.
>
> I think this architecture could be replicated here. This translation in
> userspace context ensures that work at IRQ time is limited. The driver
> can use whatever DMA engine is available depending on the platform, and
> we can also force usage of MMIO for debugging or development purpose.
> That way, development of ISP features is decoupled from development of
> CDM support, enabling parallel development if desired, and faster
> plaform enablement that allows starting the userspace side of the work
> quicker.
I think that's a reasonable plan.
We make the buffer in user-space which could be used by CDM but stage
the implementation.
That way if CDM proves too hard, we can do MMIO for a while, and then
transition to CDM if/when.
For me though I really think translating between formats is storing up pain.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-03-30 19:07 ` Loic Poulain
@ 2026-04-05 20:23 ` Laurent Pinchart
0 siblings, 0 replies; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 20:23 UTC (permalink / raw)
To: Loic Poulain
Cc: Bryan O'Donoghue, johannes.goede, Dmitry Baryshkov,
vladimir.zapolskiy, kieran.bingham, robh, krzk+dt, andersson,
konradybcio, linux-media, linux-arm-msm, devicetree, linux-kernel,
mchehab
On Mon, Mar 30, 2026 at 09:07:59PM +0200, Loic Poulain wrote:
> On Mon, Mar 30, 2026 at 4:33 PM Bryan O'Donoghue wrote:
> > On 30/03/2026 15:27, johannes.goede@oss.qualcomm.com wrote:
> > >> That's another reason I bring up CDM again and again. We probably
> > >> don't want to fix to the wrong format for OPE, introduce the CDM
> > >> and then find we have to map from one format to another for large
> > >> and complex data over and over again for each frame or every N
> > >> frames.
> > >
> > > CDM is a much lower-level API then what is expected from
> > > a media-controller centric V4L2 driver. Basically the OPE
> > > driver will export:
> >
> > My concern is about wrappering one thing inside of another thing and
> > then stuffing it again back into CDM and doing the same on the way out.
>
> I think there will always be some level of copying involved. That
> said, we can pre‑build the CDM sequence in the drivers and only update
> the variable values, which should avoid significant overhead.
>
> If we start handling CDM formats directly on the user side, it would
> require exposing a lot of low‑level knowledge there (such as register
> layouts and offsets), and that would diverge from how other ISP
> implementations are structured. I’m concerned this would increase
> complexity and reduce portability.
Agreed, I don't think we should go in that direction. Translating the
parameters buffer to the format expecting by the CDM can be done in the
kernel in userspace context, and work in the IRQ handler will then
become minimal. As far as I understand the CDM expects a buffer that
contains register address and value pairs. This is exactly what the
R-Car V4H does, the rppx1 driver translates the parameters buffer to the
same register addresses and values format, and then passes it to the
VSP (which has the same role as the CDM here).
As mentioned in a separate e-mail, we also support programming the ISP
through MMIO. This creates more work in IRQ context, but is very useful
during development. Switching to MMIO just requires a different code
path in the IRQ handler that iterates over the registers
addresses/values in the VSP buffer, and writes to registers directly.
The architecture is very modular.
> > There are already 50 MMIO writes in the OPE ISR, I don't believe it is
> > sustainable to keep adding MMIO into that.
>
> Yes, I understand the concern. From our testing so far, however, this
> has not shown to be an issue. In addition, a full reconfiguration
> would only happen in specific cases, such as on explicit full
> configuration changes or during context switching. We can certainly
> look at implementing CDM, but at this stage it didn't seem to bring
> significant benefits, so I prefered to focus on other functional
> aspects, and revisit CDM once there is a clearer need, measurable
> gain, or if it becomes part of the uAPI as discussed here.
I agree. Let's design the driver with CDM in mind to have the right
abstraction layers, and work on CDM support in a second step. If someone
believes this should be done urgently, they can even help by working in
parallel with ISP features enablement.
> > I'm aware of a project in qcom that did something with making the CDM
> > format in libcamera and handed that off to kernel, recommend looking
> > into that.
>
> I will, thanks, I'm however, concerned about how acceptable this
> approach would be to the wider community and to the maintainers.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-04-05 20:15 ` Bryan O'Donoghue
@ 2026-04-05 20:24 ` Laurent Pinchart
2026-04-05 20:28 ` Bryan O'Donoghue
0 siblings, 1 reply; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 20:24 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On Sun, Apr 05, 2026 at 09:15:47PM +0100, Bryan O'Donoghue wrote:
> On 05/04/2026 21:11, Laurent Pinchart wrote:
> > The mali-c55 driver does this, it translates the ISP parameters buffers
> > to a list of register values in userspace context, when the buffer is
> > queued. In the IRQ handler, it then either copies those values to
> > registers with MMIO writes, or use a DMA engine, depending on the
> > platform. The rppx1 driver does something similar, with a different
> > format for the buffer containing the register values.
> >
> > I think this architecture could be replicated here. This translation in
> > userspace context ensures that work at IRQ time is limited. The driver
> > can use whatever DMA engine is available depending on the platform, and
> > we can also force usage of MMIO for debugging or development purpose.
> > That way, development of ISP features is decoupled from development of
> > CDM support, enabling parallel development if desired, and faster
> > plaform enablement that allows starting the userspace side of the work
> > quicker.
>
> I think that's a reasonable plan.
>
> We make the buffer in user-space which could be used by CDM but stage
> the implementation.
My proposal is to use an abstraction for the ISP parameters buffer, with
logical parameters, and translate that to the CDM buffer in kernelspace,
but in userspace context instead of IRQ handler context.
> That way if CDM proves too hard, we can do MMIO for a while, and then
> transition to CDM if/when.
>
> For me though I really think translating between formats is storing up pain.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
2026-04-05 20:24 ` Laurent Pinchart
@ 2026-04-05 20:28 ` Bryan O'Donoghue
0 siblings, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-04-05 20:28 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On 05/04/2026 21:24, Laurent Pinchart wrote:
>> We make the buffer in user-space which could be used by CDM but stage
>> the implementation.
> My proposal is to use an abstraction for the ISP parameters buffer, with
> logical parameters, and translate that to the CDM buffer in kernelspace,
> but in userspace context instead of IRQ handler context.
As I understand it, the parameters buffers can top out at nearly 2.5
megabytes.
However I haven't looked into the CDM format in detail so - it needs
anaylsis.
TBH I'm happy enough to follow a precedent, let's discuss further with
an analysis of CDM in hand.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-04-05 19:55 ` Bryan O'Donoghue
@ 2026-04-05 20:47 ` Laurent Pinchart
2026-04-05 21:29 ` Bryan O'Donoghue
2026-04-05 23:02 ` Bryan O'Donoghue
0 siblings, 2 replies; 47+ messages in thread
From: Laurent Pinchart @ 2026-04-05 20:47 UTC (permalink / raw)
To: Bryan O'Donoghue
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On Sun, Apr 05, 2026 at 08:55:00PM +0100, Bryan O'Donoghue wrote:
> On 05/04/2026 20:48, Laurent Pinchart wrote:
> > I don't necessarily agree with that. There are pros and cons for using
> > HFI on platforms that have an ICP. If correctly written, a firmware can
> > improve the throughput in multi-camera use cases by reprogramming the
> > time-multiplexed OPE faster. On the other hand, in use cases that don't
> > require pushing the platform to its limits, dealing with a closed-source
> > firmware often causes lots of issues.
> >
> > We should aim at supporting both direct ISP access and HFI with the same
> > userspace API, even on a single platform. Which option to start with is
> > an open question that we should discuss.
>
> I think - for IPE and BPS ICP/HFI is the way to go.
>
> However thinking about it - inline pixel processing IPP inside of the
> IFE is superior to BPS/IPE for virtually every scenario i.e. why deliver
> a frame to user-space and then submit it directly to BPS via CDM or via
> a firmware interface HFI, if you can do the same processing in the IFE -
> which on the majority of qcom platforms, you can.
As always, it depends. Offline processing consumes more memory bandwidth
and introduces latency, *but* if the statistics are computed by the IFE,
then the OPE can process frames using statistics coming from the same
frame instead of previous frames. It can improve the reactivity of the
algorithms.
Some processing is also badly suited for inline pipelines. In
particular, DOL HDR stitching in an inline pipeline requires a large
amoung of line buffers, so many ISP vendors implement it in offline ISPs
only. Temporal denoising can also be more tricky in an inline ISP.
Processing is sometimes split between the inline and offline parts, with
inline processing in Bayer domain, covering processing algorithms that
don't benefit much from using stats from the same frame, and offline
processing taking over for the rest.
> Agatti is an outlier in that sense.
>
> So actually I've shifted my focus on Hamoa to IFE/IPP.
I'd love to get stats out of the IFE :-)
> You still BTW do want HFI for BPS/IPE - but to get 3a going on the vast
> majority of qcom platforms - you want the PIX/IPP path in the IFE.
>
> OTOH if you want to do offline bayer processing - taking say a saved
> file from the filesystem - then BPS/IPE is the way to do it and IMO HFI
> is the way to do that.
>
> But ICP/BPS/IPE is a nice to have.
We need a glossary :-)
> I realise that's a word-soup of TLAs but yeah, TL;DR IFE/IPP is the way
> to go on !Agatti and once we get a nice 3a loop going there a fun
> side-project would be offline bayer processing via HFI.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-04-05 20:47 ` Laurent Pinchart
@ 2026-04-05 21:29 ` Bryan O'Donoghue
2026-04-05 23:02 ` Bryan O'Donoghue
1 sibling, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-04-05 21:29 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On 05/04/2026 21:47, Laurent Pinchart wrote:
> Temporal denoising can also be more tricky in an inline ISP.
Funny you should mention that, to my knowledge, this is the only
functional thing BPS/IPE has that IFE/PIX does not.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-04-05 20:47 ` Laurent Pinchart
2026-04-05 21:29 ` Bryan O'Donoghue
@ 2026-04-05 23:02 ` Bryan O'Donoghue
1 sibling, 0 replies; 47+ messages in thread
From: Bryan O'Donoghue @ 2026-04-05 23:02 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Loic Poulain, vladimir.zapolskiy, kieran.bingham, robh, krzk+dt,
andersson, konradybcio, linux-media, linux-arm-msm, devicetree,
linux-kernel, johannes.goede, mchehab
On 05/04/2026 21:47, Laurent Pinchart wrote:
>> So actually I've shifted my focus on Hamoa to IFE/IPP.
> I'd love to get stats out of the IFE 🙂
Yeah, I'm fiddling with stats on Hamoa IFE right now.
---
bod
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support
2026-04-05 19:48 ` Laurent Pinchart
2026-04-05 19:55 ` Bryan O'Donoghue
@ 2026-04-06 13:22 ` Loic Poulain
1 sibling, 0 replies; 47+ messages in thread
From: Loic Poulain @ 2026-04-06 13:22 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Bryan O'Donoghue, vladimir.zapolskiy, kieran.bingham, robh,
krzk+dt, andersson, konradybcio, linux-media, linux-arm-msm,
devicetree, linux-kernel, johannes.goede, mchehab
Hi Laurent,
On Sun, Apr 5, 2026 at 9:48 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Tue, Mar 24, 2026 at 05:16:21PM +0100, Loic Poulain wrote:
> > On Tue, Mar 24, 2026 at 1:54 PM Bryan O'Donoghue wrote:
> > > On 23/03/2026 12:58, Loic Poulain wrote:
> > > > This first version is intentionally minimalistic. It provides a working
> > > > configuration using a fixed set of static processing parameters, mainly
> > > > to achieve correct and good-quality debayering.
> > >
> > > You need the other 50% of the kernel side - the generation of bayer
> > > statistics in the IFE, as well as generation of parameters to feed back
> > > into the OPE - which requires a user-space implementation too, so a lot
> > > of work there too.
> > >
> > > I'd also say when we have an ICP we should be using it via the HFI
> > > protocol, thus burying all of the IPE/OPE BPS and CDM complexity in the
> > > firmware.
> > >
> > > Understood Agatti has no ICP so you're limited to direct OPE/IFE
> > > register access here. For HFI capable platforms - the majority - HFI is
> > > the way to go.
> >
> > Fully agree, this is exactly the point where we should sync and work
> > together on a proper solution.
>
> I don't necessarily agree with that. There are pros and cons for using
> HFI on platforms that have an ICP. If correctly written, a firmware can
> improve the throughput in multi-camera use cases by reprogramming the
> time-multiplexed OPE faster. On the other hand, in use cases that don't
> require pushing the platform to its limits, dealing with a closed-source
> firmware often causes lots of issues.
Yes, we need to further explore the ICP (MCU-based offload) solution
before drawing any conclusions, especially to assess how complex it is
to leverage or bypass. That said, the current platform (Agatti/OPE)
does not support it anyway.
> We should aim at supporting both direct ISP access and HFI with the same
> userspace API, even on a single platform. Which option to start with is
> an open question that we should discuss.
>
> > As a follow‑up to this RFC, I already have several ongoing pieces that
> > aim to generalize the CAMSS ISP support, and I’d very much like to
> > discuss them with you:
> >
> > - camss-isp-m2m: Generic M2M scheduling framework handling job dispatch
> > based on buffer readiness and enabled endpoints (frame input, output,
> > statistics, parameters).
>
> This should be generic, not limited to camss. v4l2-isp is a good
> candidate.
>
> > - camss-isp-pipeline: Helper layer to construct complex media/ISP graphs
> > from a structural description (endpoints, links, etc.).
>
> That also doesn't seem specific to camss.
Yes, architecturally this is not CAMSS‑specific. However, the current
implementation may rely on certain assumptions or shortcuts that do
not hold across all general offline ISP use cases. With some effort,
it should be possible to generalize them [1] [2] .
[1] https://github.com/loicpoulain/linux/blob/camss-isp-dev/drivers/media/platform/qcom/camss/camss-isp-pipeline.c
[2] https://github.com/loicpoulain/linux/blob/camss-isp-dev/drivers/media/platform/qcom/camss/camss-isp-m2m.c
>
> > - camss-isp-params: Generic helper for handling ISP parameter buffers
> > (using v4l2-isp-params).
>
> I'm curious to know what camss-specific helpers you envision there.
Nothing too complex initially, just a parser built on the v4l2‑isp
helpers, along with a few handler callbacks [3]. This is something
I’ll discuss with Bryan, as we definitely want to reuse the same
format and parser for both inline and offline ISPs (as well as for
stats).
[3] https://github.com/loicpoulain/linux/blob/camss-isp-dev/drivers/media/platform/qcom/camss/camss-isp-params.c
>
> > - camss-isp-stats: Generic helper framework for CAMSS statistics devices.
>
> Same.
>
> > - camss-(isp-)ope: OPE‑specific logic only (register configuration, IRQ
> > handling, parameter‑to‑register translation).
> >
> > This approach should significantly reduce the amount of
> > platform‑specific code required for future ISP blocks. It should also
> > allow you to integrate a camss-isp-hamoa (or similar) backend, or even
> > a camss-isp-hfi implementation for the M2M functions, without
> > duplicating the infrastructure.
> >
> > So yes, let’s sync and agree on a shared/open development model and an
> > overall direction, possibly even a common tree, to ensure we stay
> > aligned and can collaborate effectively.
>
> Let's schedule a call to kickstart those discussions. Many people are on
> Easter vacation this week, next week could be a good candidate.
>
> > > I'll publish an RFC for Hamoa for that soonish so we can make sure both
> > > coexist.
^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2026-04-06 13:22 UTC | newest]
Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <xy6TKmdveRx4cMshSHEUGZ7s3lbsurWcsc2vq05A7_N4bCialR7EelZitouugtZDkpFCAghjqY4NDdSQEIPprw==@protonmail.internalid>
2026-03-23 12:58 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Loic Poulain
2026-03-23 12:58 ` [RFC PATCH 1/3] dt-bindings: media: qcom: Add CAMSS Offline Processing Engine (OPE) Loic Poulain
2026-03-23 13:03 ` Krzysztof Kozlowski
2026-03-23 16:03 ` Loic Poulain
2026-03-23 16:10 ` Krzysztof Kozlowski
2026-03-23 13:03 ` Bryan O'Donoghue
2026-03-23 12:58 ` [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver Loic Poulain
2026-03-23 13:43 ` Bryan O'Donoghue
2026-03-23 15:31 ` Loic Poulain
2026-03-24 11:00 ` Bryan O'Donoghue
2026-03-24 15:57 ` Loic Poulain
2026-03-24 21:27 ` Dmitry Baryshkov
2026-03-26 12:06 ` johannes.goede
2026-03-30 11:37 ` Dmitry Baryshkov
2026-03-30 13:46 ` johannes.goede
2026-03-30 14:11 ` Bryan O'Donoghue
2026-03-30 14:27 ` johannes.goede
2026-03-30 14:32 ` Bryan O'Donoghue
2026-03-30 18:59 ` Dmitry Baryshkov
2026-03-30 19:07 ` Loic Poulain
2026-04-05 20:23 ` Laurent Pinchart
2026-03-30 18:55 ` Dmitry Baryshkov
2026-03-30 22:51 ` Bryan O'Donoghue
2026-03-31 8:11 ` Konrad Dybcio
2026-04-05 20:14 ` Laurent Pinchart
2026-03-25 9:30 ` Konrad Dybcio
2026-04-05 20:11 ` Laurent Pinchart
2026-04-05 20:15 ` Bryan O'Donoghue
2026-04-05 20:24 ` Laurent Pinchart
2026-04-05 20:28 ` Bryan O'Donoghue
2026-03-23 12:58 ` [RFC PATCH 3/3] arm64: dts: qcom: qcm2290: Add CAMSS OPE node Loic Poulain
2026-03-23 13:03 ` Bryan O'Donoghue
2026-03-23 13:24 ` Konrad Dybcio
2026-03-23 13:33 ` Bryan O'Donoghue
2026-03-23 16:15 ` Krzysztof Kozlowski
2026-03-24 10:30 ` Bryan O'Donoghue
2026-03-23 16:31 ` Loic Poulain
2026-03-24 10:43 ` Konrad Dybcio
2026-03-24 12:54 ` [RFC PATCH 0/3] media: qcom: camss: CAMSS Offline Processing Engine support Bryan O'Donoghue
2026-03-24 16:16 ` Loic Poulain
2026-04-05 19:48 ` Laurent Pinchart
2026-04-05 19:55 ` Bryan O'Donoghue
2026-04-05 20:47 ` Laurent Pinchart
2026-04-05 21:29 ` Bryan O'Donoghue
2026-04-05 23:02 ` Bryan O'Donoghue
2026-04-06 13:22 ` Loic Poulain
2026-04-05 19:57 ` Laurent Pinchart
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox