[RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support

Devicetree
 help / color / mirror / Atom feed

* [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support
@ 2026-06-13  7:01 MidG971
  2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

RFC, not for merge. End-to-end inference does not produce correct output
yet (see Status), so per the v2 discussion this is a request for design
feedback. It probes, attaches, and submits cleanly on a stock v7.1-rc6
tree; what remains is one hardware-internal issue.

The RK3568 has a single NVDLA-derived NPU core, the same IP family as the
RK3588 NPU the driver already supports; the register layout matches. The
RK3568 differences are a 32-bit NPU AXI/IOMMU (vs 40-bit) and explicit
PVTPLL/PMU bring-up to power and de-idle the NPU before it is reachable.

Patches:
  1-2  rocket: per-SoC data struct, then derive DMA width and core count
       from match data (refactors, no functional change); patch 2 also
       bounds-checks the per-SoC cores array.
  3    rocket: RK3568 SoC data; start the PVTPLL compute clock via SCMI.
       Powering on and de-idling the NPU NoC are left to the power domain.
  4    rocket: reset the NPU before detaching the IOMMU on a job timeout
       (the detach otherwise stalls a wedged AXI master and WARNs).
  5    rocket: keep the IOMMU domain attached across jobs instead of
       re-attaching per job (the per-job rk_iommu handshake on the idle
       NPU MMU is slow and noisy); also drop the domain on reset and stop
       the scheduler before IOMMU teardown.
  6    dt-bindings: add the RK3568 NPU compatible; require rockchip,pmu
       for RK3568.
  7-8  arm64 dts: add the NPU and its IOMMU, and enable them on ROCK 3B.
  9    pmdomain: give the RK3568 NPU power domain a regulator so genpd
       owns vdd_npu via domain-supply (Suggested-by Chaoyi Chen).

Dependencies. This series no longer touches the IOMMU driver; two
in-flight Rockchip IOMMU changes are relevant but not part of it:
  - Simon Xue's "iommu/rockchip: Drop global rk_ops in favor of
    per-device ops" [1]. On boards with more than 4 GiB of RAM the NPU
    MMU's DTE must stay below 4 GiB (its DTE address is 32-bit), so the
    NPU IOMMU is described with the "rockchip,iommu" compatible, whose ops
    allocate the page tables with GFP_DMA32; the SoC's other IOMMUs use
    the "rockchip,rk3568-iommu" (40-bit) ops. The driver keeps a single
    global ops pointer, so two ops on one SoC trip its coexistence check;
    this series therefore sits on top of Simon's per-device-ops change,
    which Rockchip (Chaoyi Chen) confirmed is the intended way to give the
    NPU MMU its 32-bit DTE.
  - "iommu/rockchip: disable fetch dte time limit" [2] (Simon Xue / Sven
    Pueschel, in the iommu tree), which sets AUTO_GATING bit 31. v3 carried
    a local AUTO_GATING patch; that unconditional fix has since been merged,
    so this series drops its IOMMU patch. The bit is a no-op on this
    hardware in any case (the page walk completes on its reset value).

Power bring-up. The NPU is brought up through the power-domain layer (no
driver hack): the NPU power-domain keeps its clocks but drops the pm_qos
phandle (qos_npu sits behind the gated NPU NoC, so genpd's power-off QoS
save faults reading it), and vdd_npu is wired as the domain's
domain-supply with the domain marked need_regulator (patch 9), so genpd
brings the rail up before it de-idles the NoC at power-on. The PMU de-idle
then ACKs without PVTPLL running; PVTPLL is only needed for compute.

Status. On v7.1-rc6 the driver probes, creates /dev/accel/accel0,
attaches an IOMMU domain, and submits jobs; the program controller
fetches and broadcasts the command list. Inference output is still
wrong. The kernel side (this series) appears complete; what remains is
mesa/Teflon userspace, which still emits RK3588-tuned config (to be
filed on mesa-dev), and the hardware: with corrected config the NPU
reads the full input and weight tensors (per its DMA counters) but the
MAC/output stage never completes and the job times out, leaving the
output at the buffer's zero-point. It is not in the command list (a
byte-exact replay of the vendor's command list behaves the same).
Pointers from anyone with RK3568 NPU experience welcome.

Known residual. On the first IOMMU attach the NPU MMU is idle with paging
already enabled; the rk_iommu stall/reset handshake does not complete in
that state and logs one burst of timeouts before the (kept) domain
settles. It is harmless here because the job times out regardless, but it
points at an idle-MMU reconfiguration corner the rk_iommu code does not
handle on this block.

[1] https://lore.kernel.org/linux-rockchip/20260310105303.128859-1-xxm@rock-chips.com/
[2] https://lore.kernel.org/all/20260428-spu-iommudtefix-v2-1-f592f579e508@pengutronix.de/

Changes since v3:
  - Dropped the local AUTO_GATING patch: the correct fix (set AUTO_GATING
    bit 31, "disable fetch dte time limit") has since been merged upstream
    [2], so the series no longer touches the IOMMU driver.
  - vdd_npu: new pmdomain patch (9) gives the RK3568 NPU domain a regulator
    (need_regulator) and the board wires domain-supply, dropping the
    regulator-always-on workaround (Suggested-by Chaoyi Chen). It relies on
    the in-tree pmdomain default-off-if-need_regulator handling. The
    "Failed to create device link ... <pmic>" line at pmdomain probe is a
    pre-existing fw_devlink cyclic-dependency warning (the single
    power-controller provides every domain, including the one the I2C PMIC
    needs), seen the same way on RK3588; it is harmless here beyond a few
    wasted EPROBE_DEFER retries, and a proper fix belongs in the
    power-controller driver, not this series.
  - rk356x dts: also assign the CRU CLK_NPU so the NPU AXI bus clock comes
    up at 200 MHz instead of the 12 MHz boot default; order the NPU/IOMMU
    nodes by unit address.
  - rocket RK3568: fetch the SCMI/PVTPLL clock by name (the v3 bulk index
    resolved to the wrong clock); drop the redundant driver PMU de-idle
    writes (handled by the power domain).
  - rocket: clear the attached IOMMU domain on reset; unwind through
    rocket_core_fini() on noc_init failure; stop the scheduler before the
    IOMMU teardown.
  - rocket: bounds-check the cores array against the per-SoC core count.
  - Binding: require rockchip,pmu on RK3568.
  - Dependency framing: confirmed by Rockchip as v2 + 32-bit DTE via
    Simon's per-device-ops series (was framed as v1 in v3).

Midgy BALON (9):
  accel: rocket: Introduce per-SoC rocket_soc_data
  accel: rocket: Derive DMA width and core count from match data
  accel: rocket: Add RK3568 SoC support
  accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  accel: rocket: Keep the IOMMU domain attached across jobs
  dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
  pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain

 .../npu/rockchip,rk3588-rknn-core.yaml        | 27 +++++++++-
 .../boot/dts/rockchip/rk3568-rock-3b.dts      | 18 ++++++-
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 ++++++++++++++
 drivers/accel/rocket/rocket_core.c            | 30 ++++++++++-
 drivers/accel/rocket/rocket_core.h            | 19 +++++++
 drivers/accel/rocket/rocket_device.c          | 15 ++----
 drivers/accel/rocket/rocket_device.h          |  3 +-
 drivers/accel/rocket/rocket_drv.c             | 50 ++++++++++++++++++-
 drivers/accel/rocket/rocket_job.c             | 45 ++++++++++++++---
 drivers/pmdomain/rockchip/pm-domains.c        | 36 +++++++++----
 10 files changed, 245 insertions(+), 36 deletions(-)


base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
-- 
2.39.5


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:10   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.  No functional change: the
RK3588 AXI master is 40-bit.  This prepares for SoCs with a narrower
address width.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  7 ++++++-
 drivers/accel/rocket/rocket_core.h | 11 +++++++++++
 drivers/accel/rocket/rocket_drv.c  |  6 +++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index b3b2fa9ba645a..09c445af7de73 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -7,6 +7,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/iommu.h>
+#include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/reset.h>
@@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
 	u32 version;
 	int err = 0;
 
+	core->soc_data = of_device_get_match_data(dev);
+	if (!core->soc_data)
+		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
+
 	core->resets[0].id = "srst_a";
 	core->resets[1].id = "srst_h";
 	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
@@ -52,7 +57,7 @@ int rocket_core_init(struct rocket_core *core)
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(core->soc_data->dma_bits));
 	if (err)
 		return err;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index f6d7382854ca9..8ee105a0be40e 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -12,6 +12,16 @@
 
 #include "rocket_registers.h"
 
+struct rocket_core;
+
+/**
+ * struct rocket_soc_data - per-SoC configuration data
+ * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ */
+struct rocket_soc_data {
+	unsigned int dma_bits;
+};
+
 #define rocket_pc_readl(core, reg) \
 	readl((core)->pc_iomem + (REG_PC_##reg))
 #define rocket_pc_writel(core, reg, value) \
@@ -31,6 +41,7 @@ struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
 	unsigned int index;
+	const struct rocket_soc_data *soc_data;
 
 	int irq;
 	void __iomem *pc_iomem;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 8bbbce594883e..384c38e13acce 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+static const struct rocket_soc_data rk3588_soc_data = {
+	.dma_bits = 40,
+};
+
 static const struct of_device_id dt_match[] = {
-	{ .compatible = "rockchip,rk3588-rknn-core" },
+	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
 MODULE_DEVICE_TABLE(of, dt_match);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
  2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:09   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

The probe already has the per-SoC match data, which now records the core
count and DMA width.  Use it for the cores array allocation and the
device DMA mask instead of re-scanning the device tree for available core
nodes.

While at it, reject a device tree that declares more NPU core nodes than
the SoC has, so the fixed-size cores array can never be overrun.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.h   |  2 ++
 drivers/accel/rocket/rocket_device.c | 15 +++++----------
 drivers/accel/rocket/rocket_device.h |  3 ++-
 drivers/accel/rocket/rocket_drv.c    | 13 ++++++++++++-
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 8ee105a0be40e..d6421251670dc 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -16,9 +16,11 @@ struct rocket_core;
 
 /**
  * struct rocket_soc_data - per-SoC configuration data
+ * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
  */
 struct rocket_soc_data {
+	unsigned int num_cores;
 	unsigned int dma_bits;
 };
 
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index 46e6ee1e72c5f..6186f4faa3a2a 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -6,18 +6,16 @@
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
 #include <linux/platform_device.h>
-#include <linux/of.h>
 
 #include "rocket_device.h"
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver)
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data)
 {
 	struct device *dev = &pdev->dev;
-	struct device_node *core_node;
 	struct rocket_device *rdev;
 	struct drm_device *ddev;
-	unsigned int num_cores = 0;
 	int err;
 
 	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
@@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
 	ddev = &rdev->ddev;
 	dev_set_drvdata(dev, rdev);
 
-	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
-		if (of_device_is_available(core_node))
-			num_cores++;
-
-	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
+	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
+				   GFP_KERNEL);
 	if (!rdev->cores)
 		return ERR_PTR(-ENOMEM);
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(soc_data->dma_bits));
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index ce662abc01d3d..2f74e078974e3 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -22,7 +22,8 @@ struct rocket_device {
 };
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver);
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data);
 void rocket_device_fini(struct rocket_device *rdev);
 #define to_rocket_device(drm_dev) \
 	((struct rocket_device *)(container_of((drm_dev), struct rocket_device, ddev)))
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 384c38e13acce..f0beed2d522c7 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
 
 static int rocket_probe(struct platform_device *pdev)
 {
+	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
 	int ret;
 
+	if (!soc_data)
+		return -EINVAL;
+
 	if (rdev == NULL) {
 		/* First core probing, initialize DRM device. */
-		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
+		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
 		if (IS_ERR(rdev)) {
 			dev_err(&pdev->dev, "failed to initialize rocket device\n");
 			return PTR_ERR(rdev);
@@ -172,6 +176,12 @@ static int rocket_probe(struct platform_device *pdev)
 
 	unsigned int core = rdev->num_cores;
 
+	if (core >= soc_data->num_cores) {
+		dev_err(&pdev->dev, "too many NPU core nodes (max %u)\n",
+			soc_data->num_cores);
+		return -EINVAL;
+	}
+
 	dev_set_drvdata(&pdev->dev, rdev);
 
 	rdev->cores[core].rdev = rdev;
@@ -214,6 +224,7 @@ static void rocket_remove(struct platform_device *pdev)
 }
 
 static const struct rocket_soc_data rk3588_soc_data = {
+	.num_cores = 3,
 	.dma_bits = 40,
 };
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
  2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
  2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:11   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Add rk3568_soc_data and its
compatible.

Unlike the RK3588, the RK3568 NPU's compute clock is a PVTPLL managed by
TF-A via SCMI; start it from an noc_init callback with a real rate change
(an intermediate rate defeats the clock framework's unchanged-rate
shortcut).  Powering on and de-idling the NPU NoC are left to the power
domain (genpd), which performs them when the IOMMU supplier is resumed,
so the driver does not poke the PMU directly.

If noc_init fails, unwind through rocket_core_fini() so the core is torn
down completely rather than leaking the runtime-PM and IOMMU state.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  9 +++++++++
 drivers/accel/rocket/rocket_core.h |  3 +++
 drivers/accel/rocket/rocket_drv.c  | 31 ++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 09c445af7de73..779e951596a15 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
 		return err;
 	}
 
+	if (core->soc_data->noc_init) {
+		err = core->soc_data->noc_init(core);
+		if (err) {
+			pm_runtime_put_sync(dev);
+			rocket_core_fini(core);
+			return err;
+		}
+	}
+
 	version = rocket_pc_readl(core, VERSION);
 	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index d6421251670dc..5a145ba8c5a92 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -18,10 +18,13 @@ struct rocket_core;
  * struct rocket_soc_data - per-SoC configuration data
  * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ * @noc_init: Optional callback to bring up the NPU before it is reachable.
+ *            Used on RK3568 to start the PVTPLL compute clock via SCMI.
  */
 struct rocket_soc_data {
 	unsigned int num_cores;
 	unsigned int dma_bits;
+	int (*noc_init)(struct rocket_core *core);
 };
 
 #define rocket_pc_readl(core, reg) \
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index f0beed2d522c7..86484110ad6f0 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -10,6 +10,7 @@
 #include <linux/err.h>
 #include <linux/iommu.h>
 #include <linux/of.h>
+#include <linux/of_clk.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 
@@ -223,12 +224,42 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+/*
+ * The NPU compute clock is a PVTPLL managed by TF-A via SCMI; spin it up
+ * with a real rate change (an intermediate rate defeats the clock
+ * framework's unchanged-rate shortcut).  Powering on and de-idling the NPU
+ * NoC are handled by the power domain (genpd) before the NPU is accessed.
+ */
+static int rk3568_noc_init(struct rocket_core *core)
+{
+	struct clk *npu_clk;
+
+	npu_clk = of_clk_get_by_name(core->dev->of_node, "npu");
+	if (IS_ERR(npu_clk))
+		return dev_err_probe(core->dev, PTR_ERR(npu_clk),
+				     "failed to get the NPU SCMI clock\n");
+
+	if (clk_set_rate(npu_clk, 600000000UL) ||
+	    clk_set_rate(npu_clk, 1000000000UL))
+		dev_warn(core->dev, "failed to set the NPU compute clock rate\n");
+	clk_put(npu_clk);
+
+	return 0;
+}
+
+static const struct rocket_soc_data rk3568_soc_data = {
+	.num_cores = 1,
+	.dma_bits = 32,
+	.noc_init = rk3568_noc_init,
+};
+
 static const struct rocket_soc_data rk3588_soc_data = {
 	.num_cores = 3,
 	.dma_bits = 40,
 };
 
 static const struct of_device_id dt_match[] = {
+	{ .compatible = "rockchip,rk3568-rknn-core", .data = &rk3568_soc_data },
 	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (2 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:15   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

On a job timeout the NPU AXI master can be left wedged with
outstanding transactions. rocket_reset() detached the IOMMU group
before resetting the hardware, so iommu_detach_group() ->
__iommu_group_set_core_domain() asked the rk_iommu to stall and wait
for the in-flight transactions to drain. They never did, the stall
request timed out (-ETIMEDOUT) and the IOMMU core WARNed:

  WARNING: drivers/iommu/iommu.c:157 __iommu_group_set_core_domain
    iommu_detach_group
    rocket_reset
    rocket_job_timedout

Assert the core reset first: it quiesces the AXI master so the
following IOMMU detach completes cleanly. Move the detach after
rocket_core_reset() and out of the job_lock (it does not touch
in_flight_job).

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_job.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833f..e25234261536b 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 		if (core->in_flight_job)
 			pm_runtime_put_noidle(core->dev);
 
-		iommu_detach_group(NULL, core->iommu_group);
-
 		core->in_flight_job = NULL;
 	}
 
-	/* Proceed with reset now. */
+	/*
+	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
+	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
+	 * stall request that never drains and times out (warning in the IOMMU
+	 * core). Asserting the core reset first quiesces the master so the
+	 * detach completes cleanly.
+	 */
 	rocket_core_reset(core);
 
+	iommu_detach_group(NULL, core->iommu_group);
+
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (3 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:12   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

rocket attached the job's IOMMU domain in rocket_job_run() and
detached it again on every completion and reset. Each attach/detach
toggles the rk_iommu stall/force-reset/paging handshake, and on
RK3568 the NPU MMU is idle between jobs, so that handshake times out
and logs a burst of "stall/paging request timed out" errors for
every job.

Attach the per-context domain once and keep it: track the attached
domain in the core, swap it only when a job from a different context
runs, and detach it at core teardown. A reference on the attached
domain is held so it outlives the job that first attached it and is
released on swap/teardown.

Because a hardware reset (on job timeout) wipes the IOMMU page-table
base register, drop the attached domain after rocket_core_reset() so
the next job re-attaches and reprograms it. Also tear down the
scheduler before detaching the IOMMU in rocket_core_fini(), so an
in-flight job can no longer reach the domain being detached.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c | 14 +++++++++++-
 drivers/accel/rocket/rocket_core.h |  3 +++
 drivers/accel/rocket/rocket_job.c  | 35 +++++++++++++++++++++++++-----
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 779e951596a15..6c128f585cff4 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -13,6 +13,7 @@
 #include <linux/reset.h>
 
 #include "rocket_core.h"
+#include "rocket_drv.h"
 #include "rocket_job.h"
 
 int rocket_core_init(struct rocket_core *core)
@@ -112,9 +113,20 @@ void rocket_core_fini(struct rocket_core *core)
 {
 	pm_runtime_dont_use_autosuspend(core->dev);
 	pm_runtime_disable(core->dev);
+
+	/*
+	 * Stop the scheduler before tearing down the IOMMU so an in-flight
+	 * job can no longer touch the (about to be detached) domain.
+	 */
+	rocket_job_fini(core);
+
+	if (core->attached_domain) {
+		iommu_detach_group(NULL, core->iommu_group);
+		rocket_iommu_domain_put(core->attached_domain);
+		core->attached_domain = NULL;
+	}
 	iommu_group_put(core->iommu_group);
 	core->iommu_group = NULL;
-	rocket_job_fini(core);
 }
 
 void rocket_core_reset(struct rocket_core *core)
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 5a145ba8c5a92..78791ecb32e75 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -42,6 +42,8 @@ struct rocket_soc_data {
 #define rocket_core_writel(core, reg, value) \
 	writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
 
+struct rocket_iommu_domain;
+
 struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
@@ -56,6 +58,7 @@ struct rocket_core {
 	struct reset_control_bulk_data resets[2];
 
 	struct iommu_group *iommu_group;
+	struct rocket_iommu_domain *attached_domain;
 
 	struct mutex job_lock;
 	struct rocket_job *in_flight_job;
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index e25234261536b..368b2ebead1b3 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -9,6 +9,7 @@
 #include <drm/rocket_accel.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
+#include <linux/kref.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 
@@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
 	if (ret < 0)
 		return fence;
 
-	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
-	if (ret < 0)
-		return fence;
+	/*
+	 * Attach the job's IOMMU domain only when it differs from the one
+	 * already attached. Re-attaching per job toggles the rk_iommu
+	 * stall/reset handshake on an idle NPU MMU, which is slow and
+	 * noisy; keep the domain attached across jobs instead.
+	 */
+	if (core->attached_domain != job->domain) {
+		if (core->attached_domain) {
+			iommu_detach_group(NULL, core->iommu_group);
+			rocket_iommu_domain_put(core->attached_domain);
+			core->attached_domain = NULL;
+		}
+
+		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
+		if (ret < 0)
+			return fence;
+
+		kref_get(&job->domain->kref);
+		core->attached_domain = job->domain;
+	}
 
 	scoped_guard(mutex, &core->job_lock) {
 		core->in_flight_job = job;
@@ -340,7 +358,6 @@ static void rocket_job_handle_irq(struct rocket_core *core)
 				return;
 			}
 
-			iommu_detach_group(NULL, iommu_group_get(core->dev));
 			dma_fence_signal(core->in_flight_job->done_fence);
 			pm_runtime_put_autosuspend(core->dev);
 			core->in_flight_job = NULL;
@@ -376,7 +393,15 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 	 */
 	rocket_core_reset(core);
 
-	iommu_detach_group(NULL, core->iommu_group);
+	/*
+	 * The reset wipes the IOMMU page-table base, so drop the attached
+	 * domain to force the next job to re-attach and reprogram it.
+	 */
+	if (core->attached_domain) {
+		iommu_detach_group(NULL, core->iommu_group);
+		rocket_iommu_domain_put(core->attached_domain);
+		core->attached_domain = NULL;
+	}
 
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (4 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:11   ` sashiko-bot
  2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
RK3588.  Add its compatible.

On RK3568 the NPU NoC bus-idle and power gating are controlled through the
system PMU rather than a dedicated register block, so add a rockchip,pmu
phandle to that syscon.  The RK3568 NPU has no dedicated SRAM rail, so
sram-supply is required only on RK3588.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 .../npu/rockchip,rk3588-rknn-core.yaml        | 27 ++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
index caca2a4903cd1..e0b948ac47d45 100644
--- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
+++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
@@ -21,6 +21,7 @@ properties:
 
   compatible:
     enum:
+      - rockchip,rk3568-rknn-core
       - rockchip,rk3588-rknn-core
 
   reg:
@@ -50,6 +51,13 @@ properties:
 
   npu-supply: true
 
+  rockchip,pmu:
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description:
+      Phandle to the PMU syscon.  On RK3568 the NPU's NoC bus-idle and
+      power gating are controlled through the PMU; this points to that
+      syscon so those registers can be reached.
+
   power-domains:
     maxItems: 1
 
@@ -75,7 +83,24 @@ required:
   - resets
   - reset-names
   - npu-supply
-  - sram-supply
+
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            const: rockchip,rk3588-rknn-core
+    then:
+      required:
+        - sram-supply
+  - if:
+      properties:
+        compatible:
+          contains:
+            const: rockchip,rk3568-rknn-core
+    then:
+      required:
+        - rockchip,pmu
 
 additionalProperties: false
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (5 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:09   ` sashiko-bot
  2026-06-13  8:18   ` Jonas Karlman
  2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
  2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
  8 siblings, 2 replies; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

The RK3568 has an NVDLA-derived NPU at fde40000 with its own IOMMU at
fde4b000. Add both nodes (disabled by default) and the NPU power-domain
child under the PMU power-controller, and point rockchip,pmu at the PMU
syscon that controls the NPU NoC bus-idle.

Besides the SCMI compute clock, assign the CRU CLK_NPU so the NPU AXI
bus clock comes up at 200 MHz rather than the 12 MHz boot default.

The power-domain deliberately carries no pm_qos: qos_npu sits behind the
NPU NoC, which is gated until the NPU is brought up, so a genpd power-off
QoS save would fault reading it.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
index 64bdd8b7754b5..313db59c1aed8 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
@@ -512,6 +512,13 @@ power-domain@RK3568_PD_GPU {
 				#power-domain-cells = <0>;
 			};
 
+			pd_npu: power-domain@RK3568_PD_NPU {
+				reg = <RK3568_PD_NPU>;
+				clocks = <&cru ACLK_NPU_PRE>,
+					 <&cru HCLK_NPU_PRE>;
+				#power-domain-cells = <0>;
+			};
+
 			/* These power domains are grouped by VD_LOGIC */
 			power-domain@RK3568_PD_VI {
 				reg = <RK3568_PD_VI>;
@@ -572,6 +579,37 @@ power-domain@RK3568_PD_RKVENC {
 		};
 	};
 
+	rknn_core_0: npu@fde40000 {
+		compatible = "rockchip,rk3568-rknn-core";
+		reg = <0x0 0xfde40000 0x0 0x1000>,
+		      <0x0 0xfde41000 0x0 0x1000>,
+		      <0x0 0xfde43000 0x0 0x1000>;
+		reg-names = "pc", "cna", "core";
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
+			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
+		clock-names = "aclk", "hclk", "npu", "pclk";
+		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>, <&cru CLK_NPU>;
+		assigned-clock-rates = <200000000>, <600000000>;
+		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
+		reset-names = "srst_a", "srst_h";
+		power-domains = <&power RK3568_PD_NPU>;
+		rockchip,pmu = <&pmu>;
+		iommus = <&rknn_mmu_0>;
+		status = "disabled";
+	};
+
+	rknn_mmu_0: iommu@fde4b000 {
+		compatible = "rockchip,iommu";
+		reg = <0x0 0xfde4b000 0x0 0x40>;
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clock-names = "aclk", "iface";
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>;
+		power-domains = <&power RK3568_PD_NPU>;
+		#iommu-cells = <0>;
+		status = "disabled";
+	};
+
 	gpu: gpu@fde60000 {
 		compatible = "rockchip,rk3568-mali", "arm,mali-bifrost";
 		reg = <0x0 0xfde60000 0x0 0x4000>;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (6 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:40   ` Jonas Karlman
  2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

Enable the NPU and its IOMMU on ROCK 3B and wire vdd_npu as the NPU
power domain's domain-supply, so genpd brings the rail up and down with
the domain (the domain is marked need_regulator). The PVTPLL compute
clock is brought up later by the driver.

The rail is no longer kept always-on, so pin it to 1000 mV (the NPU's
1 GHz operating voltage; the driver runs a fixed compute rate with no
devfreq voltage scaling) and mark it boot-on, so it is up before the
power domain de-idles the NPU NoC at power-on.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 .../arm64/boot/dts/rockchip/rk3568-rock-3b.dts | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
index 69001e453732e..d3f9776c2bdc3 100644
--- a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
@@ -330,9 +330,10 @@ regulator-state-mem {
 
 			vdd_npu: DCDC_REG4 {
 				regulator-name = "vdd_npu";
+				regulator-boot-on;
 				regulator-initial-mode = <0x2>;
-				regulator-min-microvolt = <500000>;
-				regulator-max-microvolt = <1350000>;
+				regulator-min-microvolt = <1000000>;
+				regulator-max-microvolt = <1000000>;
 				regulator-ramp-delay = <6001>;
 
 				regulator-state-mem {
@@ -787,3 +788,16 @@ vp0_out_hdmi: endpoint@ROCKCHIP_VOP2_EP_HDMI0 {
 		remote-endpoint = <&hdmi_in_vp0>;
 	};
 };
+
+&pd_npu {
+	domain-supply = <&vdd_npu>;
+};
+
+&rknn_core_0 {
+	npu-supply = <&vdd_npu>;
+	status = "okay";
+};
+
+&rknn_mmu_0 {
+	status = "okay";
+};
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain
  2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
                   ` (7 preceding siblings ...)
  2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
@ 2026-06-13  7:01 ` MidG971
  2026-06-13  7:21   ` sashiko-bot
  8 siblings, 1 reply; 20+ messages in thread
From: MidG971 @ 2026-06-13  7:01 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson
  Cc: dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas, Midgy BALON

From: Midgy BALON <midgy971@gmail.com>

The RK3568 NPU rail (vdd_npu) needs to be enabled before the domain is
powered on and disabled after it is powered off. Give DOMAIN_RK3568 a
regulator parameter (like DOMAIN_RK3588 already has) so the NPU domain
can set need_regulator, letting genpd manage the rail wired up as the
domain's domain-supply instead of marking it always-on in DT.

Suggested-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/pmdomain/rockchip/pm-domains.c | 36 ++++++++++++++++++--------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index 490bbb1d1d8e8..19db307e3811d 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -138,6 +138,20 @@ struct rockchip_pmu {
 	.active_wakeup = wakeup,			\
 }
 
+#define DOMAIN_M_R(_name, pwr, status, req, idle, ack, wakeup, regulator)	\
+{							\
+	.name = _name,				\
+	.pwr_w_mask = (pwr) << 16,			\
+	.pwr_mask = (pwr),				\
+	.status_mask = (status),			\
+	.req_w_mask = (req) << 16,			\
+	.req_mask = (req),				\
+	.idle_mask = (idle),				\
+	.ack_mask = (ack),				\
+	.active_wakeup = wakeup,			\
+	.need_regulator = regulator,			\
+}
+
 #define DOMAIN_M_G(_name, pwr, status, req, idle, ack, g_mask, wakeup, keepon)	\
 {							\
 	.name = _name,					\
@@ -241,8 +255,8 @@ struct rockchip_pmu {
 #define DOMAIN_RK3562(name, pwr, req, g_mask, mem, wakeup)		\
 	DOMAIN_M_G_SD(name, pwr, pwr, req, req, req, g_mask, mem, wakeup, false)
 
-#define DOMAIN_RK3568(name, pwr, req, wakeup)		\
-	DOMAIN_M(name, pwr, pwr, req, req, req, wakeup)
+#define DOMAIN_RK3568(name, pwr, req, wakeup, regulator)		\
+	DOMAIN_M_R(name, pwr, pwr, req, req, req, wakeup, regulator)
 
 #define DOMAIN_RK3576(name, p_offset, pwr, status, r_status, r_offset, req, idle, g_mask, wakeup)	\
 	DOMAIN_M_O_R_G(name, p_offset, pwr, status, 0, r_status, r_status, r_offset, req, idle, idle, g_mask, wakeup)
@@ -1274,15 +1288,15 @@ static const struct rockchip_domain_info rk3562_pm_domains[] = {
 };
 
 static const struct rockchip_domain_info rk3568_pm_domains[] = {
-	[RK3568_PD_NPU]		= DOMAIN_RK3568("npu",  BIT(1), BIT(2),  false),
-	[RK3568_PD_GPU]		= DOMAIN_RK3568("gpu",  BIT(0), BIT(1),  false),
-	[RK3568_PD_VI]		= DOMAIN_RK3568("vi",   BIT(6), BIT(3),  false),
-	[RK3568_PD_VO]		= DOMAIN_RK3568("vo",   BIT(7), BIT(4),  false),
-	[RK3568_PD_RGA]		= DOMAIN_RK3568("rga",  BIT(5), BIT(5),  false),
-	[RK3568_PD_VPU]		= DOMAIN_RK3568("vpu",  BIT(2), BIT(6),  false),
-	[RK3568_PD_RKVDEC]	= DOMAIN_RK3568("vdec", BIT(4), BIT(8),  false),
-	[RK3568_PD_RKVENC]	= DOMAIN_RK3568("venc", BIT(3), BIT(7),  false),
-	[RK3568_PD_PIPE]	= DOMAIN_RK3568("pipe", BIT(8), BIT(11), false),
+	[RK3568_PD_NPU]		= DOMAIN_RK3568("npu",  BIT(1), BIT(2),  false, true),
+	[RK3568_PD_GPU]		= DOMAIN_RK3568("gpu",  BIT(0), BIT(1),  false, false),
+	[RK3568_PD_VI]		= DOMAIN_RK3568("vi",   BIT(6), BIT(3),  false, false),
+	[RK3568_PD_VO]		= DOMAIN_RK3568("vo",   BIT(7), BIT(4),  false, false),
+	[RK3568_PD_RGA]		= DOMAIN_RK3568("rga",  BIT(5), BIT(5),  false, false),
+	[RK3568_PD_VPU]		= DOMAIN_RK3568("vpu",  BIT(2), BIT(6),  false, false),
+	[RK3568_PD_RKVDEC]	= DOMAIN_RK3568("vdec", BIT(4), BIT(8),  false, false),
+	[RK3568_PD_RKVENC]	= DOMAIN_RK3568("venc", BIT(3), BIT(7),  false, false),
+	[RK3568_PD_PIPE]	= DOMAIN_RK3568("pipe", BIT(8), BIT(11), false, false),
 };
 
 static const struct rockchip_domain_info rk3576_pm_domains[] = {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
@ 2026-06-13  7:09   ` sashiko-bot
  2026-06-13  8:18   ` Jonas Karlman
  1 sibling, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:09 UTC (permalink / raw)
  To: MidG971; +Cc: robh, conor+dt, devicetree, dri-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Incorrect IOMMU compatible string 'rockchip,iommu' is used for the RK3568 NPU IOMMU, which will cause translation faults.
- [High] The assigned clock rates for the NPU AXI bus and compute core are swapped, severely overclocking the AXI bus.
--

commit a304190b5b1e8ffa602c7c5d7dbdfed3a2c3dcae
Author: Midgy BALON <midgy971@gmail.com>
Subject: arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU

This patch adds the device tree nodes for the NVDLA-derived NPU and its
associated IOMMU on the RK3568 SoC. It configures the clocks, resets, power
domains, and assigns the PMU syscon to manage the NPU NoC bus-idle. The
configuration also intends to bring up the NPU AXI bus clock at 200 MHz.

> diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> index 64bdd8b7754b5..313db59c1aed8 100644
> --- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
[ ... ]
> @@ -572,6 +579,37 @@ power-domain@RK3568_PD_RKVENC {
>  		};
>  	};
>  
> +	rknn_core_0: npu@fde40000 {
> +		compatible = "rockchip,rk3568-rknn-core";
> +		reg = <0x0 0xfde40000 0x0 0x1000>,
> +		      <0x0 0xfde41000 0x0 0x1000>,
> +		      <0x0 0xfde43000 0x0 0x1000>;
> +		reg-names = "pc", "cna", "core";
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
> +			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
> +		clock-names = "aclk", "hclk", "npu", "pclk";
> +		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>, <&cru CLK_NPU>;
> +		assigned-clock-rates = <200000000>, <600000000>;

[Severity: High]
Will this configuration inadvertently overclock the AXI bus?

The commit message states the intent to assign the CRU CLK_NPU so the NPU
AXI bus clock comes up at 200 MHz. However, based on the positional mapping
of assigned-clocks to assigned-clock-rates, this assigns 200 MHz to the SCMI
compute core and 600 MHz to CLK_NPU.

Should the values in assigned-clock-rates be swapped to match the order
in assigned-clocks to prevent bus hangs?

> +		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
> +		reset-names = "srst_a", "srst_h";
> +		power-domains = <&power RK3568_PD_NPU>;
> +		rockchip,pmu = <&pmu>;
> +		iommus = <&rknn_mmu_0>;
> +		status = "disabled";
> +	};
> +
> +	rknn_mmu_0: iommu@fde4b000 {
> +		compatible = "rockchip,iommu";

[Severity: High]
Does this compatible string correctly identify the v2 IOMMU hardware
present on the RK3568?

Using rockchip,iommu binds to the v1 hardware operations in the Rockchip
IOMMU driver. Because the RK3568 features a v2 IOMMU, configuring it with
v1 driver operations could result in incorrect page table programming and
translation faults when the NPU attempts DMA.

Other IOMMU nodes in rk356x-base.dtsi use rockchip,rk3568-iommu. Should
this node use that compatible string instead?

> +		reg = <0x0 0xfde4b000 0x0 0x40>;
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clock-names = "aclk", "iface";

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=7

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
@ 2026-06-13  7:09   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:09 UTC (permalink / raw)
  To: MidG971; +Cc: conor+dt, dri-devel, devicetree, robh

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] The bounds check for the 'cores' array uses the current node's match data rather than the allocated capacity, allowing an out-of-bounds write if the device tree has mixed compatible strings.

Pre-existing issues:
- [High] If `rocket_device_init()` fails, the global `rdev` pointer is left containing an `ERR_PTR`, causing a kernel panic on subsequent core probes.
- [High] Global DRM device and core arrays are allocated using `devm_` helpers tied to the first probed core's lifecycle, causing a Use-After-Free when that core is unbound.
- [High] Removing and re-probing NPU cores out of order corrupts the 'cores' array because 'num_cores' is incorrectly used as both an active count and the array index.
- [High] Missing locking around the global `rdev` initialization and `rdev->num_cores` access allows concurrent probes to corrupt device tracking and leak allocations.
- [High] The DRM device is registered before its hardware cores are fully initialized, allowing userspace to trigger a NULL pointer dereference.
--

commit a4289515ae9af5aaa001a280a8b52c1288f26e24
Author: Midgy BALON <midgy971@gmail.com>
One line subject: accel: rocket: Derive DMA width and core count from match data

The probe already has the per-SoC match data, which now records the core
count and DMA width. This patch uses it for the cores array allocation and
the device DMA mask instead of re-scanning the device tree for available
core nodes. It also rejects a device tree that declares more NPU core nodes
than the SoC has to prevent overrun.

> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -6,18 +6,16 @@
>  #include <linux/clk.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/platform_device.h>
> -#include <linux/of.h>
>  
>  #include "rocket_device.h"
>  
>  struct rocket_device *rocket_device_init(struct platform_device *pdev,
> -					 const struct drm_driver *rocket_drm_driver)
> +					 const struct drm_driver *rocket_drm_driver,
> +					 const struct rocket_soc_data *soc_data)
>  {
>  	struct device *dev = &pdev->dev;
> -	struct device_node *core_node;
>  	struct rocket_device *rdev;
>  	struct drm_device *ddev;
> -	unsigned int num_cores = 0;
>  	int err;
>  
>  	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: High]
This is a pre-existing issue, but does using devres tied to the first
probed core's lifecycle cause a use-after-free when that core is unbound?

If userspace unbinds this first core via sysfs, rocket_remove() decrements
num_cores but does not explicitly clean up the DRM device because other
cores are still bound. Upon returning from remove, the driver core
automatically frees all devres tied to the first core's device. This
would destroy the global rdev memory, causing a use-after-free when any
remaining cores attempt to access it.

[Severity: High]
This is a pre-existing issue, but does registering the DRM device
before its hardware cores are fully initialized allow userspace to
trigger a NULL pointer dereference?

In rocket_device_init(), the driver calls drm_dev_register() to expose
the character device to userspace. This occurs before the caller,
rocket_probe(), populates rdev->cores[0].dev and initializes the hardware.

If userspace opens the device during this small timing window,
rocket_open() is called and invokes:

    rocket_iommu_domain_create(rdev->cores[0].dev);

Since the device pointer is still NULL, this would attempt to dereference
a NULL device pointer in the IOMMU subsystem, crashing the system.

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 384c38e13acce..f0beed2d522c7 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
>  
>  static int rocket_probe(struct platform_device *pdev)
>  {
> +	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
>  	int ret;
>  
> +	if (!soc_data)
> +		return -EINVAL;
> +
>  	if (rdev == NULL) {

[Severity: High]
This is a pre-existing issue, but does the lack of locking around the global
rdev initialization allow concurrent probes to corrupt device tracking
and leak memory?

If concurrent probes occur (for example, via asynchronous probing or
concurrent sysfs binds), multiple threads can pass this if check,
causing multiple DRM device allocations. They can also read the same
num_cores value, leading both threads to overwrite the same index in the
rdev->cores array.

>  		/* First core probing, initialize DRM device. */
> -		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
> +		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
>  		if (IS_ERR(rdev)) {
>  			dev_err(&pdev->dev, "failed to initialize rocket device\n");
>  			return PTR_ERR(rdev);

[Severity: High]
This is a pre-existing issue, but does this error path leave the global
rdev pointer containing an ERR_PTR?

If rocket_device_init() fails, it returns an error pointer which is
assigned to the global rdev. The function then returns the error but fails
to reset rdev to NULL.

When the next core probes, the (rdev == NULL) check is bypassed, and the
driver immediately dereferences the error pointer when reading
rdev->num_cores:

    unsigned int core = rdev->num_cores;

Could this lead to a reproducible kernel panic?

> @@ -172,6 +176,12 @@ static int rocket_probe(struct platform_device *pdev)
>  
>  	unsigned int core = rdev->num_cores;
>  
> +	if (core >= soc_data->num_cores) {

[Severity: High]
Does this bounds check use the current node's capacity rather than the
array's actual allocated size?

The array rdev->cores is permanently allocated using soc_data->num_cores
from the first probed device in rocket_device_init(). However, this bounds
check evaluates against the soc_data of the current probing device.

If a device tree or overlay declares a node with a smaller num_cores
followed by one with a larger num_cores, the check will pass for an index
that exceeds the initial allocation size. Will this cause a direct
out-of-bounds write into the rdev->cores array, corrupting kernel memory?

> +		dev_err(&pdev->dev, "too many NPU core nodes (max %u)\n",
> +			soc_data->num_cores);
> +		return -EINVAL;
> +	}
> +
>  	dev_set_drvdata(&pdev->dev, rdev);
>  
>  	rdev->cores[core].rdev = rdev;

[Severity: High]
This is a pre-existing issue, but does removing and re-probing NPU cores
out of order corrupt the cores array?

When a core is removed, rocket_remove() decrements rdev->num_cores
without shifting the remaining cores in the array to close the gap:

    rdev->num_cores--;

If cores are removed out of order, active cores are left at indices equal
to or greater than the new num_cores. The driver later uses num_cores as its
search bound, making those trailing active cores unreachable.

Furthermore, if a new core is probed, it assigns itself to the index
rdev->num_cores, blindly overwriting a still-active core:

    rdev->cores[core].rdev = rdev;

Could this cause state corruption and resource leaks?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
@ 2026-06-13  7:10   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:10 UTC (permalink / raw)
  To: MidG971; +Cc: robh, conor+dt, dri-devel, devicetree

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [High] The shared IRQ handler accesses hardware registers without checking if the device is active.
- [High] The runtime suspend callback disables hardware clocks without synchronizing the shared IRQ.
- [High] The global `rdev` pointer is not reset to NULL when initialization fails, causing a kernel panic on subsequent probes.
- [High] Hardware registers are read during probe without ensuring the device resets are explicitly deasserted.
--

commit e068c6d40922642be56ef8ce61fd76a8d5727d7c
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Introduce per-SoC rocket_soc_data

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index b3b2fa9ba645a..09c445af7de73 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
>  	u32 version;
>  	int err = 0;
>  
> +	core->soc_data = of_device_get_match_data(dev);
> +	if (!core->soc_data)
> +		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
> +
>  	core->resets[0].id = "srst_a";
>  	core->resets[1].id = "srst_h";
>  	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),

[Severity: High]
This isn't a bug introduced by this patch, but I noticed hardware registers are
read during probe without ensuring the device resets are explicitly deasserted.

Later in rocket_core_init():

	version = rocket_pc_readl(core, VERSION);

If the bootloader leaves the NPU in an asserted reset state, attempting to
read its registers will cause a synchronous external abort. Would it make
sense to deassert the resets before accessing registers here?

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 8bbbce594883e..384c38e13acce 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
[ ... ]
> @@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }

[Severity: High]
This is a pre-existing issue, but I noticed the global rdev pointer is not
reset to NULL when initialization fails in rocket_probe() in this file:

	rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	if (IS_ERR(rdev)) {
		...
		return PTR_ERR(rdev);
	}

If this fails during the first core's probe, an error pointer is assigned to
the global rdev pointer without being cleared. On the next core's probe, the
check for NULL will fail and it will blindly dereference the error pointer.
Could this be addressed by clearing the global pointer on the error path?

[Severity: High]
This is a pre-existing issue, but the runtime suspend callback in this driver
disables hardware clocks without synchronizing the shared IRQ:

rocket_device_runtime_suspend() {
	...
	clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), ...);
	...
}

Should this call synchronize_irq() first to avoid a race condition where the
IRQ handler might be executing mid-flight on another CPU while the hardware
is powered down?

[Severity: High]
This is a pre-existing issue, but in drivers/accel/rocket/rocket_job.c, the
shared IRQ handler accesses hardware registers without checking if the device
is active:

rocket_job_irq_handler() {
	...
	u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
	...
}

Because the IRQ is registered with IRQF_SHARED, if the handler is invoked by
another device while the NPU is runtime suspended, reading the registers can
result in invalid data (~0u) and spurious interrupts. Should it use
pm_runtime_get_if_active() before reading the status register?

[ ... ]

>  static const struct of_device_id dt_match[] = {
> -	{ .compatible = "rockchip,rk3588-rknn-core" },
> +	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
>  	{}
>  };
>  MODULE_DEVICE_TABLE(of, dt_match);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
@ 2026-06-13  7:11   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:11 UTC (permalink / raw)
  To: MidG971; +Cc: conor+dt, robh, devicetree, dri-devel

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Double free of `rjob->tasks` leading to memory corruption triggerable from userspace.
- [Critical] Unconditional dereference of a NULL `job->domain` triggering a kernel panic on early job submission failure.
- [High] Permanent PM runtime reference count leak when `iommu_attach_group()` fails.
- [High] Zero-initialized clock IDs cause the framework to fetch the same clock repeatedly, violating hardware constraints.
- [Medium] Premature DRM device registration exposes an incomplete state to userspace.
--

commit d0d76b931ac773e3e176353cdb3ac9c590c43dc3
Author: Midgy BALON <midgy971@gmail.com>

dt-bindings: npu: rockchip, rk3588-rknn-core: Add RK3568

The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
RK3588.  Add its compatible.

> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> index caca2a4903cd1..e0b948ac47d45 100644
> --- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
[ ... ]

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
    ...
    err = drm_dev_register(ddev, 0);
    ...
}

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
    ...
    rdev->cores[core].dev = &pdev->dev;
    ...
}

[Severity: Medium]
This is a pre-existing issue, but does this premature registration expose an
incomplete state to userspace?

drm_dev_register() is called in rocket_device_init() before rocket_probe()
assigns rdev->cores[0].dev.

As a result, the device node is briefly visible to userspace while its core
structures remain partially uninitialized.

If userspace attempts to open the DRM device during this window, rocket_open()
passes the NULL rdev->cores[0].dev to rocket_iommu_domain_create().

While the IOMMU framework safely catches the NULL device and returns -ENODEV,
this still causes a spurious open() failure for userspace applications relying
on udev events.

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
    ...
	ret = pm_runtime_get_sync(core->dev);
	if (ret < 0)
		return fence;
    ...
		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
		if (ret < 0)
			return fence;
    ...
}

[Severity: High]
This is a pre-existing issue, but does this cause a permanent PM runtime
reference count leak when iommu_attach_group() fails?

In rocket_job_run(), pm_runtime_get_sync() increments the PM usage count to wake
up the NPU.

If the subsequent iommu_attach_group() fails, the driver returns the allocated
fence and exits early without calling pm_runtime_put().

Because the driver exits before assigning core->in_flight_job, the subsequent
timeout handler rocket_reset() will also skip the manual
pm_runtime_put_noidle().

This causes a permanent PM runtime reference count leak, preventing the NPU
hardware from ever entering runtime suspend.

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
    err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
    ...
}

[Severity: High]
This is a pre-existing issue, but do these zero-initialized clock IDs cause the
framework to fetch the same clock repeatedly?

In rocket_core_init(), the driver declares an array of clk_bulk_data but never
initializes the .id fields, leaving them implicitly NULL.

When devm_clk_bulk_get() is called, the clock framework handles the NULL IDs by
returning the first clock defined in the device tree for the device on all four
iterations.

Consequently, rocket_device_runtime_resume() enables the exact same clock four
times, while the remaining required clocks are left disabled.

This violates physical hardware state requirements and causes unclocked NPU
register accesses.

drivers/accel/rocket/rocket_job.c:rocket_copy_tasks() {
    ...
fail:
	kvfree(rjob->tasks);
	return ret;
}

drivers/accel/rocket/rocket_job.c:rocket_job_cleanup() {
    ...
	kvfree(job->tasks);
    ...
}

[Severity: Critical]
This is a pre-existing issue, but does this lead to a double free of rjob->tasks
that can be triggered from userspace?

In rocket_copy_tasks(), if copy_from_user() fails, kvfree(rjob->tasks) is
called, but the pointer rjob->tasks is not cleared.

The function returns an error to rocket_ioctl_submit_job(), which jumps to
out_cleanup_job and invokes rocket_job_put().

When the reference count drops, rocket_job_cleanup() is executed,
unconditionally calling kvfree(job->tasks) again on the already freed memory
block.

Since the IOCTL input is fully controlled by userspace, an unprivileged attacker
can trigger a double free by submitting an invalid task pointer.

drivers/accel/rocket/rocket_job.c:rocket_ioctl_submit_job() {
    ...
	rjob = kzalloc_obj(*rjob);
    ...
	ret = rocket_copy_tasks(dev, file, job, rjob);
	if (ret)
		goto out_cleanup_job;
    ...
}

drivers/accel/rocket/rocket_drv.c:rocket_iommu_domain_put() {
    ...
    kref_put(&domain->kref, rocket_iommu_domain_destroy);
    ...
}

[Severity: Critical]
This is a pre-existing issue, but does this unconditionally dereference a NULL
job->domain, triggering a kernel panic on early job submission failure?

In rocket_ioctl_submit_job(), rjob is allocated via kzalloc_obj(), leaving
rjob->domain initialized to NULL.

If a failure occurs before the domain is assigned, such as an invalid task
pointer in rocket_copy_tasks() or a bad BO handle in drm_gem_objects_lookup(),
the driver jumps to the cleanup path.

The cleanup handler, rocket_job_cleanup(), unconditionally passes the NULL
domain to rocket_iommu_domain_put(), which invokes kref_put(&domain->kref, ...).

Computing &domain->kref yields an offset address, and refcount_dec_and_test()
immediately dereferences this invalid address, resulting in a kernel panic that
can be triggered by unprivileged userspace.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=6

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
@ 2026-06-13  7:11   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:11 UTC (permalink / raw)
  To: MidG971; +Cc: conor+dt, robh, dri-devel, devicetree

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] `pm_runtime_put_sync()` fails to suspend the device synchronously due to active autosuspend, leaking runtime PM state when `noc_init` fails.
- [High] `rk3568_noc_init()` ignores `clk_set_rate()` failures and incorrectly uses short-circuiting logic, allowing initialization to proceed with an inactive compute clock.
- [High] The RK3568 NPU compute clock (PVTPLL) is not restarted during runtime resume, leading to a hardware hang.

Pre-existing issues:
- [High] Uninitialized bulk clock IDs cause the driver to silently fail at managing the NPU's actual clocks, fetching the identical default clock repeatedly.
- [High] `devm_drm_dev_alloc()` memory is leaked on probe deferral because it is allocated on a global facade device.
--

commit 18b5e0abd4d8cdab9122360b8a9bd630b4573645
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Add RK3568 SoC support

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Add rk3568_soc_data and its
compatible.

Unlike the RK3588, the RK3568 NPU's compute clock is a PVTPLL managed by
TF-A via SCMI; start it from an noc_init callback with a real rate change
(an intermediate rate defeats the clock framework's unchanged-rate
shortcut).  Powering on and de-idling the NPU NoC are left to the power
domain (genpd), which performs them when the IOMMU supplier is resumed,
so the driver does not poke the PMU directly.

If noc_init fails, unwind through rocket_core_fini() so the core is torn
down completely rather than leaking the runtime-PM and IOMMU state.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index 09c445af7de73..779e951596a15 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);

[Severity: High]
Does this error path leak the runtime PM state?

Since rocket_core_init() enables autosuspend via pm_runtime_use_autosuspend(),
calling pm_runtime_put_sync() behaves asynchronously and schedules an
autosuspend timer instead of suspending immediately.

Then rocket_core_fini() calls pm_runtime_disable() which cancels pending
timers without suspending the device. Does this leave the device permanently
in the RPM_ACTIVE state and keep the power domain powered on?

> +			rocket_core_fini(core);
> +			return err;
> +		}
> +	}
> +
>  	version = rocket_pc_readl(core, VERSION);
>  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index f0beed2d522c7..86484110ad6f0 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c

[ ... ]

> @@ -223,12 +224,42 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * The NPU compute clock is a PVTPLL managed by TF-A via SCMI; spin it up
> + * with a real rate change (an intermediate rate defeats the clock
> + * framework's unchanged-rate shortcut).  Powering on and de-idling the NPU
> + * NoC are handled by the power domain (genpd) before the NPU is accessed.
> + */
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct clk *npu_clk;
> +
> +	npu_clk = of_clk_get_by_name(core->dev->of_node, "npu");
> +	if (IS_ERR(npu_clk))
> +		return dev_err_probe(core->dev, PTR_ERR(npu_clk),
> +				     "failed to get the NPU SCMI clock\n");
> +
> +	if (clk_set_rate(npu_clk, 600000000UL) ||
> +	    clk_set_rate(npu_clk, 1000000000UL))

[Severity: High]
Does this logical OR prevent the second rate change from executing if the
first one fails?

If clk_set_rate() fails for the first call, the short-circuiting logic will
prevent the second clk_set_rate() from running.

> +		dev_warn(core->dev, "failed to set the NPU compute clock rate\n");
> +	clk_put(npu_clk);
> +
> +	return 0;

[Severity: High]
Should this return an error code if the clock rate fails to set?

Returning 0 here allows initialization to proceed even if the compute clock
is inactive. When rocket_core_init() subsequently reads the NPU VERSION
register over the AXI bus, will this cause a system hang due to the
inactive clock?

> +}

[Severity: High]
This isn't a bug introduced by this patch, but does rocket_core_init() fail
to properly initialize the bulk clock IDs?

When rdev->cores is allocated with devm_kcalloc(), the core->clks array is
initialized with NULL pointers for the .id fields. 

When rocket_core_init() later fetches the clocks:

	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);

With NULL .id strings, clk_get() will repeatedly fetch the first clock listed
in the device tree's clocks property for all elements. Does this leave the
other NPU clocks unmanaged and potentially disabled, leading to AXI bus
hangs?

[Severity: High]
This isn't a bug introduced by this patch, but is there a memory leak in the
probe deferral path for devm_drm_dev_alloc()?

In rocket_device_init(), rdev is allocated on the global drm_dev:

    rdev = devm_drm_dev_alloc(drm_dev->dev, ...);

If rocket_core_init() fails (e.g. returning -EPROBE_DEFER), rocket_probe()
clears the rdev pointer but the devm resources are tied to the static global
drm_dev which is never removed. 

When probe is retried, does this result in allocating a new struct
rocket_device and leaking the previous allocation permanently?

[Severity: High]
Does the driver fail to restart the NPU compute clock during runtime resume?

During runtime suspend, the NPU power domain (DOMAIN_RK3568) is powered off
and the PVTPLL loses state. 

On runtime resume, rocket_device_runtime_resume() enables the bulk clocks:

    err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks),
                                  rdev->cores[core].clks);

But it doesn't re-apply the rate-change sequence used in rk3568_noc_init() to
force TF-A to spin the clock up. Will this cause the hardware to hang on the
next NPU access because the compute clock remains off?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
@ 2026-06-13  7:12   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:12 UTC (permalink / raw)
  To: MidG971; +Cc: conor+dt, dri-devel, devicetree, robh

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] rocket_core_fini() tears down the IOMMU domain without quiescing the hardware, causing IOMMU faults and wedged hardware.

Pre-existing issues:
- [High] Error paths in rocket_job_run() leak the PM runtime usage counter and return an unsignaled fence, causing permanent power drain and spurious timeouts.
- [High] Job timeout recovery leaves the NPU permanently powered on due to pm_runtime_put_noidle().
--

commit 629253998abe2060bd2b43f469323132b5743d36
Author: Midgy BALON <midgy971@gmail.com>
Subject: accel: rocket: Keep the IOMMU domain attached across jobs

This commit attaches the IOMMU domain once per context instead of toggling
it on every job to avoid hardware stall timeouts. The domain is released on
context swap, core reset, or core teardown.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index 779e951596a15..6c128f585cff4 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
[ ... ]
> @@ -112,9 +113,20 @@ void rocket_core_fini(struct rocket_core *core)
>  {
>  	pm_runtime_dont_use_autosuspend(core->dev);
>  	pm_runtime_disable(core->dev);
> +
> +	/*
> +	 * Stop the scheduler before tearing down the IOMMU so an in-flight
> +	 * job can no longer touch the (about to be detached) domain.
> +	 */
> +	rocket_job_fini(core);
> +
> +	if (core->attached_domain) {
> +		iommu_detach_group(NULL, core->iommu_group);

[Severity: High]
Does this detach the IOMMU page tables while the hardware is still active?

Calling drm_sched_fini() inside rocket_job_fini() only stops the software
scheduling thread, but it doesn't wait for the currently executing job on
the hardware to complete or reset the AXI master. Detaching the group while
the NPU is still executing could cause memory translation faults and wedge
the hardware.

Should the hardware be explicitly quiesced, for example by calling
rocket_core_reset(), before the IOMMU is detached here?

> +		rocket_iommu_domain_put(core->attached_domain);
> +		core->attached_domain = NULL;
> +	}
>  	iommu_group_put(core->iommu_group);
>  	core->iommu_group = NULL;
> -	rocket_job_fini(core);
>  }
>  
>  void rocket_core_reset(struct rocket_core *core)
> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index e25234261536b..368b2ebead1b3 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
[ ... ]
> @@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
>  	if (ret < 0)
>  		return fence;
>  
> -	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> -	if (ret < 0)
> -		return fence;

[Severity: High]
This isn't a bug introduced by this patch, but does the pm_runtime_get_sync
error path just above this leak the PM runtime usage counter? 

In rocket_job_run(), the code does:

	ret = pm_runtime_get_sync(core->dev);
	if (ret < 0)
		return fence;

If pm_runtime_get_sync() fails, returning without calling
pm_runtime_put_noidle() leaves the counter elevated.

> +	/*
> +	 * Attach the job's IOMMU domain only when it differs from the one
> +	 * already attached. Re-attaching per job toggles the rk_iommu
> +	 * stall/reset handshake on an idle NPU MMU, which is slow and
> +	 * noisy; keep the domain attached across jobs instead.
> +	 */
> +	if (core->attached_domain != job->domain) {
> +		if (core->attached_domain) {
> +			iommu_detach_group(NULL, core->iommu_group);
> +			rocket_iommu_domain_put(core->attached_domain);
> +			core->attached_domain = NULL;
> +		}
> +
> +		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> +		if (ret < 0)
> +			return fence;

[Severity: High]
This is a pre-existing issue, but does this error path leak the PM runtime
reference acquired earlier in this function? 

Since the in_flight_job isn't set yet when returning the fence here, the
subsequent timeout handler won't clean this up, preventing the device from
ever entering low power states.

> +
> +		kref_get(&job->domain->kref);
> +		core->attached_domain = job->domain;
> +	}
>  
>  	scoped_guard(mutex, &core->job_lock) {
>  		core->in_flight_job = job;
[ ... ]
> @@ -376,7 +393,15 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)

[Severity: High]
This isn't a bug introduced by this patch, but does job timeout recovery
leave the NPU permanently powered on?

Earlier in rocket_reset(), the usage counter is decremented:

	scoped_guard(mutex, &core->job_lock) {
		if (core->in_flight_job)
			pm_runtime_put_noidle(core->dev);

		core->in_flight_job = NULL;
	}

Because pm_runtime_put_noidle() does not invoke an idle check or queue an
auto-suspend, doesn't the device remain in the RPM_ACTIVE state and stay
physically powered on indefinitely? Should pm_runtime_put_autosuspend() be
used here instead?

>  	 */
>  	rocket_core_reset(core);
>  
> -	iommu_detach_group(NULL, core->iommu_group);
> +	/*
> +	 * The reset wipes the IOMMU page-table base, so drop the attached
> +	 * domain to force the next job to re-attach and reprogram it.
> +	 */
> +	if (core->attached_domain) {
> +		iommu_detach_group(NULL, core->iommu_group);
> +		rocket_iommu_domain_put(core->attached_domain);
> +		core->attached_domain = NULL;
> +	}
>  
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
@ 2026-06-13  7:15   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:15 UTC (permalink / raw)
  To: MidG971; +Cc: dri-devel, conor+dt, robh, devicetree

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Double free of `rjob->tasks` when `rocket_copy_tasks` fails.
- [High] The shared IRQ handler accesses hardware registers without checking if the device is active via pm_runtime_get_if_active().
- [High] The runtime suspend callback powers down the hardware without synchronizing the shared IRQ.
- [High] Timeout recovery drops the PM usage count early via `pm_runtime_put_noidle()`, stranding the device in an active power state and introducing a race condition.
- [High] PM runtime usage counter leaks on error paths during job submission.
- [High] Driver removal cancels pending PM transitions without synchronous suspend.
- [High] Missing NULL check on `iommu_group_get()` leads to crash if device lacks an IOMMU.
--

commit 54a038a17307496e6d47b23eab270cebaf07629e
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Reset the NPU before detaching the IOMMU on timeout

This commit fixes a timeout issue where detaching the IOMMU before resetting 
the wedged NPU AXI master causes a stall. By asserting the core reset first, 
the AXI master is quiesced and the IOMMU detach completes cleanly.

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index ac51bff39833f..e25234261536b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
>  		if (core->in_flight_job)
>  			pm_runtime_put_noidle(core->dev);

[Severity: High]
This is a pre-existing issue, but does using pm_runtime_put_noidle() here
leave the device stranded in an active power state?

Without restarting the autosuspend timer or scheduling an idle callback, the
usage counter is dropped while leaving the device active. 

Could this also create a race condition if an idle check is triggered
concurrently (such as via a sysfs write), causing the device to begin
suspending while the subsequent rocket_core_reset() is toggling reset lines?

>  
> -		iommu_detach_group(NULL, core->iommu_group);
> -
>  		core->in_flight_job = NULL;
>  	}
>  
> -	/* Proceed with reset now. */
> +	/*
> +	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
> +	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
> +	 * stall request that never drains and times out (warning in the IOMMU
> +	 * core). Asserting the core reset first quiesces the master so the
> +	 * detach completes cleanly.
> +	 */
>  	rocket_core_reset(core);
>  
> +	iommu_detach_group(NULL, core->iommu_group);
> +
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);
>  

[Severity: High]
This isn't a bug introduced by this patch, but looking at the shared IRQ
handler:

drivers/accel/rocket/rocket_job.c:rocket_job_irq_handler() {
    ...
    u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
    ...
}

Is it possible that this accesses hardware registers without verifying if the
device is active?

If the NPU is suspended and another device triggers the shared interrupt line,
could this read from powered-off hardware and return invalid data
(like 0xffffffff)?

Might this be misinterpreted as a valid interrupt, leading to spurious
threaded IRQ execution and potential bus crashes?

Is it necessary to use pm_runtime_get_if_active() to verify the power state
first?

[Severity: High]
This isn't a bug introduced by this patch, but does the runtime suspend
callback power down hardware without synchronizing the shared IRQ?

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
    ...
    clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
                               rdev->cores[core].clks);
    ...
}

Since the driver uses a shared IRQ (IRQF_SHARED), could the handler execute 
mid-flight on another CPU while the hardware is actively being shut down?

Should this call synchronize_irq() before disabling hardware clocks to prevent
bus errors or system crashes?

[Severity: High]
This is a pre-existing issue, but does rocket_job_run() leak the PM usage
counter on error paths?

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
    ...
    ret = pm_runtime_get_sync(core->dev);
    if (ret < 0)
        return fence;
    ...
    ret = iommu_attach_group(job->domain->domain, core->iommu_group);
    if (ret < 0)
        return fence;
    ...
}

If either pm_runtime_get_sync() or iommu_attach_group() fails, would returning
an unsignaled fence early without balancing the usage counter via
pm_runtime_put_noidle() permanently leak the PM reference?

Might this prevent the device from ever entering runtime suspend, while also
leaving the DRM scheduler stalled on a dead fence until it times out?

[Severity: High]
This is a pre-existing issue, but does driver removal cancel pending PM
transitions without a synchronous suspend?

drivers/accel/rocket/rocket_core.c:rocket_core_fini() {
    ...
    pm_runtime_disable(core->dev);
    ...
}

Because pm_runtime_disable() cancels pending asynchronous transitions (such as
the autosuspend timer), wouldn't the NPU hardware clocks be left running
indefinitely after the driver is removed and the device is unbound?

Is a call to pm_runtime_put_sync() needed before disabling runtime PM?

[Severity: High]
This isn't a bug introduced by this patch, but is there a missing NULL check 
when fetching the IOMMU group?

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
    core->iommu_group = iommu_group_get(dev);
    ...
}

If the device lacks an IOMMU (for example, if it is disabled by the user or
failed to probe), will iommu_group_get() return NULL?

If so, when rocket_job_run() later executes:

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
    ...
    ret = iommu_attach_group(job->domain->domain, core->iommu_group);
    ...
}

Would this unconditionally dereference the group pointer and cause a kernel
panic on the first job submission?

[Severity: Critical]
This is a pre-existing issue, but does a failure in rocket_copy_tasks() lead 
to a double free?

drivers/accel/rocket/rocket_job.c:rocket_copy_tasks() {
    ...
fail:
    kvfree(rjob->tasks);
    return ret;
}

If this fails (such as from an invalid pointer during copy_from_user()), the
tasks array is freed, but is rjob->tasks left as a dangling pointer?

When the caller jumps to out_cleanup_job and invokes rocket_job_put(rjob),
it reaches:

drivers/accel/rocket/rocket_job.c:rocket_job_cleanup() {
    ...
    kvfree(job->tasks);
    ...
}

Will this result in freeing the dangling pointer a second time?

Can this double free corrupt kernel memory or cause a panic, potentially
being exploitable by an unprivileged user?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain
  2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
@ 2026-06-13  7:21   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-13  7:21 UTC (permalink / raw)
  To: MidG971; +Cc: conor+dt, devicetree, dri-devel, robh

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Unhandled error from `rockchip_pd_power()` during probe leaves the power domain in an inconsistent state, causing `genpd` to wrongly assume the domain is ON.
--

commit 08758fdfbdda5f83d100c68fb3b2bbe8fd467ea7
Author: Midgy BALON <midgy971@gmail.com>

pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain

This patch gives DOMAIN_RK3568 a regulator parameter, allowing the NPU
domain to set need_regulator. This enables genpd to manage the NPU rail
wired up as the domain's domain-supply instead of marking it always-on.

> diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
> index 490bbb1d1d8e8..19db307e3811d 100644
> --- a/drivers/pmdomain/rockchip/pm-domains.c
> +++ b/drivers/pmdomain/rockchip/pm-domains.c
[ ... ]
> @@ -1274,15 +1288,15 @@ static const struct rockchip_domain_info rk3562_pm_domains[] = {
>  };
>  
>  static const struct rockchip_domain_info rk3568_pm_domains[] = {
> -	[RK3568_PD_NPU]		= DOMAIN_RK3568("npu",  BIT(1), BIT(2),  false),
> -	[RK3568_PD_GPU]		= DOMAIN_RK3568("gpu",  BIT(0), BIT(1),  false),
> -	[RK3568_PD_VI]		= DOMAIN_RK3568("vi",   BIT(6), BIT(3),  false),
> -	[RK3568_PD_VO]		= DOMAIN_RK3568("vo",   BIT(7), BIT(4),  false),
> -	[RK3568_PD_RGA]		= DOMAIN_RK3568("rga",  BIT(5), BIT(5),  false),
> -	[RK3568_PD_VPU]		= DOMAIN_RK3568("vpu",  BIT(2), BIT(6),  false),
> -	[RK3568_PD_RKVDEC]	= DOMAIN_RK3568("vdec", BIT(4), BIT(8),  false),
> -	[RK3568_PD_RKVENC]	= DOMAIN_RK3568("venc", BIT(3), BIT(7),  false),
> -	[RK3568_PD_PIPE]	= DOMAIN_RK3568("pipe", BIT(8), BIT(11), false),
> +	[RK3568_PD_NPU]		= DOMAIN_RK3568("npu",  BIT(1), BIT(2),  false, true),

[Severity: High]
This is a pre-existing issue, but by setting need_regulator to true here,
does this expose the NPU domain to a potential probe failure and system
hang?

Looking at rockchip_pm_add_one_domain(), the return value of
rockchip_pd_power(pd, false) is currently ignored:

drivers/pmdomain/rockchip/pm-domains.c:rockchip_pm_add_one_domain() {
	...
	if (pd->info->need_regulator)
		rockchip_pd_power(pd, false);
	...
}

If the bootloader leaves the PMU MTCMOS register ON but the external
regulator is OFF, rockchip_pd_power() attempts to send an idle request.
Because the IP is physically unpowered, it cannot assert the ACK signal
and the request times out:

drivers/pmdomain/rockchip/pm-domains.c:rockchip_pd_power() {
	...
	ret = rockchip_pmu_set_idle_request(pd, true);
	if (ret < 0)
		goto out;
	...
}

Since the timeout error is ignored in rockchip_pm_add_one_domain(), the PMU
MTCMOS register remains ON, which causes pm_genpd_init() to incorrectly
initialize the domain as ON. When the consumer driver later requests power,
genpd assumes the domain is already active and skips enabling the regulator,
causing the driver to access an unpowered IP block.

Should the error handling in rockchip_pm_add_one_domain() be fixed to
properly synchronize the domain state before enabling this feature for the
NPU?

> +	[RK3568_PD_GPU]		= DOMAIN_RK3568("gpu",  BIT(0), BIT(1),  false, false),
> +	[RK3568_PD_VI]		= DOMAIN_RK3568("vi",   BIT(6), BIT(3),  false, false),
> +	[RK3568_PD_VO]		= DOMAIN_RK3568("vo",   BIT(7), BIT(4),  false, false),
> +	[RK3568_PD_RGA]		= DOMAIN_RK3568("rga",  BIT(5), BIT(5),  false, false),
> +	[RK3568_PD_VPU]		= DOMAIN_RK3568("vpu",  BIT(2), BIT(6),  false, false),
> +	[RK3568_PD_RKVDEC]	= DOMAIN_RK3568("vdec", BIT(4), BIT(8),  false, false),
> +	[RK3568_PD_RKVENC]	= DOMAIN_RK3568("venc", BIT(3), BIT(7),  false, false),
> +	[RK3568_PD_PIPE]	= DOMAIN_RK3568("pipe", BIT(8), BIT(11), false, false),
>  };
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=9

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
  2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
@ 2026-06-13  7:40   ` Jonas Karlman
  0 siblings, 0 replies; 20+ messages in thread
From: Jonas Karlman @ 2026-06-13  7:40 UTC (permalink / raw)
  To: MidG971
  Cc: tomeu@tomeuvizoso.net, ogabbay@kernel.org, heiko@sntech.de,
	robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
	ulf.hansson@linaro.org, dri-devel@lists.freedesktop.org,
	linux-rockchip@lists.infradead.org, devicetree@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	xxm@rock-chips.com, chaoyi.chen@rock-chips.com,
	finley.xiao@rock-chips.com, diederik@cknow-tech.com

Hi Midgy,

On 6/13/2026 9:01 AM, MidG971 wrote:
> From: Midgy BALON <midgy971@gmail.com>
> 
> Enable the NPU and its IOMMU on ROCK 3B and wire vdd_npu as the NPU
> power domain's domain-supply, so genpd brings the rail up and down with
> the domain (the domain is marked need_regulator). The PVTPLL compute
> clock is brought up later by the driver.
> 
> The rail is no longer kept always-on, so pin it to 1000 mV (the NPU's
> 1 GHz operating voltage; the driver runs a fixed compute rate with no
> devfreq voltage scaling) and mark it boot-on, so it is up before the
> power domain de-idles the NPU NoC at power-on.
> 
> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>  .../arm64/boot/dts/rockchip/rk3568-rock-3b.dts | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
> index 69001e453732e..d3f9776c2bdc3 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
> @@ -330,9 +330,10 @@ regulator-state-mem {
>  
>  			vdd_npu: DCDC_REG4 {
>  				regulator-name = "vdd_npu";
> +				regulator-boot-on;

There is no need for the NPU in the bootloader, do not use DT as a
workaround for software issues.

This series mention the PVTPLL NPU clk and seem to contains some
workarounds related to how the PVTPLL clock is handled in TF-A.

The PVTPLL block typically require the pclk and power domain enabled to
function, and this series seem to add workarounds to try and ensure this,
e.g. with noc_init to activate PVTPLL usage.

I would suggest that you do not involve the PVTPLL clock in this initial
NPU support for RK3568, set CLK_NPU to 400 MHz and use it instead of the
SCMI clock, or keep SCMI clk rate less than or equal to 400 MHz to
disable PVTPLL_NEED mode in TF-A.

In a future series you can extend Linux with a proper PVTPLL clk driver
and OPP support for the rocket driver to correctly ensure pclk and pd is
enabled when a PVTPLL clock is managed.

>  				regulator-initial-mode = <0x2>;
> -				regulator-min-microvolt = <500000>;
> -				regulator-max-microvolt = <1350000>;
> +				regulator-min-microvolt = <1000000>;
> +				regulator-max-microvolt = <1000000>;

Please describe the HW, do not add workarounds for software issues or
shortcomings.

Regards,
Jonas

>  				regulator-ramp-delay = <6001>;
>  
>  				regulator-state-mem {
> @@ -787,3 +788,16 @@ vp0_out_hdmi: endpoint@ROCKCHIP_VOP2_EP_HDMI0 {
>  		remote-endpoint = <&hdmi_in_vp0>;
>  	};
>  };
> +
> +&pd_npu {
> +	domain-supply = <&vdd_npu>;
> +};
> +
> +&rknn_core_0 {
> +	npu-supply = <&vdd_npu>;
> +	status = "okay";
> +};
> +
> +&rknn_mmu_0 {
> +	status = "okay";
> +};


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
  2026-06-13  7:09   ` sashiko-bot
@ 2026-06-13  8:18   ` Jonas Karlman
  1 sibling, 0 replies; 20+ messages in thread
From: Jonas Karlman @ 2026-06-13  8:18 UTC (permalink / raw)
  To: MidG971
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson,
	dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik

Hi Midgy,

On 6/13/2026 9:01 AM, MidG971 wrote:
> From: Midgy BALON <midgy971@gmail.com>
> 
> The RK3568 has an NVDLA-derived NPU at fde40000 with its own IOMMU at
> fde4b000. Add both nodes (disabled by default) and the NPU power-domain
> child under the PMU power-controller, and point rockchip,pmu at the PMU
> syscon that controls the NPU NoC bus-idle.
> 
> Besides the SCMI compute clock, assign the CRU CLK_NPU so the NPU AXI
> bus clock comes up at 200 MHz rather than the 12 MHz boot default.
> 
> The power-domain deliberately carries no pm_qos: qos_npu sits behind the
> NPU NoC, which is gated until the NPU is brought up, so a genpd power-off
> QoS save would fault reading it.
> 
> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>  arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> index 64bdd8b7754b5..313db59c1aed8 100644
> --- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> @@ -512,6 +512,13 @@ power-domain@RK3568_PD_GPU {
>  				#power-domain-cells = <0>;
>  			};
>  
> +			pd_npu: power-domain@RK3568_PD_NPU {
> +				reg = <RK3568_PD_NPU>;
> +				clocks = <&cru ACLK_NPU_PRE>,
> +					 <&cru HCLK_NPU_PRE>;
> +				#power-domain-cells = <0>;
> +			};
> +
>  			/* These power domains are grouped by VD_LOGIC */
>  			power-domain@RK3568_PD_VI {
>  				reg = <RK3568_PD_VI>;
> @@ -572,6 +579,37 @@ power-domain@RK3568_PD_RKVENC {
>  		};
>  	};
>  
> +	rknn_core_0: npu@fde40000 {
> +		compatible = "rockchip,rk3568-rknn-core";
> +		reg = <0x0 0xfde40000 0x0 0x1000>,
> +		      <0x0 0xfde41000 0x0 0x1000>,
> +		      <0x0 0xfde43000 0x0 0x1000>;
> +		reg-names = "pc", "cna", "core";
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
> +			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
> +		clock-names = "aclk", "hclk", "npu", "pclk";
> +		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>, <&cru CLK_NPU>;
> +		assigned-clock-rates = <200000000>, <600000000>;

This looks strange, the SCMI clk can be seen as a virtual clock that
changes between normal CRU NPU clk and a PVTPLL NPU clk (depending on
rate). 200 MHz, a typical opp-suspend value (switch to CRU clk instead
of PVTPLL), will set the CLK_NPU rate to 200 MHz, then setting CLK_NPU
to 600 MHz (the lowest rate that activates PVTPLL mode) outside of SCMI
control looks strange.

Suggest you only set SCMI NPU clk rate to 200 or 400 MHz and drop any
special handling, e.g. noc_init, to closer match RK3588 and defer any
use of PVTPLL clk to a future series that also adds OPP support.

> +		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
> +		reset-names = "srst_a", "srst_h";
> +		power-domains = <&power RK3568_PD_NPU>;
> +		rockchip,pmu = <&pmu>;

This looks wrong, the rockchip,pmu prop is typically used to reference
PMU GRF, see i.e. pinctrl node, not the power-management that is seem to
be correctly abstracted using power-domains above, please drop this prop.

Regards,
Jonas

> +		iommus = <&rknn_mmu_0>;
> +		status = "disabled";
> +	};
> +
> +	rknn_mmu_0: iommu@fde4b000 {
> +		compatible = "rockchip,iommu";
> +		reg = <0x0 0xfde4b000 0x0 0x40>;
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clock-names = "aclk", "iface";
> +		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>;
> +		power-domains = <&power RK3568_PD_NPU>;
> +		#iommu-cells = <0>;
> +		status = "disabled";
> +	};
> +
>  	gpu: gpu@fde60000 {
>  		compatible = "rockchip,rk3568-mali", "arm,mali-bifrost";
>  		reg = <0x0 0xfde60000 0x0 0x4000>;


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-06-13  8:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
2026-06-13  7:10   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
2026-06-13  7:09   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
2026-06-13  7:11   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
2026-06-13  7:15   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
2026-06-13  7:12   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
2026-06-13  7:11   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
2026-06-13  7:09   ` sashiko-bot
2026-06-13  8:18   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
2026-06-13  7:40   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
2026-06-13  7:21   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox