[RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support

* [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
@ 2026-06-04 13:52 Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

RFC, not for merge. End-to-end inference does not produce correct output
yet (see Status), so per the v2 discussion this is a request for design
feedback. It now probes, attaches, and submits cleanly on a stock
v7.1-rc6 tree; what remains is one hardware-internal issue.

The RK3568 has a single NVDLA-derived NPU core, the same IP family as the
RK3588 NPU the driver already supports; the register layout matches. The
RK3568 differences are a 32-bit NPU AXI/IOMMU (vs 40-bit) and explicit
PVTPLL/PMU bring-up to power and de-idle the NPU before it is reachable.

Patches:
  1-2  rocket: per-SoC data struct, then derive DMA width and core count
       from match data (refactors, no functional change).
  3    rocket: RK3568 SoC data + PVTPLL/PMU/NOC bring-up.
  4    rocket: reset the NPU before detaching the IOMMU on a job timeout
       (the detach otherwise stalls a wedged AXI master and WARNs).
  5    rocket: keep the IOMMU domain attached across jobs instead of
       re-attaching per job (the per-job rk_iommu handshake on the idle
       NPU MMU is slow and noisy).
  6    iommu/rockchip: clear AUTO_GATING bit 1 on the RK356x v1 IOMMU so
       the page-walker keeps its clock (else a TLB-miss walk never
       completes).
  7    dt-bindings: add the RK3568 NPU compatible.
  8-9  arm64 dts: add the NPU and its IOMMU, and enable them on ROCK 3B.

Dependency. The NPU MMU is rockchip-iommu v1 (32-bit) while the rest of
the RK3568 uses v2 (40-bit). They cannot coexist until the driver carries
per-device ops; this series is developed on top of Simon Xue's
"iommu/rockchip: Drop global rk_ops in favor of per-device ops" [1].
Without it the NPU IOMMU fails to probe on a full RK3568 boot.

Power bring-up. The NPU is brought up through the power-domain layer (no
driver hack): the NPU power-domain keeps its clocks but drops the pm_qos
phandle (qos_npu sits behind the gated NPU NoC, so genpd's power-off QoS
save faults reading it), and vdd_npu is marked always-on so the rail is
up before genpd de-idles the NoC at power-on. The PMU de-idle then ACKs
without PVTPLL running; PVTPLL is only needed for compute.

Status. On v7.1-rc6 the driver probes, creates /dev/accel/accel0,
attaches an IOMMU domain, and submits jobs; the program controller
fetches and broadcasts the command list. Inference output is still wrong,
and the cause is split across three layers:
  - kernel (this series): the RK3568 differences appear handled;
  - mesa/Teflon userspace: still emits RK3588-tuned config, wrong for
    RK3568 (to be filed separately on mesa-dev);
  - hardware: with corrected config the NPU's DMA reads the full input
    and weight tensors (confirmed via its DMA bandwidth counters), but
    the MAC/output stage never completes, the job times out, and the
    output stays at the buffer's zero-point. I have not found the missing
    step; it is not in the command list (replaying the vendor's
    byte-exact command list behaves the same). Pointers welcome,
    especially from anyone with RK3568 NPU experience.

Known residual. On the first IOMMU attach the NPU MMU is idle with paging
already enabled; the rk_iommu stall/reset handshake does not complete in
that state and logs one burst of timeouts before the (kept) domain
settles. It is harmless here because the job times out regardless, but it
points at an idle-MMU reconfiguration corner the rk_iommu code does not
handle on this block.

[1] https://lore.kernel.org/linux-rockchip/20260310105303.128859-1-xxm@rock-chips.com/

Changes since v2:
  - Tagged RFC; now tested on a stock v7.1-rc6 tree.
  - Bring-up moved into the power-domain/DT layer (no initcall hack).
  - Added the IOMMU detach-on-timeout and attach-once driver fixes.
  - Split the driver patch (Heiko): soc_data / match-data / RK3568.
  - Derive DMA width and core count from match data; drop the DT rescans.
  - Binding describes the hardware; added the missing $ref on rockchip,pmu.
  - Disclosed the per-device-ops IOMMU dependency.

Midgy BALON (9):
  accel: rocket: Introduce per-SoC rocket_soc_data
  accel: rocket: Derive DMA width and core count from match data
  accel: rocket: Add RK3568 SoC support
  accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  accel: rocket: Keep the IOMMU domain attached across jobs
  iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU

 .../npu/rockchip,rk3588-rknn-core.yaml        | 18 ++++-
 .../boot/dts/rockchip/rk3568-rock-3b.dts      | 14 +++-
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++
 drivers/accel/rocket/rocket_core.c            | 22 ++++++-
 drivers/accel/rocket/rocket_core.h            | 19 ++++++
 drivers/accel/rocket/rocket_device.c          | 15 ++---
 drivers/accel/rocket/rocket_device.h          |  3 +-
 drivers/accel/rocket/rocket_drv.c             | 66 ++++++++++++++++++-
 drivers/accel/rocket/rocket_job.c             | 35 ++++++++--
 drivers/iommu/rockchip-iommu.c                | 12 ++++
 10 files changed, 219 insertions(+), 23 deletions(-)


base-commit: 52c800fdcf11888ebeb50c3d707f782cc15b66eb
-- 
2.39.5



* [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.  No functional change: the
RK3588 AXI master is 40-bit.  This prepares for SoCs with a narrower
address width.
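
For readers less familiar with DMA masks, the effect of dma_bits can be
sketched in user space as below. This is only an illustration:
dma_bit_mask() is a stand-in for the kernel's DMA_BIT_MASK(), struct
soc_data mirrors the rocket_soc_data added here, and soc_dma_limit() is
a hypothetical helper that does not exist in the driver.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's DMA_BIT_MASK(n). */
static uint64_t dma_bit_mask(unsigned int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Mirrors struct rocket_soc_data as introduced by this patch. */
struct soc_data {
	unsigned int dma_bits;
};

/* Hypothetical helper: highest bus address the AXI master can reach. */
static uint64_t soc_dma_limit(const struct soc_data *soc)
{
	return dma_bit_mask(soc->dma_bits);
}

/* 40-bit entry as in this patch; the 32-bit one is only for contrast. */
static const struct soc_data wide_soc = { .dma_bits = 40 };
static const struct soc_data narrow_soc = { .dma_bits = 32 };
```

A 40-bit mask covers 1 TiB of address space, a 32-bit mask covers 4 GiB,
which is why a hardcoded 40 cannot be shared with a narrower master.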

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  7 ++++++-
 drivers/accel/rocket/rocket_core.h | 11 +++++++++++
 drivers/accel/rocket/rocket_drv.c  |  6 +++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index b3b2fa9ba645a..09c445af7de73 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -7,6 +7,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/iommu.h>
+#include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/reset.h>
@@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
 	u32 version;
 	int err = 0;
 
+	core->soc_data = of_device_get_match_data(dev);
+	if (!core->soc_data)
+		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
+
 	core->resets[0].id = "srst_a";
 	core->resets[1].id = "srst_h";
 	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
@@ -52,7 +57,7 @@ int rocket_core_init(struct rocket_core *core)
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(core->soc_data->dma_bits));
 	if (err)
 		return err;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index f6d7382854ca9..8ee105a0be40e 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -12,6 +12,16 @@
 
 #include "rocket_registers.h"
 
+struct rocket_core;
+
+/**
+ * struct rocket_soc_data - per-SoC configuration data
+ * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ */
+struct rocket_soc_data {
+	unsigned int dma_bits;
+};
+
 #define rocket_pc_readl(core, reg) \
 	readl((core)->pc_iomem + (REG_PC_##reg))
 #define rocket_pc_writel(core, reg, value) \
@@ -31,6 +41,7 @@ struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
 	unsigned int index;
+	const struct rocket_soc_data *soc_data;
 
 	int irq;
 	void __iomem *pc_iomem;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 8bbbce594883e..384c38e13acce 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+static const struct rocket_soc_data rk3588_soc_data = {
+	.dma_bits = 40,
+};
+
 static const struct of_device_id dt_match[] = {
-	{ .compatible = "rockchip,rk3588-rknn-core" },
+	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
 MODULE_DEVICE_TABLE(of, dt_match);
-- 
2.39.5



* [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:05   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The probe already has the per-SoC match data, which now records the core
count and DMA width.  Use it for the cores array allocation and the
device DMA mask instead of re-scanning the device tree for available core
nodes.
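
As a rough user-space sketch of the idea (not the driver code): the
compatible string selects a table entry, and the entry carries the core
count and DMA width, so nothing needs to be counted at probe time.
struct match_entry and lookup_soc_data() are hypothetical names standing
in for struct of_device_id and of_device_get_match_data().

```c
#include <stddef.h>
#include <string.h>

/* Mirrors struct rocket_soc_data after this patch. */
struct soc_data {
	unsigned int num_cores;
	unsigned int dma_bits;
};

/* Hypothetical stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	const struct soc_data *data;
};

static const struct soc_data rk3588_soc_data = {
	.num_cores = 3,
	.dma_bits = 40,
};

static const struct match_entry match_table[] = {
	{ "rockchip,rk3588-rknn-core", &rk3588_soc_data },
	{ NULL, NULL }	/* sentinel */
};

/* Return the per-SoC data for a compatible string, or NULL if unknown. */
static const struct soc_data *lookup_soc_data(const char *compatible)
{
	const struct match_entry *m;

	for (m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;

	return NULL;
}
```

The NULL return corresponds to the -EINVAL path added to rocket_probe().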

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.h   |  2 ++
 drivers/accel/rocket/rocket_device.c | 15 +++++----------
 drivers/accel/rocket/rocket_device.h |  3 ++-
 drivers/accel/rocket/rocket_drv.c    |  7 ++++++-
 4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 8ee105a0be40e..d6421251670dc 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -16,9 +16,11 @@ struct rocket_core;
 
 /**
  * struct rocket_soc_data - per-SoC configuration data
+ * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
  */
 struct rocket_soc_data {
+	unsigned int num_cores;
 	unsigned int dma_bits;
 };
 
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index 46e6ee1e72c5f..6186f4faa3a2a 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -6,18 +6,16 @@
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
 #include <linux/platform_device.h>
-#include <linux/of.h>
 
 #include "rocket_device.h"
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver)
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data)
 {
 	struct device *dev = &pdev->dev;
-	struct device_node *core_node;
 	struct rocket_device *rdev;
 	struct drm_device *ddev;
-	unsigned int num_cores = 0;
 	int err;
 
 	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
@@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
 	ddev = &rdev->ddev;
 	dev_set_drvdata(dev, rdev);
 
-	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
-		if (of_device_is_available(core_node))
-			num_cores++;
-
-	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
+	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
+				   GFP_KERNEL);
 	if (!rdev->cores)
 		return ERR_PTR(-ENOMEM);
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(soc_data->dma_bits));
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index ce662abc01d3d..2f74e078974e3 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -22,7 +22,8 @@ struct rocket_device {
 };
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver);
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data);
 void rocket_device_fini(struct rocket_device *rdev);
 #define to_rocket_device(drm_dev) \
 	((struct rocket_device *)(container_of((drm_dev), struct rocket_device, ddev)))
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 384c38e13acce..c18840e5aff76 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
 
 static int rocket_probe(struct platform_device *pdev)
 {
+	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
 	int ret;
 
+	if (!soc_data)
+		return -EINVAL;
+
 	if (rdev == NULL) {
 		/* First core probing, initialize DRM device. */
-		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
+		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
 		if (IS_ERR(rdev)) {
 			dev_err(&pdev->dev, "failed to initialize rocket device\n");
 			return PTR_ERR(rdev);
@@ -214,6 +218,7 @@ static void rocket_remove(struct platform_device *pdev)
 }
 
 static const struct rocket_soc_data rk3588_soc_data = {
+	.num_cores = 3,
 	.dma_bits = 40,
 };
 
-- 
2.39.5



* [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:05   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Unlike the RK3588 it must be powered
on and de-idled through the PMU, and its PVTPLL clock started via SCMI,
before the NPU is reachable.  Add rk3568_soc_data with an noc_init
callback performing this bring-up.
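
The PMU writes in this patch rely on the write-mask register protocol
(upper 16 bits enable writes to the matching lower 16 bits). A minimal
user-space sketch of that protocol follows; hiword_update() and
hiword_reg_apply() are hypothetical helpers, and the second one only
models how such a register is assumed to react to a write.

```c
#include <stdint.h>

#define BIT(n)	(1U << (n))

/* Build a write value: write-enable mask in bits 31:16, data in 15:0. */
static uint32_t hiword_update(uint16_t val, uint16_t mask)
{
	return ((uint32_t)mask << 16) | (val & mask);
}

/*
 * Model of the register side: only the low bits whose enable bit is set
 * in the upper half are changed, all other low bits keep their value.
 */
static uint32_t hiword_reg_apply(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (reg & ~mask & 0xffff) | (write & mask);
}
```

With these helpers, hiword_update(0, BIT(2)) equals BIT(2 + 16), the
value written to request the bus de-idle: it clears bit 2 and leaves the
other fields of the register untouched.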

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  9 +++++
 drivers/accel/rocket/rocket_core.h |  3 ++
 drivers/accel/rocket/rocket_drv.c  | 53 ++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 09c445af7de73..a8de876365873 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
 		return err;
 	}
 
+	if (core->soc_data->noc_init) {
+		err = core->soc_data->noc_init(core);
+		if (err) {
+			pm_runtime_put_sync(dev);
+			rocket_job_fini(core);
+			return err;
+		}
+	}
+
 	version = rocket_pc_readl(core, VERSION);
 	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index d6421251670dc..66d138a8ed773 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -18,10 +18,13 @@ struct rocket_core;
  * struct rocket_soc_data - per-SoC configuration data
  * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ * @noc_init: Optional callback to power on and de-idle the NPU NOC bus.
+ *            Required on RK3568, where this is done through the PMU.
  */
 struct rocket_soc_data {
 	unsigned int num_cores;
 	unsigned int dma_bits;
+	int (*noc_init)(struct rocket_core *core);
 };
 
 #define rocket_pc_readl(core, reg) \
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index c18840e5aff76..5a72d0b5f4dff 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -9,9 +9,11 @@
 #include <linux/clk.h>
 #include <linux/err.h>
 #include <linux/iommu.h>
+#include <linux/mfd/syscon.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/regmap.h>
 
 #include "rocket_device.h"
 #include "rocket_drv.h"
@@ -217,12 +219,63 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+/*
+ * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
+ * powered on and de-idled through the PMU before the NPU is reachable.  PMU
+ * registers use a write-mask protocol: the upper 16 bits enable writes to the
+ * matching lower 16 bits.
+ *
+ * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
+ * running before the NOC acknowledges the de-idle request.  Force a real SCMI
+ * rate change (an intermediate rate defeats the clock framework's
+ * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
+ */
+#define ROCKET_RK3568_SCMI_CLK	2
+
+static int rk3568_noc_init(struct rocket_core *core)
+{
+	struct regmap *pmu;
+	unsigned int val;
+	int ret;
+
+	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
+	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);
+
+	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
+	if (IS_ERR(pmu))
+		return dev_err_probe(core->dev, PTR_ERR(pmu),
+				     "failed to get PMU regmap\n");
+
+	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
+	regmap_write(pmu, 0xa0, BIT(1 + 16));
+
+	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
+	regmap_write(pmu, 0x70, BIT(2 + 16));
+
+	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
+	regmap_write(pmu, 0x50, BIT(2 + 16));
+
+	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
+	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
+	if (ret)
+		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
+
+	return ret;
+}
+
+static const struct rocket_soc_data rk3568_soc_data = {
+	.num_cores = 1,
+	.dma_bits = 32,
+	.noc_init = rk3568_noc_init,
+};
+
 static const struct rocket_soc_data rk3588_soc_data = {
 	.num_cores = 3,
 	.dma_bits = 40,
 };
 
 static const struct of_device_id dt_match[] = {
+	{ .compatible = "rockchip,rk3568-rknn-core", .data = &rk3568_soc_data },
 	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
-- 
2.39.5



* [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (2 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:10   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

On a job timeout the NPU AXI master can be left wedged with
outstanding transactions. rocket_reset() detached the IOMMU group
before resetting the hardware, so iommu_detach_group() ->
__iommu_group_set_core_domain() asked the rk_iommu to stall and wait
for the in-flight transactions to drain. They never did, so the stall
request timed out (-ETIMEDOUT) and the IOMMU core WARNed:

  WARNING: drivers/iommu/iommu.c:157 __iommu_group_set_core_domain
    iommu_detach_group
    rocket_reset
    rocket_job_timedout

Assert the core reset first: it quiesces the AXI master so the
following IOMMU detach completes cleanly. Move the detach after
rocket_core_reset() and out of the job_lock (it does not touch
in_flight_job).
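
The ordering argument can be shown with a toy user-space model. It is
only a sketch under simplifying assumptions: struct npu_model and every
function below are hypothetical, and model_iommu_detach() returns an
error code purely to make the stall timeout visible (the real
iommu_detach_group() returns void).

```c
#include <errno.h>
#include <stdbool.h>

/* Toy state: is the AXI master stuck, and is a domain attached? */
struct npu_model {
	bool axi_wedged;
	bool iommu_attached;
};

/* Asserting the core reset quiesces the master. */
static void model_core_reset(struct npu_model *npu)
{
	npu->axi_wedged = false;
}

/* The detach needs the stall to complete, which a wedged master blocks. */
static int model_iommu_detach(struct npu_model *npu)
{
	if (npu->axi_wedged)
		return -ETIMEDOUT;

	npu->iommu_attached = false;
	return 0;
}

/* Old order: detach, then reset. */
static int recover_detach_first(struct npu_model *npu)
{
	int ret = model_iommu_detach(npu);

	model_core_reset(npu);
	return ret;
}

/* New order from this patch: reset, then detach. */
static int recover_reset_first(struct npu_model *npu)
{
	model_core_reset(npu);
	return model_iommu_detach(npu);
}
```

Starting from a wedged master, only the reset-first order lets the
detach finish in this model.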

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_job.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833f..e25234261536b 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 		if (core->in_flight_job)
 			pm_runtime_put_noidle(core->dev);
 
-		iommu_detach_group(NULL, core->iommu_group);
-
 		core->in_flight_job = NULL;
 	}
 
-	/* Proceed with reset now. */
+	/*
+	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
+	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
+	 * stall request that never drains and times out (warning in the IOMMU
+	 * core). Asserting the core reset first quiesces the master so the
+	 * detach completes cleanly.
+	 */
 	rocket_core_reset(core);
 
+	iommu_detach_group(NULL, core->iommu_group);
+
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
 
-- 
2.39.5



* [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (3 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The rocket driver attached the job's IOMMU domain in rocket_job_run() and
detached it again on every completion and reset. Each attach/detach
toggles the rk_iommu stall/force-reset/paging handshake, and on
RK3568 the NPU MMU is idle between jobs, so that handshake times out
and logs a burst of "stall/paging request timed out" errors for
every job.

Attach the per-context domain once and keep it: track the attached
domain in the core, swap it only when a job from a different context
runs, and detach it at core teardown. A reference on the attached
domain is held so it outlives the job that first attached it and is
released on swap/teardown.
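
The attach-once scheme can be sketched in user space as follows. This is
a simplified model, not the driver: a plain integer replaces the kref,
attach_ops counts how often the (expensive) attach handshake would run,
and all names are hypothetical.

```c
#include <stddef.h>

/* Per-context domain with a plain counter in place of the kref. */
struct ctx_domain {
	int refcount;
};

struct core_model {
	struct ctx_domain *attached;	/* domain currently attached */
	unsigned int attach_ops;	/* number of attach handshakes */
};

static void domain_get(struct ctx_domain *d)
{
	d->refcount++;
}

static void domain_put(struct ctx_domain *d)
{
	d->refcount--;
}

/* Run a job: swap the domain only when the context changes. */
static void core_run_job(struct core_model *core, struct ctx_domain *d)
{
	if (core->attached == d)
		return;		/* same context, no handshake */

	if (core->attached) {
		domain_put(core->attached);
		core->attached = NULL;
	}

	domain_get(d);		/* held until swap or teardown */
	core->attached = d;
	core->attach_ops++;
}

/* Teardown: drop the reference held for the attached domain. */
static void core_fini(struct core_model *core)
{
	if (core->attached) {
		domain_put(core->attached);
		core->attached = NULL;
	}
}
```

Two back-to-back jobs from the same context then cost one attach
instead of two, and the extra reference keeps the domain alive after the
first job has been freed.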

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  6 ++++++
 drivers/accel/rocket/rocket_core.h |  3 +++
 drivers/accel/rocket/rocket_job.c  | 27 +++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index a8de876365873..634f78dfe2887 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -13,6 +13,7 @@
 #include <linux/reset.h>
 
 #include "rocket_core.h"
+#include "rocket_drv.h"
 #include "rocket_job.h"
 
 int rocket_core_init(struct rocket_core *core)
@@ -112,6 +113,11 @@ void rocket_core_fini(struct rocket_core *core)
 {
 	pm_runtime_dont_use_autosuspend(core->dev);
 	pm_runtime_disable(core->dev);
+	if (core->attached_domain) {
+		iommu_detach_group(NULL, core->iommu_group);
+		rocket_iommu_domain_put(core->attached_domain);
+		core->attached_domain = NULL;
+	}
 	iommu_group_put(core->iommu_group);
 	core->iommu_group = NULL;
 	rocket_job_fini(core);
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 66d138a8ed773..05a197a9c0113 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -42,6 +42,8 @@ struct rocket_soc_data {
 #define rocket_core_writel(core, reg, value) \
 	writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
 
+struct rocket_iommu_domain;
+
 struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
@@ -56,6 +58,7 @@ struct rocket_core {
 	struct reset_control_bulk_data resets[2];
 
 	struct iommu_group *iommu_group;
+	struct rocket_iommu_domain *attached_domain;
 
 	struct mutex job_lock;
 	struct rocket_job *in_flight_job;
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index e25234261536b..b248371be8a1e 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -9,6 +9,7 @@
 #include <drm/rocket_accel.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
+#include <linux/kref.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 
@@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
 	if (ret < 0)
 		return fence;
 
-	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
-	if (ret < 0)
-		return fence;
+	/*
+	 * Attach the job's IOMMU domain only when it differs from the one
+	 * already attached. Re-attaching per job toggles the rk_iommu
+	 * stall/reset handshake on an idle NPU MMU, which is slow and
+	 * noisy; keep the domain attached across jobs instead.
+	 */
+	if (core->attached_domain != job->domain) {
+		if (core->attached_domain) {
+			iommu_detach_group(NULL, core->iommu_group);
+			rocket_iommu_domain_put(core->attached_domain);
+			core->attached_domain = NULL;
+		}
+
+		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
+		if (ret < 0)
+			return fence;
+
+		kref_get(&job->domain->kref);
+		core->attached_domain = job->domain;
+	}
 
 	scoped_guard(mutex, &core->job_lock) {
 		core->in_flight_job = job;
@@ -340,7 +358,6 @@ static void rocket_job_handle_irq(struct rocket_core *core)
 				return;
 			}
 
-			iommu_detach_group(NULL, iommu_group_get(core->dev));
 			dma_fence_signal(core->in_flight_job->done_fence);
 			pm_runtime_put_autosuspend(core->dev);
 			core->in_flight_job = NULL;
@@ -376,8 +393,6 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 	 */
 	rocket_core_reset(core);
 
-	iommu_detach_group(NULL, core->iommu_group);
-
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
 
-- 
2.39.5



* [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (4 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:04   ` sashiko-bot
  2026-06-04 14:20   ` Tomeu Vizoso
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
auto clock-gating of the page-table walker, so the walker's AXI master
loses its clock between transactions; a TLB-miss page walk then never
completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).

Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
so the walker keeps its clock. This is required for the RK3568 NPU MMU.
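
The bit arithmetic described above can be checked with a small
user-space sketch. The macro names are illustrative only and do not
exist in rockchip-iommu.c; the reset value 0x3 and the meaning of bits 0
and 1 are taken from the description above.

```c
#include <stdint.h>

#define BIT(n)	(1U << (n))

/* Illustrative names for the two gating bits described above. */
#define AUTO_GATING_SLAVE_PORT	BIT(0)
#define AUTO_GATING_PAGE_WALKER	BIT(1)
#define AUTO_GATING_RESET	0x3U

/* Clear the page-walker gate and leave every other bit as it was. */
static uint32_t auto_gating_clear_walker(uint32_t val)
{
	return val & ~AUTO_GATING_PAGE_WALKER;
}
```

Applied to the reset value 0x3, this leaves 0x1: bit 1 cleared, bit 0
(the slave-port gate) still set.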

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4da80136933c4..e3d8b6e9ca12b 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
 
 	ret = rk_iommu_enable_paging(iommu);
 
+	if (!ret) {
+		/*
+		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
+		 * auto clock-gating; the walker's AXI master then loses its clock
+		 * between transactions and a TLB-miss page walk never completes,
+		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
+		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
+		 */
+		for (i = 0; i < iommu->num_mmu; i++)
+			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x1);
+	}
+
 out_disable_stall:
 	rk_iommu_disable_stall(iommu);
 out_disable_clocks:
-- 
2.39.5



* [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (5 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU Midgy BALON
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
RK3588.  Add its compatible.

On RK3568 the NPU NOC bus-idle and power gating are controlled through the
system PMU rather than a dedicated register block, so add a rockchip,pmu
phandle to that syscon.  The RK3568 NPU has no dedicated SRAM rail, so
sram-supply is required only on RK3588.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 .../npu/rockchip,rk3588-rknn-core.yaml         | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
index caca2a4903cd1..af9936b32e9fe 100644
--- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
+++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
@@ -21,6 +21,7 @@ properties:
 
   compatible:
     enum:
+      - rockchip,rk3568-rknn-core
       - rockchip,rk3588-rknn-core
 
   reg:
@@ -50,6 +51,13 @@ properties:
 
   npu-supply: true
 
+  rockchip,pmu:
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description:
+      Phandle to the PMU syscon.  On RK3568 the NPU's NOC bus-idle and
+      power gating are controlled through the PMU; this points to that
+      syscon so those registers can be reached.
+
   power-domains:
     maxItems: 1
 
@@ -75,7 +83,15 @@ required:
   - resets
   - reset-names
   - npu-supply
-  - sram-supply
+
+if:
+  properties:
+    compatible:
+      contains:
+        const: rockchip,rk3588-rknn-core
+then:
+  required:
+    - sram-supply
 
 additionalProperties: false
 
-- 
2.39.5



* [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (6 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:11   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU Midgy BALON
  8 siblings, 1 reply; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 has an NVDLA-derived NPU at fde40000 with its own IOMMU at
fde4b000. Add both nodes (disabled by default) and the NPU power-domain
child under the PMU power-controller, and point rockchip,pmu at the PMU
syscon that controls the NPU NoC bus-idle.

The power-domain deliberately carries no pm_qos: qos_npu sits behind the
NPU NoC, which is gated until the NPU is brought up, so a genpd power-off
QoS save would fault reading it.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
index 64bdd8b7754b5..50ce5a5e4fc24 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
@@ -512,6 +512,13 @@ power-domain@RK3568_PD_GPU {
 				#power-domain-cells = <0>;
 			};
 
+			pd_npu: power-domain@RK3568_PD_NPU {
+				reg = <RK3568_PD_NPU>;
+				clocks = <&cru ACLK_NPU_PRE>,
+					 <&cru HCLK_NPU_PRE>;
+				#power-domain-cells = <0>;
+			};
+
 			/* These power domains are grouped by VD_LOGIC */
 			power-domain@RK3568_PD_VI {
 				reg = <RK3568_PD_VI>;
@@ -948,6 +955,37 @@ qos_rga_wr: qos@fe158300 {
 		reg = <0x0 0xfe158300 0x0 0x20>;
 	};
 
+	rknn_core_0: npu@fde40000 {
+		compatible = "rockchip,rk3568-rknn-core";
+		reg = <0x0 0xfde40000 0x0 0x1000>,
+		      <0x0 0xfde41000 0x0 0x1000>,
+		      <0x0 0xfde43000 0x0 0x1000>;
+		reg-names = "pc", "cna", "core";
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
+			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
+		clock-names = "aclk", "hclk", "npu", "pclk";
+		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+		assigned-clock-rates = <200000000>;
+		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
+		reset-names = "srst_a", "srst_h";
+		power-domains = <&power RK3568_PD_NPU>;
+		rockchip,pmu = <&pmu>;
+		iommus = <&rknn_mmu_0>;
+		status = "disabled";
+	};
+
+	rknn_mmu_0: iommu@fde4b000 {
+		compatible = "rockchip,iommu";
+		reg = <0x0 0xfde4b000 0x0 0x40>;
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clock-names = "aclk", "iface";
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>;
+		power-domains = <&power RK3568_PD_NPU>;
+		#iommu-cells = <0>;
+		status = "disabled";
+	};
+
 	qos_npu: qos@fe180000 {
 		compatible = "rockchip,rk3568-qos", "syscon";
 		reg = <0x0 0xfe180000 0x0 0x20>;
-- 
2.39.5



* [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (7 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  8 siblings, 0 replies; 20+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Enable the NPU and its IOMMU on ROCK 3B.

vdd_npu is marked always-on so the rail is up before genpd de-idles the
NPU NoC at power-on: the PMU de-idle handshake needs the rail powered.
The PVTPLL compute clock is brought up later by the driver.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
index 69001e453732e..7ac780ed313d5 100644
--- a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
@@ -330,8 +330,10 @@ regulator-state-mem {
 
 			vdd_npu: DCDC_REG4 {
 				regulator-name = "vdd_npu";
+				regulator-always-on;
+				regulator-boot-on;
 				regulator-initial-mode = <0x2>;
-				regulator-min-microvolt = <500000>;
+				regulator-min-microvolt = <825000>;
 				regulator-max-microvolt = <1350000>;
 				regulator-ramp-delay = <6001>;
 
@@ -787,3 +789,13 @@ vp0_out_hdmi: endpoint@ROCKCHIP_VOP2_EP_HDMI0 {
 		remote-endpoint = <&hdmi_in_vp0>;
 	};
 };
+
+&rknn_core_0 {
+	npu-supply = <&vdd_npu>;
+	status = "okay";
+};
+
+&rknn_mmu_0 {
+	status = "okay";
+};
+
-- 
2.39.5



* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
@ 2026-06-04 14:04   ` sashiko-bot
  2026-06-04 14:20   ` Tomeu Vizoso
  1 sibling, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:04 UTC (permalink / raw)
  To: Midgy BALON; +Cc: dri-devel, devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The patch writes 0x2 to RK_MMU_AUTO_GATING to clear bit 1, but this value actually sets bit 1 and clears bit 0, doing the exact opposite of the intended behavior.
- [High] An SoC-specific workaround is applied unconditionally to all Rockchip IOMMUs using a brittle hardcoded register overwrite.
--

commit 86bf3e3b9a68d419b8e6ba9d4d3cf8afb0c324bb
Author: Midgy BALON <midgy971@gmail.com>

iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU

This commit aims to fix a hardware lockup on the RK356x v1 IOMMU by
clearing bit 1 of the RK_MMU_AUTO_GATING register. This change is intended
to prevent auto clock-gating of the page-table walker.

> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 0013cf196c573..7d40ecb26123f 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>  
>  	ret = rk_iommu_enable_paging(iommu);
>  
> +	if (!ret) {
> +		/*
> +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +		 * auto clock-gating; the walker's AXI master then loses its clock
> +		 * between transactions and a TLB-miss page walk never completes,
> +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +		 */
> +		for (i = 0; i < iommu->num_mmu; i++)

[Severity: High]
Will this loop unconditionally apply the RK356x v1 workaround to all SoCs
supported by the rockchip-iommu driver? It seems older variants like the
RK3288 or RK3399 might unexpectedly receive this register write in
rk_iommu_enable() without a check for the specific SoC version.

> +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);

[Severity: High]
Does writing 0x2 here achieve the intended result? 

The comment mentions clearing bit 1 and keeping bit 0. Since the register
resets to 0x3 (binary 11), writing 0x2 (binary 10) actually sets bit 1 and
clears bit 0. This seems to do the exact opposite of what is described,
keeping the buggy page-walker auto clock-gating enabled and inadvertently
disabling the working slave-port gate. Should this be 0x1 instead?

Additionally, because rk_iommu_write() completely overwrites the register,
could this clobber other bits in RK_MMU_AUTO_GATING? Would it be safer to
use a read-modify-write pattern here?
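
The arithmetic can be checked with a minimal userspace sketch. It assumes, as
stated above, that the register resets to 0x3, that bit 1 is the page-walker
gate and that bit 0 is the slave-port gate. The macro and helper names are
hypothetical and do not come from rockchip-iommu.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit meanings come from the patch comment. */
#define AUTO_GATING_SLAVE   (UINT32_C(1) << 0)	/* bit 0: slave-port gate */
#define AUTO_GATING_WALKER  (UINT32_C(1) << 1)	/* bit 1: page-walker gate */

/* Read-modify-write helper: clear only the walker bit, keep the rest. */
static uint32_t auto_gating_clear_walker(uint32_t reg)
{
	return reg & ~AUTO_GATING_WALKER;
}

/* Self-check of the arithmetic, assuming the 0x3 reset value cited above. */
static void auto_gating_selfcheck(void)
{
	/* The literal 0x2 leaves the walker bit set and drops the slave bit. */
	assert((UINT32_C(0x2) & AUTO_GATING_WALKER) != 0);
	assert((UINT32_C(0x2) & AUTO_GATING_SLAVE) == 0);
	/* Clearing bit 1 from 0x3 yields 0x1. */
	assert(auto_gating_clear_walker(UINT32_C(0x3)) == UINT32_C(0x1));
}
```

In the driver the same mask would presumably be applied to a value read back
from each MMU instance rather than to a constant.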

> +	}
> +
>  out_disable_stall:
>  	rk_iommu_disable_stall(iommu);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=6


* Re: [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
@ 2026-06-04 14:05   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Midgy BALON; +Cc: robh, devicetree, conor+dt, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The error path for `noc_init` failure in `rocket_core_init` calls `rocket_job_fini()` instead of `rocket_core_fini()`, leaking an IOMMU group reference and leaving PM runtime active.

Pre-existing issues:
- [High] The `core->clks` array lacks initialization of its `id` fields before being passed to `devm_clk_bulk_get()`, causing the driver to configure the rate on the wrong clock in `rk3568_noc_init()`.
- [High] Failed probe leaks `struct rocket_device` and associated DRM structures because they are devm-allocated against a global persistent platform device.
--

commit 60f18710aa215157b0b6ac91ed70f00cc049eb2c
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Add RK3568 SoC support

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Unlike the RK3588 it must be powered
on and de-idled through the PMU, and its PVTPLL clock started via SCMI,
before the NPU is reachable.  Add rk3568_soc_data with an noc_init
callback performing this bring-up.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index 09c445af7de73..a8de876365873 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);
> +			rocket_job_fini(core);

[Severity: High]
If noc_init() fails, should this call rocket_core_fini() instead of
rocket_job_fini()?

Calling rocket_job_fini() here bypasses the rest of the teardown that
rocket_core_fini() handles. This leaves the PM runtime enabled and
autosuspend incorrectly configured, and skips iommu_group_put() which
leaks the IOMMU group reference.

> +			return err;
> +		}
> +	}
> +
>  	version = rocket_pc_readl(core, VERSION);

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index c18840e5aff76..5a72d0b5f4dff 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c

[ ... ]

> @@ -217,12 +219,63 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> + * matching lower 16 bits.
> + *
> + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> + * rate change (an intermediate rate defeats the clock framework's
> + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> + */
> +#define ROCKET_RK3568_SCMI_CLK	2
> +
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct regmap *pmu;
> +	unsigned int val;
> +	int ret;
> +
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);

[Severity: High]
This is a pre-existing issue, but does core->clks actually contain the correct
SCMI clock here?

Looking at rocket_core_init(), devm_clk_bulk_get() is called without first
initializing the .id fields of the core->clks array:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
    err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
    ...
}

Because the IDs are uninitialized (NULL), clk_get() will repeatedly return the
very first clock of the device. Does this mean these clk_set_rate() calls are
erroneously forcing a rate change on the wrong clock while leaving the PVTPLL
unconfigured?

[ ... ]

[Severity: High]
This is a pre-existing issue, but could probe failures (such as noc_init
returning an error) cause an unbounded memory leak?

In rocket_probe(), the global drm_dev is passed to rocket_device_init():

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
    ...
    rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
    ...
}

Inside rocket_device_init(), memory is allocated against this global
device's devres list:

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
    ...
    rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
    ...
}

If probe fails, rocket_probe() calls rocket_device_fini() to unregister
the DRM device. However, it does not free the devm-allocated memory
because the allocations are tied to the global drm_dev that persists until
module unload. Does this mean every deferred probe or bind/unbind action
allocates new memory on the global devres list?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=3


* Re: [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
@ 2026-06-04 14:05   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Midgy BALON; +Cc: robh, dri-devel, devicetree, conor+dt

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Missing bounds check on `rdev->num_cores` against `soc_data->num_cores` causes an out-of-bounds array write.

Pre-existing issues:
- [Critical] Early registration of the DRM device creates a race window leading to a NULL pointer dereference in `rocket_open`.
- [High] Array indexing mismatch on unbind/rebind causes state corruption and failed lookups.
- [High] Race condition during global device initialization and array modification due to lack of locking.
- [High] Failure in `rocket_device_init()` leaves an `ERR_PTR` in the global `rdev` variable, causing a panic on the next probe.
--

commit d76917aa771a1a958c02a6121211647bc25a0160
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Derive DMA width and core count from match data

The probe already has the per-SoC match data, which now records the core
count and DMA width.  Use it for the cores array allocation and the
device DMA mask instead of re-scanning the device tree for available core
nodes.

> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
[ ... ]
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: Critical]
This is a pre-existing issue, but does rocket_device_init() register the
DRM device prematurely?

The first core to probe invokes rocket_device_init(), which immediately calls
drm_dev_register() at the end of the function. This exposes the DRM device to
userspace before the caller can initialize the core array elements.

If userspace quickly opens the DRM device node:

rocket_open()
  rocket_iommu_domain_create(rdev->cores[0].dev)

Since the device pointer in the array has not yet been assigned in
rocket_probe() and is still NULL, this results in a NULL pointer dereference.

>  	if (!rdev->cores)
>  		return ERR_PTR(-ENOMEM);
>  
>  	dma_set_max_seg_size(dev, UINT_MAX);
>  
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> +	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(soc_data->dma_bits));
>  	if (err)
>  		return ERR_PTR(err);
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 384c38e13acce..c18840e5aff76 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
>  
>  static int rocket_probe(struct platform_device *pdev)
>  {
> +	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
>  	int ret;
>  
> +	if (!soc_data)
> +		return -EINVAL;
> +
>  	if (rdev == NULL) {

[Severity: High]
This isn't a bug introduced by this patch, but does this lack locking?

The kernel driver core can probe multiple matching rockchip,rk3588-rknn-core
devices in parallel. Concurrent probes could observe rdev == NULL
simultaneously, instantiating the DRM device multiple times and leaking
memory. Additionally, concurrent reads and increments of rdev->num_cores can
cause multiple threads to overwrite the same array slot in rdev->cores[]
while leaving others uninitialized.

>  		/* First core probing, initialize DRM device. */
> -		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
> +		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);

[Severity: High]
This is a pre-existing issue, but what happens if rocket_device_init() fails?

If rocket_device_init() returns an error pointer, it is directly assigned to
the global rdev variable. The error path correctly returns the error but
fails to reset the static global rdev back to NULL. When the driver core
invokes rocket_probe() for the next matching device tree node, the
rdev == NULL check evaluates to false because rdev contains an error pointer.
The probe will then attempt to access rdev->num_cores, causing an invalid
pointer dereference.
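
One way to avoid this can be sketched in userspace C. ERR_PTR(), IS_ERR() and
PTR_ERR() below are simplified stand-ins for the kernel helpers of the same
name, and demo_device_init(), demo_probe() and the init_should_fail knob are
hypothetical. The only point is that the result lands in a local first and the
global is written on success alone:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct demo_rdev { unsigned int num_cores; };

static struct demo_rdev *g_rdev;	/* global, playing the role of rdev */
static struct demo_rdev demo_storage;
static int init_should_fail;		/* hypothetical knob for the demo */

static struct demo_rdev *demo_device_init(void)
{
	if (init_should_fail)
		return ERR_PTR(-ENOMEM);
	demo_storage.num_cores = 0;
	return &demo_storage;
}

static int demo_probe(void)
{
	if (!g_rdev) {
		struct demo_rdev *tmp = demo_device_init();

		if (IS_ERR(tmp))
			return (int)PTR_ERR(tmp);	/* global stays NULL */
		g_rdev = tmp;
	}
	assert(!IS_ERR(g_rdev));	/* never dereference an error pointer */
	g_rdev->num_cores++;
	return 0;
}
```

With this shape a failed first probe leaves the global NULL, so a later probe
retries the initialization instead of dereferencing an error value.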

>  		if (IS_ERR(rdev)) {
>  			dev_err(&pdev->dev, "failed to initialize rocket device\n");
>  			return PTR_ERR(rdev);
>  		}
>  	}

[Severity: High]
Can this cause an out-of-bounds array write?

The allocation of the rdev->cores array now uses a fixed size defined by
soc_data->num_cores instead of dynamically counting the device tree nodes.
However, rocket_probe() is called by the driver core for every matching DT
node, and it unconditionally uses the auto-incrementing rdev->num_cores as
the array index later in the function.

If a malformed or custom device tree contains more core nodes than
soc_data->num_cores, rocket_probe() will be invoked more times than the array
can hold, writing past the end of the rdev->cores buffer.
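
A bounds check of the kind implied here might look like the following
userspace sketch. The struct and function names are hypothetical and the
choice of -EINVAL is an assumption; a real probe would also have to undo
whatever it had set up before the check:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct slot { int index; };

struct core_table {
	struct slot *cores;
	unsigned int num_cores;	/* cores claimed so far */
	unsigned int max_cores;	/* size used when cores[] was allocated */
};

/* Claim the next slot, refusing once the allocation is exhausted. */
static int core_table_claim(struct core_table *t)
{
	unsigned int core;

	assert(t != NULL && t->cores != NULL);
	core = t->num_cores;
	if (core >= t->max_cores)
		return -EINVAL;	/* more matching nodes than slots */

	t->cores[core].index = (int)core;
	t->num_cores++;
	return (int)core;
}
```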

[ ... ]
> @@ -214,6 +218,7 @@ static void rocket_remove(struct platform_device *pdev)
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the array indexing handle
unbind and rebind correctly?

When a core is unbound, rocket_remove() decrements rdev->num_cores without
compacting the remaining active cores in the rdev->cores array to fill the
empty slot. Consequently, find_core_for_dev() (which relies on
rdev->num_cores for its loop bound) fails to find any remaining cores
located at indices >= the new rdev->num_cores, preventing their cleanup.

Furthermore, if a core is re-bound, rocket_probe() will use the decremented
rdev->num_cores as the index and overwrite an existing active core, causing
resource leaks and state corruption.

>  
>  static const struct rocket_soc_data rk3588_soc_data = {
> +	.num_cores = 3,
>  	.dma_bits = 40,
>  };

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=2


* Re: [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, robh, devicetree, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The IOMMU hardware registers are left uninitialized after a job timeout and hardware reset, causing subsequent jobs to immediately fault.

Pre-existing issues:
- [High] Incorrect teardown ordering in `rocket_core_fini()` destroys IOMMU and PM resources before stopping the DRM scheduler, leading to data races, use-after-free, and potential hardware faults.
- [High] PM runtime usage counts and unsignaled fences are leaked on error paths in `rocket_job_run()`.
--

commit 49c877f1ba7baae4a28268850b55e2ac3ab4a8cc
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Keep the IOMMU domain attached across jobs

rocket attached the job's IOMMU domain in rocket_job_run() and
detached it again on every completion and reset. Each attach/detach
toggles the rk_iommu stall/force-reset/paging handshake, and on
RK3568 the NPU MMU is idle between jobs, so that handshake times out
and logs a burst of "stall/paging request timed out" errors for
every job.

Attach the per-context domain once and keep it: track the attached
domain in the core, swap it only when a job from a different context
runs, and detach it at core teardown. A reference on the attached
domain is held so it outlives the job that first attached it and is
released on swap/teardown.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index a8de876365873..634f78dfe2887 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c

[ ... ]

> @@ -112,6 +113,11 @@ void rocket_core_fini(struct rocket_core *core)
>  {
>  	pm_runtime_dont_use_autosuspend(core->dev);
>  	pm_runtime_disable(core->dev);
> +	if (core->attached_domain) {
> +		iommu_detach_group(NULL, core->iommu_group);
> +		rocket_iommu_domain_put(core->attached_domain);
> +		core->attached_domain = NULL;
> +	}
>  	iommu_group_put(core->iommu_group);
>  	core->iommu_group = NULL;

[Severity: High]
This is a pre-existing issue, but IOMMU resources are being torn down before
the DRM scheduler is stopped in rocket_job_fini(core).

Because the scheduler might still be active, couldn't an in-flight job be
executing concurrently? This could lead to a data race on core->attached_domain,
the IOMMU domain being forcefully unmapped from a running job causing
AXI bus faults, or a use-after-free when rocket_job_run() accesses
core->iommu_group.

Should rocket_job_fini(core) be called before tearing down the IOMMU state to
ensure the hardware is idle and the scheduler is stopped?

>  	rocket_job_fini(core);
>  }

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index e25234261536b..b248371be8a1e 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c

[ ... ]

> @@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
>  	if (ret < 0)
>  		return fence;

[Severity: High]
This isn't a bug introduced by this patch, but if pm_runtime_get_sync()
fails, it returns a negative error code but still increments the PM usage counter.

The error path returns without calling pm_runtime_put_noidle() or signaling
the fence, which leaks the PM counter and causes the scheduler to wait indefinitely.
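
The counter imbalance can be modelled in userspace C. Nothing below calls the
real runtime-PM API: pm_model_get_sync() and pm_model_put_noidle() are
hypothetical stand-ins that only mimic the counting behaviour described above,
and resume_fails is a knob for the demo:

```c
#include <assert.h>
#include <errno.h>

static int usage_count;		/* stand-in for the runtime-PM usage counter */
static int resume_fails;	/* hypothetical knob for the demo */

/* Mimics pm_runtime_get_sync(): the count is bumped even on failure. */
static int pm_model_get_sync(void)
{
	usage_count++;
	return resume_fails ? -EIO : 0;
}

/* Mimics pm_runtime_put_noidle(): drop the count without idling. */
static void pm_model_put_noidle(void)
{
	assert(usage_count > 0);
	usage_count--;
}

/* Job submission path with the failed get balanced before returning. */
static int pm_model_job_run(void)
{
	int ret = pm_model_get_sync();

	if (ret < 0) {
		pm_model_put_noidle();
		return ret;
	}
	return 0;
}
```

The kernel also provides pm_runtime_resume_and_get(), which drops the count
itself when the resume fails. The model does not cover signalling the fence on
the error path, which would need handling separately.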

> -	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> -	if (ret < 0)
> -		return fence;
> +	/*
> +	 * Attach the job's IOMMU domain only when it differs from the one
> +	 * already attached. Re-attaching per job toggles the rk_iommu
> +	 * stall/reset handshake on an idle NPU MMU, which is slow and
> +	 * noisy; keep the domain attached across jobs instead.
> +	 */
> +	if (core->attached_domain != job->domain) {
> +		if (core->attached_domain) {
> +			iommu_detach_group(NULL, core->iommu_group);
> +			rocket_iommu_domain_put(core->attached_domain);
> +			core->attached_domain = NULL;
> +		}
> +
> +		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> +		if (ret < 0)
> +			return fence;

[Severity: High]
Also a pre-existing issue, but if iommu_attach_group() fails, it returns
the fence early without dropping the successfully acquired PM reference.
Should this path include a pm_runtime_put_noidle() call to avoid leaking
the reference?

> +
> +		kref_get(&job->domain->kref);
> +		core->attached_domain = job->domain;
> +	}
>  
>  	scoped_guard(mutex, &core->job_lock) {
>  		core->in_flight_job = job;

[ ... ]

> @@ -376,8 +393,6 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
>  	 */
>  	rocket_core_reset(core);
>  
> -	iommu_detach_group(NULL, core->iommu_group);
> -
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);

[Severity: High]
When the NPU is hard-reset by rocket_core_reset(), the IOMMU's hardware registers
(like page table base addresses) are wiped.

Since core->attached_domain is not cleared here, when the scheduler restarts
and executes the next job, rocket_job_run() will see core->attached_domain
matches the job's domain and skip calling iommu_attach_group().

Will this leave the IOMMU hardware unconfigured and cause subsequent jobs to
immediately throw DMA faults? Should core->attached_domain be cleared here
to force a re-attach and reprogramming of the IOMMU?
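
The caching logic and the proposed invalidation can be modelled in a few lines
of userspace C. Everything here is a hypothetical stand-in with no real IOMMU
calls; hw_programmed merely represents whether the MMU registers hold a valid
page-table base, and the model assumes, as suggested above, that a reset wipes
them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_domain { int id; };

struct model_core {
	const struct model_domain *attached_domain;	/* software cache */
	bool hw_programmed;				/* stand-in for MMU state */
};

/* Job submission: attach only when the cached domain differs. */
static void model_job_run(struct model_core *core, const struct model_domain *d)
{
	assert(core != NULL && d != NULL);
	if (core->attached_domain != d) {
		core->attached_domain = d;
		core->hw_programmed = true;	/* attach reprograms the MMU */
	}
}

/* Reset: hardware state is lost; optionally drop the cached pointer too. */
static void model_reset(struct model_core *core, bool clear_cache)
{
	assert(core != NULL);
	core->hw_programmed = false;
	if (clear_cache)
		core->attached_domain = NULL;
}
```

In this model, a reset that keeps the cached pointer leaves hw_programmed
false after the next job from the same context, while a reset that clears it
forces the attach path to run again.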

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=5


* Re: [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  2026-06-04 16:55     ` Conor Dooley
  0 siblings, 1 reply; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, dri-devel, robh, devicetree

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] The value written to `RK_MMU_AUTO_GATING` sets bit 1 instead of clearing it, contradicting the inline comment and hardware workaround logic.
- [High] The error handling path for `noc_init()` failure leaks the IOMMU group reference and leaves the device in an unbalanced PM runtime state.
- [High] A malformed device tree with excessive NPU core nodes triggers an out-of-bounds write in `rocket_probe()` array access.
- [High] The NOC bus de-idle sequence via the PMU is missing from the runtime PM resume path.
- [Medium] The `rockchip,pmu` property is functionally mandatory for RK3568 but is not added to the schema's conditional `required` list.

Pre-existing issues:
- [High] The clock names in the `core->clks` array are uninitialized before being passed to `devm_clk_bulk_get()`, causing the framework to fetch the wrong clocks.
--

commit 6c61da961bed3360a99f625d4a3b1599e7bfe685
Author: Midgy BALON <midgy971@gmail.com>

    dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
    
    The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
    RK3588. Add its compatible. On RK3568 the NPU NOC bus-idle and power
    gating are controlled through the system PMU rather than a dedicated
    register block.

> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> index caca2a4903cd1..af9936b32e9fe 100644
> --- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
[ ... ]
> @@ -75,7 +83,15 @@ required:
>    - resets
>    - reset-names
>    - npu-supply
> -  - sram-supply
> +
> +if:
> +  properties:
> +    compatible:
> +      contains:
> +        const: rockchip,rk3588-rknn-core
> +then:
> +  required:
> +    - sram-supply

[Severity: Medium]
Does the missing rockchip,pmu property from the conditional required block
cause incomplete device trees to pass validation?

The rockchip,pmu property is functionally mandatory for the
rockchip,rk3568-rknn-core variant as the driver unconditionally fails to probe
without it. Missing this mandatory property here could allow invalid device
trees to go undetected.

>  additionalProperties: false
>  
> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index b3b2fa9ba645a..634f78dfe2887 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
[ ... ]
> @@ -83,6 +89,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);
> +			rocket_job_fini(core);
> +			return err;
> +		}
> +	}

[Severity: High]
Does this error path leak the IOMMU group reference and leave the device in an
unbalanced PM runtime state?

By returning directly after pm_runtime_put_sync() and rocket_job_fini(), the
code misses calling rocket_core_fini() which handles iommu_group_put(),
pm_runtime_disable(), and pm_runtime_dont_use_autosuspend().

> +
>  	version = rocket_pc_readl(core, VERSION);
>  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
>  
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
[ ... ]
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: High]
Can this code overflow rdev->cores[] if the device tree contains more NPU
nodes than soc_data->num_cores?

With this change, the array size is fixed to soc_data->num_cores, but
rocket_probe() in drivers/accel/rocket/rocket_drv.c is still called for every
matching node. If there are too many nodes in the device tree, it appears
rocket_probe() will write out of bounds here:

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
    ...
	unsigned int core = rdev->num_cores;
    ...
	rdev->cores[core].rdev = rdev;
	rdev->cores[core].dev = &pdev->dev;
	rdev->cores[core].index = core;
    ...
}
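
One way to rule out the overflow is to compare the running core count against
the allocated capacity before claiming a slot. The userspace sketch below shows
the idea. struct npu_dev, npu_claim_core() and the max_cores field are
hypothetical stand-ins for rdev and soc_data->num_cores, and the choice of
-ENOSPC is an assumption, not something the driver defines.

```c
#include <errno.h>

struct npu_dev {
	unsigned int max_cores;   /* capacity the array was allocated with */
	unsigned int num_cores;   /* slots claimed so far */
	int *cores;               /* stand-in for the per-core array */
};

/* Claim the next slot, or fail instead of writing past the array. */
int npu_claim_core(struct npu_dev *d, int id)
{
	if (d->num_cores >= d->max_cores)
		return -ENOSPC;

	d->cores[d->num_cores] = id;
	return (int)d->num_cores++;
}
```

With a capacity of one, the first call returns slot 0 and a second call is
rejected rather than touching memory beyond the allocation.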

>  	if (!rdev->cores)
>  		return ERR_PTR(-ENOMEM);
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 8bbbce594883e..5a72d0b5f4dff 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
[ ... ]
> @@ -213,8 +219,64 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> + * matching lower 16 bits.
> + *
> + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> + * rate change (an intermediate rate defeats the clock framework's
> + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> + */
> +#define ROCKET_RK3568_SCMI_CLK	2
> +
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct regmap *pmu;
> +	unsigned int val;
> +	int ret;
> +
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);

[Severity: High]
This is a pre-existing issue, but does this unintentionally alter the system
aclk rate?

Looking at rocket_core_init(), the core->clks array is passed to
devm_clk_bulk_get() without initializing the id strings:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
    ...
}

Because the id fields are NULL, the framework fetches the first DT clock
(aclk) for all elements. Calling clk_set_rate() here seems to alter the
system aclk frequency instead of the intended SCMI clock.
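
The fallback that the paragraph above relies on can be illustrated with a small
userspace model of name-based lookup. This is a sketch under the assumption
that a NULL id resolves to index 0 and a non-NULL id is matched against
clock-names. lookup_clock_index() is a hypothetical function, not a clock
framework API, and the names used in the note below mirror the clock-names
list from patch 8/9.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index selected by 'id' in a clock-names style list.
 * A NULL id selects index 0; an unknown name returns -1.
 */
int lookup_clock_index(const char *const *names, size_t n, const char *id)
{
	size_t i;

	if (!id)
		return n ? 0 : -1;

	for (i = 0; i < n; i++)
		if (strcmp(names[i], id) == 0)
			return (int)i;

	return -1;
}
```

Under that model, with the list "aclk", "hclk", "npu", "pclk", four NULL ids
all resolve to "aclk", while passing "npu" selects index 2, which is the value
of ROCKET_RK3568_SCMI_CLK in the quoted patch.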

> +
> +	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
> +	if (IS_ERR(pmu))
> +		return dev_err_probe(core->dev, PTR_ERR(pmu),
> +				     "failed to get PMU regmap\n");
> +
> +	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
> +	regmap_write(pmu, 0xa0, BIT(1 + 16));
> +
> +	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
> +	regmap_write(pmu, 0x70, BIT(2 + 16));
> +
> +	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
> +	regmap_write(pmu, 0x50, BIT(2 + 16));
> +
> +	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
> +	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
> +	if (ret)
> +		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
> +
> +	return ret;
> +}

[Severity: High]
Is a NOC bus de-idle sequence required in the runtime PM resume path as well?

If the generic power domain (genpd) powers off the NPU during runtime suspend,
the NOC bus idle state would reset. Currently, rocket_device_runtime_resume()
enables clocks but doesn't repeat this PMU de-idle sequence, which could leave
the AXI bus gated and cause subsequent hardware accesses to hang or crash.
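
The write-mask protocol described in the quoted comment, which any resume-time
repeat of this sequence would also have to follow, can be modelled in
userspace. This is a sketch only: hiword_update() and pmu_write() are
hypothetical helpers, and the register is a plain variable rather than a PMU
regmap.

```c
#include <stdint.h>

/*
 * Build a write-mask value: the upper 16 bits select which of the lower
 * 16 bits the write may change. Hypothetical helper for illustration.
 */
uint32_t hiword_update(uint16_t mask, uint16_t val)
{
	return ((uint32_t)mask << 16) | (val & mask);
}

/* Model of how a write-mask register applies a 32-bit write. */
uint16_t pmu_write(uint16_t reg, uint32_t wval)
{
	uint16_t mask = (uint16_t)(wval >> 16);
	uint16_t val = (uint16_t)(wval & 0xffff);

	return (uint16_t)((reg & ~mask) | (val & mask));
}
```

With these helpers, the patch's regmap_write(pmu, 0x50, BIT(2 + 16))
corresponds to hiword_update(1 << 2, 0): it clears bit 2 and leaves the other
bits alone.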

> +
>  static const struct rocket_soc_data rk3568_soc_data = {
>  	.num_cores = 1,
>  	.dma_bits = 32,
>  	.noc_init = rk3568_noc_init,
>  };
>  
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 0013cf196c573..7d40ecb26123f 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>  
>  	ret = rk_iommu_enable_paging(iommu);
>  
> +	if (!ret) {
> +		/*
> +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +		 * auto clock-gating; the walker's AXI master then loses its clock
> +		 * between transactions and a TLB-miss page walk never completes,
> +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +		 */
> +		for (i = 0; i < iommu->num_mmu; i++)
> +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);

[Severity: High]
Will this write operation inadvertently set bit 1 instead of clearing it?

Writing 0x2 (binary 10) sets bit 1 and clears bit 0, which contradicts the
inline comment indicating that bit 1 should be cleared and bit 0 kept. This
appears to enable the buggy page-walker auto clock-gating and disable the
slave-port gate.
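
To make the bit arithmetic concrete, here is a minimal userspace sketch of the
read-modify-write that the inline comment describes. The helper name
auto_gating_keep_walker_clock() and the two bit macros are hypothetical and
exist only for illustration; the 0x3 reset value is taken from the patch 6/9
commit message. The register is modelled as a plain integer, so this is not
the driver's implementation.

```c
#include <stdint.h>

/* Hypothetical names for the two bits described in the patch comment. */
#define AUTO_GATING_SLAVE_PORT  (UINT32_C(1) << 0)  /* bit 0: slave-port gate */
#define AUTO_GATING_PAGE_WALKER (UINT32_C(1) << 1)  /* bit 1: page-walker gate */

/* Clear bit 1 and leave every other bit, including bit 0, untouched. */
uint32_t auto_gating_keep_walker_clock(uint32_t reg)
{
	return reg & ~AUTO_GATING_PAGE_WALKER;
}
```

Starting from the 0x3 reset value this yields 0x1, whereas the literal 0x2 in
the patch is the opposite pattern (bit 1 set, bit 0 clear).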

> +	}
> +
>  out_disable_stall:
>  	rk_iommu_disable_stall(iommu);
>  out_disable_clocks:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=7

* Re: [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: devicetree, robh, dri-devel, conor+dt

Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:

Pre-existing issues:
- [Critical] kref_put() dereferences a NULL pointer when job initialization fails early or a BO is imported.
- [High] Error path in BO creation frees an object that is already owned by the handle table.
- [High] Device is permanently left in an active power state upon unbind.
- [High] Array mismanagement during core removal prevents multi-core unbind.
- [High] Memory and DRM device leak on probe failure or deferral.
- [High] Asynchronous reset work is canceled after the DRM scheduler is destroyed.
- [High] The DRM device is registered and exposed to userspace before the NPU cores are initialized.
- [High] The global `rdev` pointer and `rdev->num_cores` are accessed and modified locklessly during probe and removal.
- [High] The shared interrupt handler accesses hardware registers without checking the PM state, and suspend executes without synchronizing IRQs.
- [High] A failure in `rocket_job_run` leaks the PM usage counter and returns an unsignaled hardware fence, hanging the DRM scheduler.
- [High] Registers are accessed during initialization before resets are deasserted.
--

commit e6cc2e46cba8fcba41d70a8b1e9422607b444db8
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Introduce per-SoC rocket_soc_data

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index b3b2fa9ba645a..09c445af7de73 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
>  	u32 version;
>  	int err = 0;
>  
> +	core->soc_data = of_device_get_match_data(dev);
> +	if (!core->soc_data)
> +		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
> +

[Severity: High]
This is a pre-existing issue, but I noticed that later in this function,
registers are accessed before resets are deasserted:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
	...
	core->resets[0].id = "srst_a";
	core->resets[1].id = "srst_h";
	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
						    core->resets);
	...
	version = rocket_pc_readl(core, VERSION);
	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
	...
}

Will this cause a bus stall and trigger an asynchronous external abort if the
hardware or bootloader leaves these resets asserted?

[Severity: High]
This is a pre-existing issue, but does rocket_core_fini() leave the device
permanently powered on with its clocks running after the driver has been
detached?

drivers/accel/rocket/rocket_core.c:rocket_core_fini() {
	pm_runtime_dont_use_autosuspend(core->dev);
	pm_runtime_disable(core->dev);
	...
}

Since rocket_core_init() drops its Runtime PM reference asynchronously via
pm_runtime_put_autosuspend(dev), if the driver is unbound before the
autosuspend delay elapses, pm_runtime_disable() forcefully cancels the
pending asynchronous suspend work without first synchronously powering down
the hardware (e.g., via pm_runtime_put_sync()).

>  	core->resets[0].id = "srst_a";
>  	core->resets[1].id = "srst_h";
>  	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
[ ... ]
> @@ -52,7 +57,7 @@ int rocket_core_init(struct rocket_core *core)
>  
>  	dma_set_max_seg_size(dev, UINT_MAX);
>  
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> +	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(core->soc_data->dma_bits));
>  	if (err)
>  		return err;
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 8bbbce594883e..384c38e13acce 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  

[Severity: High]
This is a pre-existing issue, but can this array mismanagement during core
removal prevent multi-core unbind?

drivers/accel/rocket/rocket_drv.c:rocket_remove() {
	...
	rocket_core_fini(&rdev->cores[core]);
	rdev->num_cores--;
	...
}

drivers/accel/rocket/rocket_drv.c:find_core_for_dev() {
	for (unsigned int core = 0; core < rdev->num_cores; core++) {
		if (dev == rdev->cores[core].dev)
			return core;
	}
	...
}

When a core is unbound, rdev->num_cores is decremented, but the unbound core's
slot in the static rdev->cores array is left unmodified, and the remaining
elements are not shifted.

Because find_core_for_dev() unconditionally loops only rdev->num_cores times,
any core residing at an array index greater than or equal to the decremented
num_cores becomes invisible.

Will subsequent attempts to unbind those remaining cores fail, silently
exiting early and completely leaking the cores' resources and power state?
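
The lookup problem described above can be reproduced in a userspace model,
together with one possible remedy: clear the slot on removal and scan the full
capacity rather than the live count. All names here (struct core_table,
table_find(), table_remove(), MAX_CORES) are hypothetical, and the fix shown is
only one option.

```c
#include <stddef.h>

#define MAX_CORES 3

struct core_table {
	const void *dev[MAX_CORES];  /* NULL marks an empty slot */
	unsigned int num_cores;      /* number of live cores */
};

/* Scan every slot, so a core stays findable after a lower slot is freed. */
int table_find(const struct core_table *t, const void *dev)
{
	for (int i = 0; i < MAX_CORES; i++)
		if (dev && t->dev[i] == dev)
			return i;
	return -1;
}

/* Remove a core by clearing its slot; returns 0 on success, -1 if absent. */
int table_remove(struct core_table *t, const void *dev)
{
	int i = table_find(t, dev);

	if (i < 0)
		return -1;
	t->dev[i] = NULL;
	t->num_cores--;
	return 0;
}
```

In this model, removing the core in slot 0 first still leaves the core in
slot 2 reachable, because the search no longer stops at the decremented count.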

[Severity: High]
This is a pre-existing issue, but could rocket_probe() leak memory and the DRM
device on probe failure or deferral?

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	if (rdev == NULL) {
		/* First core probing, initialize DRM device. */
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
}

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
	...
	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores), GFP_KERNEL);
	...
}

The global facade device rdev is dynamically initialized via rocket_device_init()
using the global, virtually registered drm_dev platform device. Allocations like
devm_drm_dev_alloc() and devm_kcalloc() are bound to the drm_dev lifecycle
rather than the physical pdev currently being probed.

If rocket_core_init() subsequently fails (e.g., due to -EPROBE_DEFER),
rocket_probe() cleans up by calling rocket_device_fini() and resetting
rdev = NULL. However, rocket_device_fini() only unregisters the DRM device.

Because drm_dev remains globally alive until module unload, none of the devm_
allocations are freed. On the next probe attempt, a new rdev is allocated on
drm_dev. Can this create a permanent memory leak?

[Severity: High]
This is a pre-existing issue, but might the DRM device be registered and
exposed to userspace before the NPU cores are initialized?

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
	...
	err = drm_dev_register(ddev, 0);
	if (err)
		return ERR_PTR(err);

	return rdev;
}

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	...
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
	unsigned int core = rdev->num_cores;
	dev_set_drvdata(&pdev->dev, rdev);

	rdev->cores[core].rdev = rdev;
	rdev->cores[core].dev = &pdev->dev;
	...
}

When the first core is probed, it calls rocket_device_init(), which ends with
drm_dev_register(). At this moment, the DRM character device becomes visible
and openable by userspace.

However, the core hardware, IRQs, and core pointers (e.g. rdev->cores[core].dev)
are not assigned or initialized until immediately afterward, when
rocket_core_init() runs.

If userspace opens the device in this narrow window, will rocket_open()
evaluate the uninitialized rdev->cores[0].dev (which is NULL) and pass it to
rocket_iommu_domain_create(), causing an immediate NULL pointer dereference?

[Severity: High]
This is a pre-existing issue, but is it possible that the global rdev pointer
and rdev->num_cores are accessed and modified locklessly during probe and
removal?

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	if (rdev == NULL) {
		/* First core probing, initialize DRM device. */
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
	rdev->num_cores++;
	...
}

If multiple cores are probed concurrently (e.g., via sysfs asynchronous
binding), multiple threads will evaluate `if (rdev == NULL)` simultaneously
without a lock.

Can this create a data race where concurrent threads allocate redundant DRM
devices, overwrite the rdev singleton (causing memory leaks), and corrupt
the shared rdev->num_cores counter and rdev->cores array?

[Severity: Critical]
This is a pre-existing issue, but will kref_put() dereference a NULL pointer
when job initialization fails early or a BO is imported?

drivers/accel/rocket/rocket_job.c:rocket_ioctl_submit_job() {
	...
	goto out_cleanup_job;
	...
out_cleanup_job:
	if (ret)
		drm_sched_job_cleanup(&rjob->base);
out_put_job:
	rocket_job_put(rjob);
}

drivers/accel/rocket/rocket_drv.c:rocket_iommu_domain_put() {
	kref_put(&domain->kref, rocket_iommu_domain_destroy);
}

If an early failure occurs in rocket_ioctl_submit_job() (e.g., in
rocket_copy_tasks() or drm_gem_objects_lookup()), the error path calls
rocket_job_put(), which invokes rocket_job_cleanup(). At this point,
job->domain is still NULL.

rocket_job_cleanup() unconditionally passes this to rocket_iommu_domain_put(),
which immediately evaluates &domain->kref and causes a NULL pointer
dereference.

The same panic appears to occur when a BO is imported via PRIME
(rocket_gem_create_object() leaves domain NULL), and rocket_gem_bo_free()
subsequently accesses bo->domain->domain and bo->domain.
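
A common defensive pattern for this class of bug is to make the put helper
tolerate NULL, in the same way that kfree(NULL) is a no-op. The sketch below
models that in userspace with a plain counter instead of a kref. struct domain,
domain_new() and domain_put() are hypothetical names, and whether the driver
should guard in the helper or in its callers is a design choice this sketch
does not settle.

```c
#include <stdlib.h>

struct domain {
	unsigned int refcount;
};

/* Allocate a domain holding one reference; returns NULL on failure. */
struct domain *domain_new(void)
{
	struct domain *d = malloc(sizeof(*d));

	if (d)
		d->refcount = 1;
	return d;
}

/*
 * Drop one reference. A NULL domain is ignored, so early error paths
 * that never acquired a domain can share the same cleanup code.
 * Returns 1 if the domain was freed, 0 otherwise.
 */
int domain_put(struct domain *d)
{
	if (!d)
		return 0;
	if (--d->refcount)
		return 0;
	free(d);
	return 1;
}
```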

[Severity: High]
This is a pre-existing issue, but could the error path in BO creation free an
object that is already owned by the handle table?

drivers/accel/rocket/rocket_gem.c:rocket_ioctl_create_bo() {
	...
	ret = drm_gem_handle_create(file, gem_obj, &args->handle);
	drm_gem_object_put(gem_obj);
	if (ret)
		goto err;
	...
err:
	drm_gem_shmem_object_free(gem_obj);
}

drm_gem_handle_create() successfully exposes gem_obj to userspace and
increments its handle refcount, after which the local reference is correctly
dropped via drm_gem_object_put().

If a subsequent step fails (such as drm_gem_shmem_get_pages_sgt()), the error
path jumps to err: and manually calls drm_gem_shmem_object_free(gem_obj).

This explicitly frees the backing memory while the object is still actively
referenced by the handle table. When userspace later closes the handle, will
the GEM core double-free the object?

[Severity: High]
This is a pre-existing issue, but should the asynchronous reset work be
canceled before the DRM scheduler is destroyed?

drivers/accel/rocket/rocket_job.c:rocket_job_fini() {
	drm_sched_fini(&core->sched);

	cancel_work_sync(&core->reset.work);
	destroy_workqueue(core->reset.wq);
}

drm_sched_fini(&core->sched) is called before cancel_work_sync(&core->reset.work).

If the background reset task rocket_reset_work() is running concurrently,
it will execute rocket_reset(), which attempts to call drm_sched_stop() and
drm_sched_start() on the core->sched instance that has already been torn down
or freed.

[Severity: High]
This is a pre-existing issue, but could the shared interrupt handler access
hardware registers without checking the PM state, and does suspend execute
without synchronizing IRQs?

drivers/accel/rocket/rocket_job.c:rocket_job_irq_handler() {
	struct rocket_core *core = data;
	u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
	...
}

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
	...
	clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);

	return 0;
}

The driver requests its interrupt using IRQF_SHARED, meaning
rocket_job_irq_handler() can execute at any time if another device on the same
line interrupts the CPU.

When called, the handler immediately reads hardware registers
(INTERRUPT_RAW_STATUS) without using pm_runtime_get_if_active() to verify the
device is powered. If the device was suspended, will this read return garbage
or cause a bus fault?

Additionally, the rocket_device_runtime_suspend() callback disables the
device's clocks without invoking synchronize_irq(), which might allow
hardware to lose power while an interrupt handler is executing mid-flight.

[Severity: High]
This is a pre-existing issue, but can a failure in rocket_job_run() leak
the PM usage counter and return an unsignaled hardware fence, hanging the
DRM scheduler?

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
	...
	ret = pm_runtime_get_sync(core->dev);
	if (ret < 0)
		return fence;
	...
		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
		if (ret < 0)
			return fence;
	...
}

drivers/accel/rocket/rocket_job.c:rocket_reset() {
	scoped_guard(mutex, &core->job_lock) {
		if (core->in_flight_job)
			pm_runtime_put_noidle(core->dev);

		core->in_flight_job = NULL;
	}
}

If pm_runtime_get_sync() returns an error, rocket_job_run() returns the created
fence directly. Since pm_runtime_get_sync() increments the usage counter even
upon failure, this leaks the PM reference (a pm_runtime_put_noidle() call is
missing).

Similarly, if iommu_attach_group() fails shortly after, the function again
returns the fence without dropping the successfully acquired PM reference.

Because the hardware submission is skipped in both error paths, the returned
fence is never signaled. The DRM scheduler will block indefinitely waiting for
it, triggering a job timeout.

Will the timeout reset logic (rocket_reset()) fail to clean up these leaked
PM references because core->in_flight_job is evaluated as NULL (it was never
assigned before the early returns)?
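
The reference accounting at issue can be modelled with a plain counter. The
sketch assumes, as the text above states, that the get call increments the
usage count even when it fails, so the error path has to drop it explicitly.
fake_get_sync(), fake_put_noidle() and run_job() are hypothetical userspace
stand-ins, not runtime PM APIs.

```c
struct fake_dev {
	int usage;        /* models the runtime PM usage counter */
	int resume_err;   /* nonzero makes the next resume fail */
};

/* Like the call it models: bumps the counter first, then may fail. */
int fake_get_sync(struct fake_dev *d)
{
	d->usage++;
	return d->resume_err ? -1 : 0;
}

void fake_put_noidle(struct fake_dev *d)
{
	d->usage--;
}

/* Returns 0 if the job was started, -1 if it was rejected cleanly. */
int run_job(struct fake_dev *d)
{
	int ret = fake_get_sync(d);

	if (ret < 0) {
		fake_put_noidle(d);   /* balance the failed get */
		return ret;
	}
	return 0;   /* reference stays held until the job completes */
}
```

The model covers only the counter. It does not address the separate question
of what the scheduler should receive in place of the unsignaled fence.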

> +static const struct rocket_soc_data rk3588_soc_data = {
> +	.dma_bits = 40,
> +};
> +
>  static const struct of_device_id dt_match[] = {
> -	{ .compatible = "rockchip,rk3588-rknn-core" },
> +	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
>  	{}
>  };
>  MODULE_DEVICE_TABLE(of, dt_match);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=1

* Re: [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
@ 2026-06-04 14:10   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:10 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, devicetree, robh, dri-devel

Thank you for your contribution! Sashiko AI review found 9 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Double free of rjob->tasks array when copy_from_user() fails.
- [High] The shared IRQ handler unconditionally accesses hardware registers without verifying the device's PM active state.
- [High] Missing PM reference drop and unsignaled fence return on error paths in `rocket_job_run()`.
- [High] Concurrent hardware reset and IRQ handler execution can cause an external abort.
- [High] Unbalanced teardown order in rocket_core_fini() causes UAF and data races.
- [High] NULL pointer dereference of job->domain in rocket_job_cleanup() on early error paths.
- [High] Missing `synchronize_irq()` in runtime suspend callback.
- [Medium] The TDR reset path permanently leaks power by preventing the device from entering runtime suspend.
- [Medium] The hardware reset is performed after dropping the PM runtime reference, potentially targeting an unpowered device.
--

commit 323afa3b038052ef6c02ee0548658a65126910a0
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Reset the NPU before detaching the IOMMU on timeout

On a job timeout the NPU AXI master can be left wedged with outstanding
transactions. rocket_reset() detached the IOMMU group before resetting
the hardware, causing the IOMMU core to stall. Assert the core reset
first to quiesce the master so the detach completes cleanly.

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
>  		if (core->in_flight_job)
>  			pm_runtime_put_noidle(core->dev);

[Severity: Medium]
This is a pre-existing issue, but does this TDR reset path permanently leak
power by preventing the device from entering runtime suspend?

It appears pm_runtime_put_noidle() drops the reference without triggering an
idle power-down. If the timed-out job was the last active job, the usage count
reaches zero but the device remains in the RPM_ACTIVE state indefinitely,
staying powered on until another job is submitted and correctly triggers
autosuspend.

> -		iommu_detach_group(NULL, core->iommu_group);
> -
>  		core->in_flight_job = NULL;
>  	}
>  
> -	/* Proceed with reset now. */
> +	/*
> +	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
> +	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
> +	 * stall request that never drains and times out (warning in the IOMMU
> +	 * core). Asserting the core reset first quiesces the master so the
> +	 * detach completes cleanly.
> +	 */
>  	rocket_core_reset(core);

[Severity: High]
This isn't a bug introduced by this patch, but can a concurrent hardware reset
and IRQ handler execution cause an external abort?

rocket_core_reset() asserts a hardware reset without first disabling device
interrupts. If an interrupt fires just before or during the reset,
rocket_job_irq_handler() will execute and perform MMIO reads and writes to the
bus while the hardware is actively in reset.

[Severity: Medium]
This is also a pre-existing issue, but is it safe to perform the hardware
reset after dropping the PM runtime reference?

Dropping the PM reference via pm_runtime_put_noidle() earlier decrements the
usage count. A concurrently scheduled autosuspend timer could expire precisely
in this window and power down the device before or during the reset operations.

Asserting resets on a suspended device can lead to system bus faults. Should
the PM reference be dropped after the hardware reset is complete?

> +
> +	iommu_detach_group(NULL, core->iommu_group);
> +
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);

[Severity: High]
This isn't a bug introduced by this patch, but does the shared IRQ handler
unconditionally access hardware registers without verifying the device's PM
active state?

drivers/accel/rocket/rocket_job.c:rocket_job_irq_handler() {
    ...
    u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
    ...
}

The driver registers rocket_job_irq_handler() with the IRQF_SHARED flag.
Without first calling pm_runtime_get_if_active(), if the NPU is runtime
suspended, this read accesses an unpowered register domain. This typically
returns 0xffffffff (causing spurious handling) or triggers a synchronous
external abort.

[Severity: High]
This is a pre-existing issue, but does rocket_job_run() leak the PM reference
and return an unsignaled fence on error paths?

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
    ...
    ret = pm_runtime_get_sync(core->dev);
    if (ret < 0)
        return fence;
    ...
}

If pm_runtime_get_sync() or iommu_attach_group() fails, the function returns
a newly created, un-signaled fence instead of an error pointer, and leaves
the PM usage counter incremented. Because the hardware submission step is
skipped, the hardware will never signal this fence, causing the DRM scheduler
to hang indefinitely.

[Severity: High]
This is a pre-existing issue, but does the teardown order in rocket_core_fini()
cause a use-after-free and data races?

drivers/accel/rocket/rocket_core.c:rocket_core_fini() {
    ...
    iommu_group_put(core->iommu_group);
    core->iommu_group = NULL;
    rocket_job_fini(core);
    ...
}

Resources are torn down in the wrong order: the iommu_group and attached_domain
members are released and pm_runtime_disable() is called, all before
rocket_job_fini() runs.

If a scheduled job is processed by the DRM scheduler during this window,
rocket_job_run() will concurrently access the now-NULL iommu_group without
locking. Should rocket_job_fini() be called first to stop the scheduler and
wait for jobs?

[Severity: Critical]
This isn't a bug introduced by this patch, but is there a double free of the
rjob->tasks array when copy_from_user() fails?

drivers/accel/rocket/rocket_job.c:rocket_copy_tasks() {
    ...
fail:
    kvfree(rjob->tasks);
    return ret;
}

If copy_from_user() fails, rocket_copy_tasks() frees rjob->tasks but leaves the
dangling pointer in place.

The function returns an error, prompting rocket_ioctl_submit_job() to jump to
out_cleanup_job and call rocket_job_put(rjob). This invokes
rocket_job_cleanup(), which unconditionally calls kvfree(job->tasks),
resulting in a double-free.
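
The usual way to make such a shared cleanup path safe is to reset the pointer
right after freeing it, so that a later free of the same field is a no-op. A
userspace sketch follows, using free() in place of kvfree(). struct job,
job_copy_tasks_fail() and job_cleanup() are hypothetical stand-ins for the
driver functions named above.

```c
#include <stdlib.h>

struct job {
	int *tasks;
};

/* Model of the failing copy path: free the array and forget the pointer. */
int job_copy_tasks_fail(struct job *j, size_t n)
{
	j->tasks = calloc(n, sizeof(*j->tasks));
	if (!j->tasks)
		return -1;

	/* Pretend the copy from userspace failed here. */
	free(j->tasks);
	j->tasks = NULL;   /* without this, job_cleanup() would free twice */
	return -1;
}

/* Shared cleanup: free(NULL) is defined to do nothing. */
void job_cleanup(struct job *j)
{
	free(j->tasks);
	j->tasks = NULL;
}
```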

[Severity: High]
This is a pre-existing issue, but does rocket_job_cleanup() dereference a NULL
job->domain on early error paths?

drivers/accel/rocket/rocket_job.c:rocket_job_cleanup() {
    ...
    rocket_iommu_domain_put(job->domain);
    ...
}

The job->domain pointer is initially NULL and is only populated late in
rocket_ioctl_submit_job(). If an earlier step fails (such as
rocket_copy_tasks()), the cleanup unconditionally calls
rocket_iommu_domain_put(job->domain).

This wrapper dereferences the domain struct to access its kref without a
NULL check:

drivers/accel/rocket/rocket_drv.c:rocket_iommu_domain_put() {
    kref_put(&domain->kref, rocket_iommu_domain_destroy);
}

[Severity: High]
This isn't a bug introduced by this patch, but is there a missing
synchronize_irq() in the runtime suspend callback?

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
    ...
    clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
                               rdev->cores[core].clks);
    ...
}

The driver registers a shared interrupt but fails to call synchronize_irq().
Disabling the clocks while the IRQ handler is executing concurrently on another
CPU could cause an AXI bus fault or a hard system crash.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=4

* Re: [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
@ 2026-06-04 14:11   ` sashiko-bot
  0 siblings, 0 replies; 20+ messages in thread
From: sashiko-bot @ 2026-06-04 14:11 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, devicetree, robh, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The NPU node simultaneously defines `power-domains` and `rockchip,pmu`, causing the generic power domain framework and the NPU driver to concurrently manipulate the same PMU power and idle registers.
- [Low] Device tree nodes `npu@fde40000` and `iommu@fde4b000` are not sorted by unit address.

Pre-existing issues:
- [Critical] The driver calls `devm_clk_bulk_get` without initializing the `.id` fields of the clock bulk data array, causing all clock requests to incorrectly default to the first clock in the device tree node.
--

commit dff3ba648a490ef1a1c6644d672ca0b9e1d68360
Author: Midgy BALON <midgy971@gmail.com>

arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU

This commit adds the NPU and its associated IOMMU device tree nodes to the
RK3568 base device tree. It also configures the necessary clocks, resets, and
power domain for the NPU.

> diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> index 64bdd8b7754b5..50ce5a5e4fc24 100644
> --- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi

[ ... ]

> @@ -948,6 +955,37 @@ qos_rga_wr: qos@fe158300 {
>  		reg = <0x0 0xfe158300 0x0 0x20>;
>  	};
>  
> +	rknn_core_0: npu@fde40000 {

[Severity: Low]
This isn't a bug, but should the npu@fde40000 and iommu@fde4b000 nodes be
ordered alphanumerically by unit address? They appear to be inserted after
qos@fe158300 rather than being placed right before gpu@fde60000.

> +		compatible = "rockchip,rk3568-rknn-core";
> +		reg = <0x0 0xfde40000 0x0 0x1000>,
> +		      <0x0 0xfde41000 0x0 0x1000>,
> +		      <0x0 0xfde43000 0x0 0x1000>;
> +		reg-names = "pc", "cna", "core";
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
> +			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;

[Severity: Critical]
This is a pre-existing issue, but enabling this node exposes a bug where
devm_clk_bulk_get() is called without initializing the .id fields in
rocket_core_init().

Since the connection ID is NULL, does clk_get() implicitly fall back to
of_clk_get(np, 0) for each loop iteration, mapping all four clock array
entries to ACLK_NPU?

Later, when rk3568_noc_init() dynamically scales the SCMI clock rate to
1 GHz:

    clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);

Does this mistakenly configure the rate of ACLK_NPU instead, which could
overclock the AXI bus and lead to immediate hardware instability?

> +		clock-names = "aclk", "hclk", "npu", "pclk";
> +		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
> +		assigned-clock-rates = <200000000>;
> +		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
> +		reset-names = "srst_a", "srst_h";
> +		power-domains = <&power RK3568_PD_NPU>;
> +		rockchip,pmu = <&pmu>;

[Severity: High]
Could setting both power-domains and rockchip,pmu cause a race condition
with TF-A and corrupt hardware state?

The generic power domain framework inherently manages power gating and NoC
bus idle requests for this domain, and explicitly protects these PMU
transitions with pmu->mutex.

Concurrently, the rocket driver's rk3568_noc_init() issues direct, un-polled
regmap_write() operations to the exact same PMU offsets:

    regmap_write(pmu, 0xa0, BIT(1 + 16));
    ...
    regmap_write(pmu, 0x50, BIT(2 + 16));

Does bypassing GenPD's tracked state machine and the pmu->mutex lock create
a layering violation here?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=8

* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
  2026-06-04 14:04   ` sashiko-bot
@ 2026-06-04 14:20   ` Tomeu Vizoso
  1 sibling, 0 replies; 20+ messages in thread
From: Tomeu Vizoso @ 2026-06-04 14:20 UTC (permalink / raw)
  To: Midgy BALON
  Cc: ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will, robin.murphy,
	dri-devel, linux-rockchip, devicetree, linux-arm-kernel, iommu,
	linux-kernel

On Thu, Jun 4, 2026 at 3:53 PM Midgy BALON <midgy971@gmail.com> wrote:
>
> On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
> auto clock-gating of the page-table walker, so the walker's AXI master
> loses its clock between transactions; a TLB-miss page walk then never
> completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).
>
> Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
> so the walker keeps its clock. This is required for the RK3568 NPU MMU.

Hi,

I'm not able to review this patch myself, but maybe it can be
submitted separately while we work on the NPU bits?

Regards,

Tomeu

> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>  drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 4da80136933c4..e3d8b6e9ca12b 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>
>         ret = rk_iommu_enable_paging(iommu);
>
> +       if (!ret) {
> +               /*
> +                * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +                * auto clock-gating; the walker's AXI master then loses its clock
> +                * between transactions and a TLB-miss page walk never completes,
> +                * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +                * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +                */
> +               for (i = 0; i < iommu->num_mmu; i++)
> +                       rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> +       }
> +
>  out_disable_stall:
>         rk_iommu_disable_stall(iommu);
>  out_disable_clocks:
> --
> 2.39.5
>


* Re: [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-04 14:08   ` sashiko-bot
@ 2026-06-04 16:55     ` Conor Dooley
  0 siblings, 0 replies; 20+ messages in thread
From: Conor Dooley @ 2026-06-04 16:55 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: Midgy BALON, conor+dt, dri-devel, robh, devicetree

On Thu, Jun 04, 2026 at 02:08:50PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> 
> New issues:
> - [High] The value written to `RK_MMU_AUTO_GATING` sets bit 1 instead of clearing it, contradicting the inline comment and hardware workaround logic.
> - [High] The error handling path for `noc_init()` failure leaks the IOMMU group reference and leaves the device in an unbalanced PM runtime state.
> - [High] A malformed device tree with excessive NPU core nodes triggers an out-of-bounds write in `rocket_probe()` array access.
> - [High] The NOC bus de-idle sequence via the PMU is missing from the runtime PM resume path.
> - [Medium] The `rockchip,pmu` property is functionally mandatory for RK3568 but is not added to the schema's conditional `required` list.
> 
> Pre-existing issues:
> - [High] The clock names in the `core->clks` array are uninitialized before being passed to `devm_clk_bulk_get()`, causing the framework to fetch the wrong clocks.
> --
> 
> commit 6c61da961bed3360a99f625d4a3b1599e7bfe685
> Author: Midgy BALON <midgy971@gmail.com>
> 
>     dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
>     
>     The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
>     RK3588. Add its compatible. On RK3568 the NPU NOC bus-idle and power
>     gating are controlled through the system PMU rather than a dedicated
>     register block.
> 
> > diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > index caca2a4903cd1..af9936b32e9fe 100644
> > --- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> [ ... ]
> > @@ -75,7 +83,15 @@ required:
> >    - resets
> >    - reset-names
> >    - npu-supply
> > -  - sram-supply
> > +
> > +if:
> > +  properties:
> > +    compatible:
> > +      contains:
> > +        const: rockchip,rk3588-rknn-core
> > +then:
> > +  required:
> > +    - sram-supply
> 
> [Severity: Medium]
> Does the missing rockchip,pmu property from the conditional required block
> cause incomplete device trees to pass validation?
> 
> The rockchip,pmu property is functionally mandatory for the
> rockchip,rk3568-rknn-core variant as the driver unconditionally fails to probe
> without it. Missing this mandatory property here could allow invalid device
> trees to go undetected.

This looks valid. Additionally, the condition should forbid this
property on rk3588.
pw-bot: changes-requested
> 
> >  additionalProperties: false
> >  
> > diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c

Why is the bot commenting on driver stuff here?

Cheers,
Conor.

> > index b3b2fa9ba645a..634f78dfe2887 100644
> > --- a/drivers/accel/rocket/rocket_core.c
> > +++ b/drivers/accel/rocket/rocket_core.c
> [ ... ]
> > @@ -83,6 +89,15 @@ int rocket_core_init(struct rocket_core *core)
> >  		return err;
> >  	}
> >  
> > +	if (core->soc_data->noc_init) {
> > +		err = core->soc_data->noc_init(core);
> > +		if (err) {
> > +			pm_runtime_put_sync(dev);
> > +			rocket_job_fini(core);
> > +			return err;
> > +		}
> > +	}
> 
> [Severity: High]
> Does this error path leak the IOMMU group reference and leave the device in an
> unbalanced PM runtime state?
> 
> By returning directly after pm_runtime_put_sync() and rocket_job_fini(), the
> code misses calling rocket_core_fini() which handles iommu_group_put(),
> pm_runtime_disable(), and pm_runtime_dont_use_autosuspend().
> 
> > +
> >  	version = rocket_pc_readl(core, VERSION);
> >  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
> >  
> > diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> > index 46e6ee1e72c5f..6186f4faa3a2a 100644
> > --- a/drivers/accel/rocket/rocket_device.c
> > +++ b/drivers/accel/rocket/rocket_device.c
> [ ... ]
> > @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
> >  	ddev = &rdev->ddev;
> >  	dev_set_drvdata(dev, rdev);
> >  
> > -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> > -		if (of_device_is_available(core_node))
> > -			num_cores++;
> > -
> > -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> > +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> > +				   GFP_KERNEL);
> 
> [Severity: High]
> Can this code overflow rdev->cores[] if the device tree contains more NPU
> nodes than soc_data->num_cores?
> 
> With this change, the array size is fixed to soc_data->num_cores, but
> rocket_probe() in drivers/accel/rocket/rocket_drv.c is still called for every
> matching node. If there are too many nodes in the device tree, it appears
> rocket_probe() will write out of bounds here:
> 
> drivers/accel/rocket/rocket_drv.c:rocket_probe() {
>     ...
> 	unsigned int core = rdev->num_cores;
>     ...
> 	rdev->cores[core].rdev = rdev;
> 	rdev->cores[core].dev = &pdev->dev;
> 	rdev->cores[core].index = core;
>     ...
> }
> 
> >  	if (!rdev->cores)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> > index 8bbbce594883e..5a72d0b5f4dff 100644
> > --- a/drivers/accel/rocket/rocket_drv.c
> > +++ b/drivers/accel/rocket/rocket_drv.c
> [ ... ]
> > @@ -213,8 +219,64 @@ static void rocket_remove(struct platform_device *pdev)
> >  	}
> >  }
> >  
> > +/*
> > + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> > + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> > + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> > + * matching lower 16 bits.
> > + *
> > + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> > + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> > + * rate change (an intermediate rate defeats the clock framework's
> > + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> > + */
> > +#define ROCKET_RK3568_SCMI_CLK	2
> > +
> > +static int rk3568_noc_init(struct rocket_core *core)
> > +{
> > +	struct regmap *pmu;
> > +	unsigned int val;
> > +	int ret;
> > +
> > +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> > +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);
> 
> [Severity: High]
> This is a pre-existing issue, but does this unintentionally alter the system
> aclk rate?
> 
> Looking at rocket_core_init(), the core->clks array is passed to
> devm_clk_bulk_get() without initializing the id strings:
> 
> drivers/accel/rocket/rocket_core.c:rocket_core_init() {
>     ...
> 	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
>     ...
> }
> 
> Because the id fields are NULL, the framework fetches the first DT clock
> (aclk) for all elements. Calling clk_set_rate() here seems to alter the
> system aclk frequency instead of the intended SCMI clock.
> 
> > +
> > +	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
> > +	if (IS_ERR(pmu))
> > +		return dev_err_probe(core->dev, PTR_ERR(pmu),
> > +				     "failed to get PMU regmap\n");
> > +
> > +	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
> > +	regmap_write(pmu, 0xa0, BIT(1 + 16));
> > +
> > +	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
> > +	regmap_write(pmu, 0x70, BIT(2 + 16));
> > +
> > +	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
> > +	regmap_write(pmu, 0x50, BIT(2 + 16));
> > +
> > +	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
> > +	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
> > +	if (ret)
> > +		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
> > +
> > +	return ret;
> > +}
> 
> [Severity: High]
> Is a NOC bus de-idle sequence required in the runtime PM resume path as well?
> 
> If the generic power domain (genpd) powers off the NPU during runtime suspend,
> the NOC bus idle state would reset. Currently, rocket_device_runtime_resume()
> enables clocks but doesn't repeat this PMU de-idle sequence, which could leave
> the AXI bus gated and cause subsequent hardware accesses to hang or crash.
> 
> > +
> >  static const struct rocket_soc_data rk3568_soc_data = {
> >  	.num_cores = 1,
> >  	.dma_bits = 32,
> >  	.noc_init = rk3568_noc_init,
> >  };
> >  
> > diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> > index 0013cf196c573..7d40ecb26123f 100644
> > --- a/drivers/iommu/rockchip-iommu.c
> > +++ b/drivers/iommu/rockchip-iommu.c
> > @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
> >  
> >  	ret = rk_iommu_enable_paging(iommu);
> >  
> > +	if (!ret) {
> > +		/*
> > +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> > +		 * auto clock-gating; the walker's AXI master then loses its clock
> > +		 * between transactions and a TLB-miss page walk never completes,
> > +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> > +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> > +		 */
> > +		for (i = 0; i < iommu->num_mmu; i++)
> > +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> 
> [Severity: High]
> Will this write operation inadvertently set bit 1 instead of clearing it?
> 
> Writing 0x2 (binary 10) sets bit 1 and clears bit 0, which contradicts the
> inline comment indicating that bit 1 should be cleared and bit 0 kept. This
> appears to enable the buggy page-walker auto clock-gating and disable the
> slave-port gate.
> 
> > +	}
> > +
> >  out_disable_stall:
> >  	rk_iommu_disable_stall(iommu);
> >  out_disable_clocks:
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=7


Thread overview: 20+ messages
2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
2026-06-04 14:05   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
2026-06-04 14:05   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
2026-06-04 14:10   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
2026-06-04 14:04   ` sashiko-bot
2026-06-04 14:20   ` Tomeu Vizoso
2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 16:55     ` Conor Dooley
2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
2026-06-04 14:11   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU Midgy BALON