Devicetree
* [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
@ 2026-06-04 13:52 Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

RFC, not for merge. End-to-end inference does not produce correct output
yet (see Status), so per the v2 discussion this is a request for design
feedback. The driver now probes, attaches, and submits jobs cleanly on a
stock v7.1-rc6 tree; the remaining blocker appears to be hardware-internal.

The RK3568 has a single NVDLA-derived NPU core, the same IP family as the
RK3588 NPU the driver already supports; the register layout matches. The
RK3568-specific differences are a 32-bit (vs 40-bit) NPU AXI master and
IOMMU, and an explicit PVTPLL/PMU bring-up that powers and de-idles the
NPU before it is reachable.
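
As a rough illustration of what the width difference means for the
driver, the sketch below reproduces the arithmetic of the kernel's
DMA_BIT_MASK() macro in userspace C and checks whether a buffer fits under
a 32-bit or a 40-bit mask. The helper names dma_bit_mask() and
dma_reachable() are hypothetical, and the model ignores IOMMU remapping
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n); n == 64 is special-cased
 * because shifting a 64-bit value by 64 is undefined behaviour in C. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Does [addr, addr + len) lie entirely under an n-bit mask? len must be > 0.
 * Written as "len - 1 <= mask - addr" so the check cannot overflow. */
static bool dma_reachable(uint64_t addr, uint64_t len, unsigned int bits)
{
	uint64_t mask = dma_bit_mask(bits);

	assert(len > 0);
	return addr <= mask && len - 1 <= mask - addr;
}
```

A 4 KiB buffer at 0x100000000 is reachable with 40 bits but not with 32, so
on RK3568 every address handed to the NPU has to stay below 4 GiB. This is
what the dma_bits = 32 match data in patch 3 encodes.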

Patches:
  1-2  rocket: per-SoC data struct, then derive DMA width and core count
       from match data (refactors, no functional change).
  3    rocket: RK3568 SoC data + PVTPLL/PMU/NOC bring-up.
  4    rocket: reset the NPU before detaching the IOMMU on a job timeout
       (otherwise the detach's stall request never drains on the wedged
       AXI master and the IOMMU core WARNs).
  5    rocket: keep the IOMMU domain attached across jobs instead of
       re-attaching per job (the per-job rk_iommu handshake on the idle
       NPU MMU is slow and noisy).
  6    iommu/rockchip: clear AUTO_GATING bit 1 on the RK356x v1 IOMMU so
       the page-walker keeps its clock (else a TLB-miss walk never
       completes).
  7    dt-bindings: add the RK3568 NPU compatible.
  8-9  arm64 dts: add the NPU and its IOMMU, and enable them on ROCK 3B.

Dependency. The NPU MMU is rockchip-iommu v1 (32-bit) while the rest of
the RK3568 uses v2 (40-bit). They cannot coexist until the driver carries
per-device ops; this series is developed on top of Simon Xue's
"iommu/rockchip: Drop global rk_ops in favor of per-device ops" [1].
Without it the NPU IOMMU fails to probe on a full RK3568 boot.

Power bring-up. The NPU is brought up through the power-domain layer (no
driver hack): the NPU power-domain keeps its clocks but drops the pm_qos
phandle (qos_npu sits behind the gated NPU NoC, so genpd's power-off QoS
save faults reading it), and vdd_npu is marked always-on so the rail is
up before genpd de-idles the NoC at power-on. The PMU de-idle then ACKs
without PVTPLL running; PVTPLL is only needed for compute.

Status. On v7.1-rc6 the driver probes, creates /dev/accel/accel0,
attaches an IOMMU domain, and submits jobs; the program controller
fetches and broadcasts the command list. Inference output is still wrong,
and the open issues span three layers:
  - kernel (this series): the RK3568 differences appear to be handled;
  - mesa/Teflon userspace: still emits RK3588-tuned config, wrong for
    RK3568 (to be filed separately on mesa-dev);
  - hardware: with corrected config the NPU's DMA reads the full input
    and weight tensors (confirmed via its DMA bandwidth counters), but
    the MAC/output stage never completes, the job times out, and the
    output stays at the buffer's zero-point. I have not found the missing
    step; it is not in the command list (replaying the vendor's
    byte-exact command list behaves the same). Pointers welcome,
    especially from anyone with RK3568 NPU experience.

Known residual. On the first IOMMU attach the NPU MMU is idle with paging
already enabled; the rk_iommu stall/reset handshake does not complete in
that state and logs one burst of timeouts before the (kept) domain
settles. It is harmless here because the job times out regardless, but it
points at an idle-MMU reconfiguration corner case that the rk_iommu code
does not handle on this block.

[1] https://lore.kernel.org/linux-rockchip/20260310105303.128859-1-xxm@rock-chips.com/

Changes since v2:
  - Tagged RFC; now tested on a stock v7.1-rc6 tree.
  - Bring-up moved into the power-domain/DT layer (no initcall hack).
  - Added the IOMMU detach-on-timeout and attach-once driver fixes.
  - Split the driver patch (Heiko): soc_data / match-data / RK3568.
  - Derive DMA width and core count from match data; drop the DT rescans.
  - Binding describes the hardware; added the missing $ref on rockchip,pmu.
  - Disclosed the per-device-ops IOMMU dependency.

Midgy BALON (9):
  accel: rocket: Introduce per-SoC rocket_soc_data
  accel: rocket: Derive DMA width and core count from match data
  accel: rocket: Add RK3568 SoC support
  accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  accel: rocket: Keep the IOMMU domain attached across jobs
  iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU

 .../npu/rockchip,rk3588-rknn-core.yaml        | 18 ++++-
 .../boot/dts/rockchip/rk3568-rock-3b.dts      | 14 +++-
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++
 drivers/accel/rocket/rocket_core.c            | 22 ++++++-
 drivers/accel/rocket/rocket_core.h            | 19 ++++++
 drivers/accel/rocket/rocket_device.c          | 15 ++---
 drivers/accel/rocket/rocket_device.h          |  3 +-
 drivers/accel/rocket/rocket_drv.c             | 66 ++++++++++++++++++-
 drivers/accel/rocket/rocket_job.c             | 35 ++++++++--
 drivers/iommu/rockchip-iommu.c                | 12 ++++
 10 files changed, 219 insertions(+), 23 deletions(-)


base-commit: 52c800fdcf11888ebeb50c3d707f782cc15b66eb
-- 
2.39.5



* [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.  No functional change: the
RK3588 AXI master is 40-bit.  This prepares for SoCs with a narrower
address width.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  7 ++++++-
 drivers/accel/rocket/rocket_core.h | 11 +++++++++++
 drivers/accel/rocket/rocket_drv.c  |  6 +++++-
 3 files changed, 22 insertions(+), 2 deletions(-)
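
Below is a minimal userspace model of the match-data pattern this patch
introduces. It assumes a simplified of_device_get_match_data() that compares
a single compatible string, whereas the real helper walks the node's
compatible list and picks the best match. struct match_entry,
get_match_data() and probe_dma_bits() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the per-SoC struct added by this patch. */
struct rocket_soc_data {
	unsigned int dma_bits;
};

/* Simplified stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	const void *data;
};

static const struct rocket_soc_data rk3588_soc_data = { .dma_bits = 40 };

static const struct match_entry dt_match[] = {
	{ "rockchip,rk3588-rknn-core", &rk3588_soc_data },
	{ NULL, NULL } /* sentinel, like the empty {} entry */
};

/* Hypothetical model of of_device_get_match_data(): return the .data of
 * the first entry whose compatible equals the node's, or NULL. */
static const void *get_match_data(const struct match_entry *table,
				  const char *compatible)
{
	assert(table && compatible);
	for (; table->compatible; table++)
		if (strcmp(table->compatible, compatible) == 0)
			return table->data;
	return NULL;
}

/* Probe-time use: fail with -EINVAL when there is no match data, as
 * rocket_core_init() does, otherwise return the DMA width to apply. */
static int probe_dma_bits(const char *compatible)
{
	const struct rocket_soc_data *soc = get_match_data(dt_match, compatible);

	if (!soc)
		return -EINVAL;
	return (int)soc->dma_bits;
}
```

In this model an unknown compatible also yields NULL. In the kernel such a
device would never bind, so the NULL check really guards table entries that
lack .data.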

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index b3b2fa9ba645a..09c445af7de73 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -7,6 +7,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/iommu.h>
+#include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/reset.h>
@@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
 	u32 version;
 	int err = 0;
 
+	core->soc_data = of_device_get_match_data(dev);
+	if (!core->soc_data)
+		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
+
 	core->resets[0].id = "srst_a";
 	core->resets[1].id = "srst_h";
 	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
@@ -52,7 +57,7 @@ int rocket_core_init(struct rocket_core *core)
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(core->soc_data->dma_bits));
 	if (err)
 		return err;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index f6d7382854ca9..8ee105a0be40e 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -12,6 +12,16 @@
 
 #include "rocket_registers.h"
 
+struct rocket_core;
+
+/**
+ * struct rocket_soc_data - per-SoC configuration data
+ * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ */
+struct rocket_soc_data {
+	unsigned int dma_bits;
+};
+
 #define rocket_pc_readl(core, reg) \
 	readl((core)->pc_iomem + (REG_PC_##reg))
 #define rocket_pc_writel(core, reg, value) \
@@ -31,6 +41,7 @@ struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
 	unsigned int index;
+	const struct rocket_soc_data *soc_data;
 
 	int irq;
 	void __iomem *pc_iomem;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 8bbbce594883e..384c38e13acce 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+static const struct rocket_soc_data rk3588_soc_data = {
+	.dma_bits = 40,
+};
+
 static const struct of_device_id dt_match[] = {
-	{ .compatible = "rockchip,rk3588-rknn-core" },
+	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
 MODULE_DEVICE_TABLE(of, dt_match);
-- 
2.39.5



* [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:05   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The probe already has the per-SoC match data, which now also records the
core count alongside the DMA width.  Use it to size the cores array and
to set the device DMA mask, instead of re-scanning the device tree for
available core nodes; that scan only matched the RK3588 compatible
string.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.h   |  2 ++
 drivers/accel/rocket/rocket_device.c | 15 +++++----------
 drivers/accel/rocket/rocket_device.h |  3 ++-
 drivers/accel/rocket/rocket_drv.c    |  7 ++++++-
 4 files changed, 15 insertions(+), 12 deletions(-)
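
To make the motivation concrete: the removed loop counted available nodes
for one hardcoded compatible string, so on an RK3568 tree it would size the
cores array for zero entries. The sketch below models that count over a
hypothetical flat node list. struct dt_node, count_available() and the
example arrays are illustrative only, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Flattened device-tree node: one compatible string plus its status. */
struct dt_node {
	const char *compatible;
	bool available; /* status = "okay" */
};

/* Model of the removed loop: for_each_compatible_node() with a fixed
 * compatible, filtered by of_device_is_available(). */
static unsigned int count_available(const struct dt_node *nodes, size_t n,
				    const char *compatible)
{
	unsigned int count = 0;

	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (nodes[i].available &&
		    strcmp(nodes[i].compatible, compatible) == 0)
			count++;
	return count;
}

/* Hypothetical example trees. */
static const struct dt_node rk3568_nodes[] = {
	{ "rockchip,rk3568-rknn-core", true },
};

static const struct dt_node rk3588_nodes[] = {
	{ "rockchip,rk3588-rknn-core", true },
	{ "rockchip,rk3588-rknn-core", true },
	{ "rockchip,rk3588-rknn-core", false }, /* status = "disabled" */
};
```

The change involves one trade-off. With .num_cores = 3 the RK3588 array is
sized for every core the SoC has. The old scan counted only cores with
status = "okay" (two in the example tree above), so some slots may now go
unused.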

diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 8ee105a0be40e..d6421251670dc 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -16,9 +16,11 @@ struct rocket_core;
 
 /**
  * struct rocket_soc_data - per-SoC configuration data
+ * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
  */
 struct rocket_soc_data {
+	unsigned int num_cores;
 	unsigned int dma_bits;
 };
 
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index 46e6ee1e72c5f..6186f4faa3a2a 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -6,18 +6,16 @@
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
 #include <linux/platform_device.h>
-#include <linux/of.h>
 
 #include "rocket_device.h"
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver)
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data)
 {
 	struct device *dev = &pdev->dev;
-	struct device_node *core_node;
 	struct rocket_device *rdev;
 	struct drm_device *ddev;
-	unsigned int num_cores = 0;
 	int err;
 
 	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
@@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
 	ddev = &rdev->ddev;
 	dev_set_drvdata(dev, rdev);
 
-	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
-		if (of_device_is_available(core_node))
-			num_cores++;
-
-	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
+	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
+				   GFP_KERNEL);
 	if (!rdev->cores)
 		return ERR_PTR(-ENOMEM);
 
 	dma_set_max_seg_size(dev, UINT_MAX);
 
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(soc_data->dma_bits));
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index ce662abc01d3d..2f74e078974e3 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -22,7 +22,8 @@ struct rocket_device {
 };
 
 struct rocket_device *rocket_device_init(struct platform_device *pdev,
-					 const struct drm_driver *rocket_drm_driver);
+					 const struct drm_driver *rocket_drm_driver,
+					 const struct rocket_soc_data *soc_data);
 void rocket_device_fini(struct rocket_device *rdev);
 #define to_rocket_device(drm_dev) \
 	((struct rocket_device *)(container_of((drm_dev), struct rocket_device, ddev)))
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 384c38e13acce..c18840e5aff76 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
 
 static int rocket_probe(struct platform_device *pdev)
 {
+	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
 	int ret;
 
+	if (!soc_data)
+		return -EINVAL;
+
 	if (rdev == NULL) {
 		/* First core probing, initialize DRM device. */
-		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
+		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
 		if (IS_ERR(rdev)) {
 			dev_err(&pdev->dev, "failed to initialize rocket device\n");
 			return PTR_ERR(rdev);
@@ -214,6 +218,7 @@ static void rocket_remove(struct platform_device *pdev)
 }
 
 static const struct rocket_soc_data rk3588_soc_data = {
+	.num_cores = 3,
 	.dma_bits = 40,
 };
 
-- 
2.39.5



* [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:05   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Unlike the RK3588 it must be powered
on and de-idled through the PMU, and its PVTPLL clock started via SCMI,
before the NPU is reachable.  Add rk3568_soc_data with a noc_init
callback that performs this bring-up.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  9 +++++
 drivers/accel/rocket/rocket_core.h |  3 ++
 drivers/accel/rocket/rocket_drv.c  | 53 ++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+)
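
Two mechanisms in rk3568_noc_init() can be illustrated in isolation. The
sketch below is a simplified userspace model, not driver code.
hiword_write() models the PMU write-mask protocol described in the patch's
comment. struct fake_clk models the clock framework skipping a request for
an unchanged rate, which the patch comment gives as the reason for setting
600 MHz before 1 GHz. All names are hypothetical, and the real framework
also rounds the requested rate before comparing.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/*
 * Rockchip "hiword mask" register write: bits 31:16 of the written word
 * select which of bits 15:0 are updated; unselected low bits keep their
 * previous value. Only the low 16 bits of the register are modelled.
 */
static uint32_t hiword_write(uint32_t old, uint32_t written)
{
	uint32_t mask = written >> 16;

	return ((old & ~mask) | (written & mask)) & 0xffffu;
}

/*
 * Simplified clock model: a request for the rate already cached returns
 * early and never reaches the firmware (SCMI in the patch). fw_calls
 * counts the requests that did.
 */
struct fake_clk {
	unsigned long cached_rate;
	unsigned int fw_calls;
};

static void fake_clk_set_rate(struct fake_clk *clk, unsigned long rate)
{
	assert(rate != 0);
	if (rate == clk->cached_rate)
		return; /* unchanged-rate shortcut */
	clk->fw_calls++;
	clk->cached_rate = rate;
}

/* Firmware calls for a direct request vs. one via an intermediate rate,
 * starting from a clock whose cached rate already equals 'target'. */
static unsigned int fw_calls_direct(unsigned long target)
{
	struct fake_clk clk = { .cached_rate = target, .fw_calls = 0 };

	fake_clk_set_rate(&clk, target);
	return clk.fw_calls;
}

static unsigned int fw_calls_via(unsigned long mid, unsigned long target)
{
	struct fake_clk clk = { .cached_rate = target, .fw_calls = 0 };

	fake_clk_set_rate(&clk, mid);
	fake_clk_set_rate(&clk, target);
	return clk.fw_calls;
}
```

For example, writing BIT(1 + 16) to PWR_GATE_SFTCON clears bit 1 and leaves
the other low bits untouched. This is how the patch changes single PMU bits
without a read-modify-write.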

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 09c445af7de73..a8de876365873 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
 		return err;
 	}
 
+	if (core->soc_data->noc_init) {
+		err = core->soc_data->noc_init(core);
+		if (err) {
+			pm_runtime_put_sync(dev);
+			rocket_job_fini(core);
+			return err;
+		}
+	}
+
 	version = rocket_pc_readl(core, VERSION);
 	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
 
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index d6421251670dc..66d138a8ed773 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -18,10 +18,13 @@ struct rocket_core;
  * struct rocket_soc_data - per-SoC configuration data
  * @num_cores: Number of NPU cores in this SoC.
  * @dma_bits: Physical address width reachable by the NPU's AXI master.
+ * @noc_init: Optional callback to power on and de-idle the NPU NOC bus.
+ *            Required on RK3568, where this is done through the PMU.
  */
 struct rocket_soc_data {
 	unsigned int num_cores;
 	unsigned int dma_bits;
+	int (*noc_init)(struct rocket_core *core);
 };
 
 #define rocket_pc_readl(core, reg) \
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index c18840e5aff76..5a72d0b5f4dff 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -9,9 +9,11 @@
 #include <linux/clk.h>
 #include <linux/err.h>
 #include <linux/iommu.h>
+#include <linux/mfd/syscon.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/regmap.h>
 
 #include "rocket_device.h"
 #include "rocket_drv.h"
@@ -217,12 +219,63 @@ static void rocket_remove(struct platform_device *pdev)
 	}
 }
 
+/*
+ * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
+ * powered on and de-idled through the PMU before the NPU is reachable.  PMU
+ * registers use a write-mask protocol: the upper 16 bits enable writes to the
+ * matching lower 16 bits.
+ *
+ * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
+ * running before the NOC acknowledges the de-idle request.  Force a real SCMI
+ * rate change (an intermediate rate defeats the clock framework's
+ * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
+ */
+#define ROCKET_RK3568_SCMI_CLK	2
+
+static int rk3568_noc_init(struct rocket_core *core)
+{
+	struct regmap *pmu;
+	unsigned int val;
+	int ret;
+
+	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
+	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);
+
+	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
+	if (IS_ERR(pmu))
+		return dev_err_probe(core->dev, PTR_ERR(pmu),
+				     "failed to get PMU regmap\n");
+
+	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
+	regmap_write(pmu, 0xa0, BIT(1 + 16));
+
+	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
+	regmap_write(pmu, 0x70, BIT(2 + 16));
+
+	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
+	regmap_write(pmu, 0x50, BIT(2 + 16));
+
+	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
+	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
+	if (ret)
+		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
+
+	return ret;
+}
+
+static const struct rocket_soc_data rk3568_soc_data = {
+	.num_cores = 1,
+	.dma_bits = 32,
+	.noc_init = rk3568_noc_init,
+};
+
 static const struct rocket_soc_data rk3588_soc_data = {
 	.num_cores = 3,
 	.dma_bits = 40,
 };
 
 static const struct of_device_id dt_match[] = {
+	{ .compatible = "rockchip,rk3568-rknn-core", .data = &rk3568_soc_data },
 	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
 	{}
 };
-- 
2.39.5



* [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (2 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:10   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

On a job timeout the NPU AXI master can be left wedged with
outstanding transactions. rocket_reset() detached the IOMMU group
before resetting the hardware, so iommu_detach_group() ->
__iommu_group_set_core_domain() asked the rk_iommu to stall and wait
for the in-flight transactions to drain. They never did; the stall
request timed out (-ETIMEDOUT) and the IOMMU core WARNed:

  WARNING: drivers/iommu/iommu.c:157 __iommu_group_set_core_domain
    iommu_detach_group
    rocket_reset
    rocket_job_timedout

Assert the core reset first: it quiesces the AXI master so the
following IOMMU detach completes cleanly. Move the detach after
rocket_core_reset() and out of the job_lock (it does not touch
in_flight_job).

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_job.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
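
The ordering argument can be captured in a toy state model. Following the
commit message, it assumes that the IOMMU stall cannot complete while the
wedged master has transactions in flight, and that asserting the core reset
quiesces the master. struct npu_model and its helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy state of the NPU and its IOMMU after a job timeout. */
struct npu_model {
	unsigned int outstanding; /* AXI transactions still in flight */
	bool wedged;              /* master has stopped completing them */
	bool attached;            /* IOMMU domain attached */
};

/* Detach needs an IOMMU stall, which needs in-flight traffic to drain. */
static int model_detach(struct npu_model *npu)
{
	assert(npu->attached);
	if (npu->wedged && npu->outstanding > 0)
		return -ETIMEDOUT; /* stall request never completes */
	npu->outstanding = 0;
	npu->attached = false;
	return 0;
}

/* Asserting the core reset quiesces the master. */
static void model_core_reset(struct npu_model *npu)
{
	npu->outstanding = 0;
	npu->wedged = false;
}

/* Recovery from a timed-out job in either order; returns the detach result. */
static int recover_after_timeout(bool reset_first)
{
	struct npu_model npu = {
		.outstanding = 4, .wedged = true, .attached = true,
	};
	int ret;

	if (reset_first) {          /* order after this patch */
		model_core_reset(&npu);
		ret = model_detach(&npu);
	} else {                    /* order before this patch */
		ret = model_detach(&npu);
		model_core_reset(&npu);
	}
	return ret;
}
```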

diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833f..e25234261536b 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 		if (core->in_flight_job)
 			pm_runtime_put_noidle(core->dev);
 
-		iommu_detach_group(NULL, core->iommu_group);
-
 		core->in_flight_job = NULL;
 	}
 
-	/* Proceed with reset now. */
+	/*
+	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
+	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
+	 * stall request that never drains and times out (warning in the IOMMU
+	 * core). Asserting the core reset first quiesces the master so the
+	 * detach completes cleanly.
+	 */
 	rocket_core_reset(core);
 
+	iommu_detach_group(NULL, core->iommu_group);
+
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
 
-- 
2.39.5



* [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (3 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

rocket_job_run() attached the job's IOMMU domain, and the completion
and reset paths detached it again for every job. Each attach/detach
runs the rk_iommu stall/force-reset/paging handshake, and on RK3568
the NPU MMU is idle between jobs, so that handshake times out and
logs a burst of "stall/paging request timed out" errors for every
job.

Attach the per-context domain once and keep it: track the attached
domain in the core, swap it only when a job from a different context
runs, and detach it at core teardown. The core holds a reference on
the attached domain so that it outlives the job that first attached
it; the reference is dropped on swap or teardown.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/accel/rocket/rocket_core.c |  6 ++++++
 drivers/accel/rocket/rocket_core.h |  3 +++
 drivers/accel/rocket/rocket_job.c  | 27 +++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)
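
The attach-once bookkeeping can be checked with a small model of the logic
this patch adds to rocket_job_run() and rocket_core_fini().
struct domain_model stands in for struct rocket_iommu_domain, with a plain
counter in place of the kref. The core counters record how many attach and
detach handshakes would be issued. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rocket_iommu_domain; refs models its kref. */
struct domain_model {
	int refs;
};

/* Stand-in for the relevant part of struct rocket_core. */
struct core_model {
	struct domain_model *attached; /* core->attached_domain */
	unsigned int attaches;         /* iommu_attach_group() calls */
	unsigned int detaches;         /* iommu_detach_group() calls */
};

static void domain_put(struct domain_model *d)
{
	assert(d->refs > 0);
	d->refs--;
}

static void detach_current(struct core_model *core)
{
	if (!core->attached)
		return;
	core->detaches++;
	domain_put(core->attached);
	core->attached = NULL;
}

/* Job start: (re)attach only when the job's domain differs. */
static void run_job(struct core_model *core, struct domain_model *job_domain)
{
	if (core->attached == job_domain)
		return;
	detach_current(core);
	core->attaches++;
	job_domain->refs++; /* kref_get() on the newly attached domain */
	core->attached = job_domain;
}

/* Teardown, as in rocket_core_fini(). */
static void core_fini(struct core_model *core)
{
	detach_current(core);
}

/* Example: two contexts, each starting with one reference of its own. */
static struct domain_model ctx_a = { .refs = 1 };
static struct domain_model ctx_b = { .refs = 1 };
static struct core_model core;

static void example_sequence(void)
{
	struct domain_model *jobs[] = { &ctx_a, &ctx_a, &ctx_b, &ctx_b, &ctx_a };

	for (size_t i = 0; i < sizeof(jobs) / sizeof(jobs[0]); i++)
		run_job(&core, jobs[i]);
}
```

The job sequence A, A, B, B, A issues three attaches where the per-job
scheme would issue five. After teardown, each domain is back to the single
reference held by its context.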

diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index a8de876365873..634f78dfe2887 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -13,6 +13,7 @@
 #include <linux/reset.h>
 
 #include "rocket_core.h"
+#include "rocket_drv.h"
 #include "rocket_job.h"
 
 int rocket_core_init(struct rocket_core *core)
@@ -112,6 +113,11 @@ void rocket_core_fini(struct rocket_core *core)
 {
 	pm_runtime_dont_use_autosuspend(core->dev);
 	pm_runtime_disable(core->dev);
+	if (core->attached_domain) {
+		iommu_detach_group(NULL, core->iommu_group);
+		rocket_iommu_domain_put(core->attached_domain);
+		core->attached_domain = NULL;
+	}
 	iommu_group_put(core->iommu_group);
 	core->iommu_group = NULL;
 	rocket_job_fini(core);
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 66d138a8ed773..05a197a9c0113 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -42,6 +42,8 @@ struct rocket_soc_data {
 #define rocket_core_writel(core, reg, value) \
 	writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
 
+struct rocket_iommu_domain;
+
 struct rocket_core {
 	struct device *dev;
 	struct rocket_device *rdev;
@@ -56,6 +58,7 @@ struct rocket_core {
 	struct reset_control_bulk_data resets[2];
 
 	struct iommu_group *iommu_group;
+	struct rocket_iommu_domain *attached_domain;
 
 	struct mutex job_lock;
 	struct rocket_job *in_flight_job;
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index e25234261536b..b248371be8a1e 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -9,6 +9,7 @@
 #include <drm/rocket_accel.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
+#include <linux/kref.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 
@@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
 	if (ret < 0)
 		return fence;
 
-	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
-	if (ret < 0)
-		return fence;
+	/*
+	 * Attach the job's IOMMU domain only when it differs from the one
+	 * already attached. Re-attaching per job toggles the rk_iommu
+	 * stall/reset handshake on an idle NPU MMU, which is slow and
+	 * noisy; keep the domain attached across jobs instead.
+	 */
+	if (core->attached_domain != job->domain) {
+		if (core->attached_domain) {
+			iommu_detach_group(NULL, core->iommu_group);
+			rocket_iommu_domain_put(core->attached_domain);
+			core->attached_domain = NULL;
+		}
+
+		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
+		if (ret < 0)
+			return fence;
+
+		kref_get(&job->domain->kref);
+		core->attached_domain = job->domain;
+	}
 
 	scoped_guard(mutex, &core->job_lock) {
 		core->in_flight_job = job;
@@ -340,7 +358,6 @@ static void rocket_job_handle_irq(struct rocket_core *core)
 				return;
 			}
 
-			iommu_detach_group(NULL, iommu_group_get(core->dev));
 			dma_fence_signal(core->in_flight_job->done_fence);
 			pm_runtime_put_autosuspend(core->dev);
 			core->in_flight_job = NULL;
@@ -376,8 +393,6 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
 	 */
 	rocket_core_reset(core);
 
-	iommu_detach_group(NULL, core->iommu_group);
-
 	/* NPU has been reset, we can clear the reset pending bit. */
 	atomic_set(&core->reset.pending, 0);
 
-- 
2.39.5



* [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (4 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:04   ` sashiko-bot
                     ` (2 more replies)
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
auto clock-gating of the page-table walker, so the walker's AXI master
loses its clock between transactions; a TLB-miss page walk then never
completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).

Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
so the walker keeps its clock. This is required for the RK3568 NPU MMU.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4da80136933c4..e3d8b6e9ca12b 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
 
 	ret = rk_iommu_enable_paging(iommu);
 
+	if (!ret) {
+		/*
+		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
+		 * auto clock-gating; the walker's AXI master then loses its clock
+		 * between transactions and a TLB-miss page walk never completes,
+		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
+		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
+		 */
+		for (i = 0; i < iommu->num_mmu; i++)
+			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
+	}
+
 out_disable_stall:
 	rk_iommu_disable_stall(iommu);
 out_disable_clocks:
-- 
2.39.5



* [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (5 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:08   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
RK3588.  Add its compatible.

On RK3568 the NPU NOC bus-idle and power gating are controlled through the
system PMU rather than a dedicated register block, so add a rockchip,pmu
phandle to that syscon.  The RK3568 NPU has no dedicated SRAM rail, so
sram-supply is required only on RK3588.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 .../npu/rockchip,rk3588-rknn-core.yaml         | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
index caca2a4903cd1..af9936b32e9fe 100644
--- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
+++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
@@ -21,6 +21,7 @@ properties:
 
   compatible:
     enum:
+      - rockchip,rk3568-rknn-core
       - rockchip,rk3588-rknn-core
 
   reg:
@@ -50,6 +51,13 @@ properties:
 
   npu-supply: true
 
+  rockchip,pmu:
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description:
+      Phandle to the PMU syscon.  On RK3568 the NPU's NOC bus-idle and
+      power gating are controlled through the PMU; this points to that
+      syscon so those registers can be reached.
+
   power-domains:
     maxItems: 1
 
@@ -75,7 +83,15 @@ required:
   - resets
   - reset-names
   - npu-supply
-  - sram-supply
+
+if:
+  properties:
+    compatible:
+      contains:
+        const: rockchip,rk3588-rknn-core
+then:
+  required:
+    - sram-supply
 
 additionalProperties: false
 
-- 
2.39.5


* [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (6 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-04 14:11   ` sashiko-bot
  2026-06-04 13:52 ` [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU Midgy BALON
  2026-06-05  1:36 ` [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Chaoyi Chen
  9 siblings, 1 reply; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

The RK3568 has an NVDLA-derived NPU at fde40000 with its own IOMMU at
fde4b000. Add both nodes (disabled by default) and the NPU power-domain
child under the PMU power-controller, and point rockchip,pmu at the PMU
syscon that controls the NPU NoC bus-idle.

The power-domain deliberately carries no pm_qos: qos_npu sits behind the
NPU NoC, which is gated until the NPU is brought up, so a genpd power-off
QoS save would fault reading it.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
index 64bdd8b7754b5..50ce5a5e4fc24 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
@@ -512,6 +512,13 @@ power-domain@RK3568_PD_GPU {
 				#power-domain-cells = <0>;
 			};
 
+			pd_npu: power-domain@RK3568_PD_NPU {
+				reg = <RK3568_PD_NPU>;
+				clocks = <&cru ACLK_NPU_PRE>,
+					 <&cru HCLK_NPU_PRE>;
+				#power-domain-cells = <0>;
+			};
+
 			/* These power domains are grouped by VD_LOGIC */
 			power-domain@RK3568_PD_VI {
 				reg = <RK3568_PD_VI>;
@@ -948,6 +955,37 @@ qos_rga_wr: qos@fe158300 {
 		reg = <0x0 0xfe158300 0x0 0x20>;
 	};
 
+	rknn_core_0: npu@fde40000 {
+		compatible = "rockchip,rk3568-rknn-core";
+		reg = <0x0 0xfde40000 0x0 0x1000>,
+		      <0x0 0xfde41000 0x0 0x1000>,
+		      <0x0 0xfde43000 0x0 0x1000>;
+		reg-names = "pc", "cna", "core";
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
+			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
+		clock-names = "aclk", "hclk", "npu", "pclk";
+		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+		assigned-clock-rates = <200000000>;
+		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
+		reset-names = "srst_a", "srst_h";
+		power-domains = <&power RK3568_PD_NPU>;
+		rockchip,pmu = <&pmu>;
+		iommus = <&rknn_mmu_0>;
+		status = "disabled";
+	};
+
+	rknn_mmu_0: iommu@fde4b000 {
+		compatible = "rockchip,iommu";
+		reg = <0x0 0xfde4b000 0x0 0x40>;
+		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
+		clock-names = "aclk", "iface";
+		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>;
+		power-domains = <&power RK3568_PD_NPU>;
+		#iommu-cells = <0>;
+		status = "disabled";
+	};
+
 	qos_npu: qos@fe180000 {
 		compatible = "rockchip,rk3568-qos", "syscon";
 		reg = <0x0 0xfe180000 0x0 0x20>;
-- 
2.39.5


* [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
  2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
                   ` (7 preceding siblings ...)
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
@ 2026-06-04 13:52 ` Midgy BALON
  2026-06-05  1:36 ` [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Chaoyi Chen
  9 siblings, 0 replies; 35+ messages in thread
From: Midgy BALON @ 2026-06-04 13:52 UTC (permalink / raw)
  To: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will
  Cc: robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Enable the NPU and its IOMMU on ROCK 3B.

vdd_npu is marked always-on so the rail is up before genpd de-idles the
NPU NoC at power-on: the PMU de-idle handshake needs the rail powered.
The PVTPLL compute clock is brought up later by the driver.

Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
index 69001e453732e..7ac780ed313d5 100644
--- a/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3568-rock-3b.dts
@@ -330,8 +330,10 @@ regulator-state-mem {
 
 			vdd_npu: DCDC_REG4 {
 				regulator-name = "vdd_npu";
+				regulator-always-on;
+				regulator-boot-on;
 				regulator-initial-mode = <0x2>;
-				regulator-min-microvolt = <500000>;
+				regulator-min-microvolt = <825000>;
 				regulator-max-microvolt = <1350000>;
 				regulator-ramp-delay = <6001>;
 
@@ -787,3 +789,13 @@ vp0_out_hdmi: endpoint@ROCKCHIP_VOP2_EP_HDMI0 {
 		remote-endpoint = <&hdmi_in_vp0>;
 	};
 };
+
+&rknn_core_0 {
+	npu-supply = <&vdd_npu>;
+	status = "okay";
+};
+
+&rknn_mmu_0 {
+	status = "okay";
+};
+
-- 
2.39.5


* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
@ 2026-06-04 14:04   ` sashiko-bot
  2026-06-04 14:20   ` Tomeu Vizoso
  2026-06-05  1:59   ` Chaoyi Chen
  2 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:04 UTC (permalink / raw)
  To: Midgy BALON; +Cc: dri-devel, devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The patch writes 0x2 to RK_MMU_AUTO_GATING to clear bit 1, but this value actually sets bit 1 and clears bit 0, doing the exact opposite of the intended behavior.
- [High] An SoC-specific workaround is applied unconditionally to all Rockchip IOMMUs using a brittle hardcoded register overwrite.
--

commit 86bf3e3b9a68d419b8e6ba9d4d3cf8afb0c324bb
Author: Midgy BALON <midgy971@gmail.com>

iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU

This commit aims to fix a hardware lockup on the RK356x v1 IOMMU by
clearing bit 1 of the RK_MMU_AUTO_GATING register. This change is intended
to prevent auto clock-gating of the page-table walker.

> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 0013cf196c573..7d40ecb26123f 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>  
>  	ret = rk_iommu_enable_paging(iommu);
>  
> +	if (!ret) {
> +		/*
> +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +		 * auto clock-gating; the walker's AXI master then loses its clock
> +		 * between transactions and a TLB-miss page walk never completes,
> +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +		 */
> +		for (i = 0; i < iommu->num_mmu; i++)

[Severity: High]
Will this loop unconditionally apply the RK356x v1 workaround to all SoCs
supported by the rockchip-iommu driver? It seems older variants like the
RK3288 or RK3399 might unexpectedly receive this register write in
rk_iommu_enable() without a check for the specific SoC version.

> +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);

[Severity: High]
Does writing 0x2 here achieve the intended result? 

The comment mentions clearing bit 1 and keeping bit 0. Since the register
resets to 0x3 (binary 11), writing 0x2 (binary 10) actually sets bit 1 and
clears bit 0. This seems to do the exact opposite of what is described,
keeping the buggy page-walker auto clock-gating enabled and inadvertently
disabling the working slave-port gate. Should this be 0x1 instead?

Additionally, because rk_iommu_write() completely overwrites the register,
could this clobber other bits in RK_MMU_AUTO_GATING? Would it be safer to
use a read-modify-write pattern here?
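To make the bit arithmetic concrete, here is a small userspace sketch. It assumes, as both the patch comment and this review do, that bit 0 gates the slave port, bit 1 gates the page walker, a set bit means gating is enabled, and the reset value is 0x3. None of that has been checked against the TRM here. The macro and helper names are hypothetical. The helper clears only bit 1 and preserves every other bit. In the driver, this would correspond to an rk_iommu_read() followed by rk_iommu_write(), rather than a write of a fixed constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of RK_MMU_AUTO_GATING (per the patch comment, unverified). */
#define AUTO_GATING_SLAVE_PORT	(1u << 0)	/* bit 0: slave-port gate  */
#define AUTO_GATING_PAGE_WALKER	(1u << 1)	/* bit 1: page-walker gate */

/*
 * Hypothetical helper: clear only the page-walker gate bit, keeping
 * bit 0 and any other bits exactly as they were read.
 */
static uint32_t auto_gating_disable_walker(uint32_t reg)
{
	return reg & ~AUTO_GATING_PAGE_WALKER;
}
```

Starting from the assumed reset value of 0x3, this read-modify-write yields 0x1. A plain write of 0x2 instead leaves bit 1 set and clears bit 0.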

> +	}
> +
>  out_disable_stall:
>  	rk_iommu_disable_stall(iommu);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=6

* Re: [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support
  2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
@ 2026-06-04 14:05   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Midgy BALON; +Cc: robh, devicetree, conor+dt, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The error path for `noc_init` failure in `rocket_core_init` calls `rocket_job_fini()` instead of `rocket_core_fini()`, leaking an IOMMU group reference and leaving PM runtime active.

Pre-existing issues:
- [High] The `core->clks` array lacks initialization of its `id` fields before being passed to `devm_clk_bulk_get()`, causing the driver to configure the rate on the wrong clock in `rk3568_noc_init()`.
- [High] Failed probe leaks `struct rocket_device` and associated DRM structures because they are devm-allocated against a global persistent platform device.
--

commit 60f18710aa215157b0b6ac91ed70f00cc049eb2c
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Add RK3568 SoC support

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Unlike the RK3588 it must be powered
on and de-idled through the PMU, and its PVTPLL clock started via SCMI,
before the NPU is reachable.  Add rk3568_soc_data with an noc_init
callback performing this bring-up.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index 09c445af7de73..a8de876365873 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);
> +			rocket_job_fini(core);

[Severity: High]
If noc_init() fails, should this call rocket_core_fini() instead of
rocket_job_fini()?

Calling rocket_job_fini() here bypasses the rest of the teardown that
rocket_core_fini() handles. This leaves the PM runtime enabled and
autosuspend incorrectly configured, and skips iommu_group_put() which
leaks the IOMMU group reference.

> +			return err;
> +		}
> +	}
> +
>  	version = rocket_pc_readl(core, VERSION);

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index c18840e5aff76..5a72d0b5f4dff 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c

[ ... ]

> @@ -217,12 +219,63 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> + * matching lower 16 bits.
> + *
> + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> + * rate change (an intermediate rate defeats the clock framework's
> + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> + */
> +#define ROCKET_RK3568_SCMI_CLK	2
> +
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct regmap *pmu;
> +	unsigned int val;
> +	int ret;
> +
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);
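The write-mask protocol described in the comment quoted above lets one 32-bit write carry both the enable mask (upper 16 bits) and the new value (lower 16 bits). Bits outside the mask therefore keep their value without a read-modify-write. The following is a minimal sketch of the encoding and of how the hardware is assumed to apply it. Both helper names are hypothetical, and no real RK3568 PMU offsets or bit positions are implied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper for a write-mask register: bits [31:16] select which
 * of bits [15:0] this write may change; unselected bits are left alone.
 */
static uint32_t wmask_write(uint16_t mask, uint16_t val)
{
	return ((uint32_t)mask << 16) | (uint32_t)(val & mask);
}

/* Model of how the hardware applies such a write to the current contents. */
static uint16_t wmask_apply(uint16_t cur, uint32_t wr)
{
	uint16_t mask = (uint16_t)(wr >> 16);
	uint16_t val = (uint16_t)wr;

	return (uint16_t)((cur & ~mask) | (val & mask));
}
```

To clear a bit, set its mask bit and leave its value bit clear. A write with the mask bit clear leaves that bit unchanged.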

[Severity: High]
This is a pre-existing issue, but does core->clks actually contain the correct
SCMI clock here?

Looking at rocket_core_init(), devm_clk_bulk_get() is called without first
initializing the .id fields of the core->clks array:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
    err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
    ...
}

Because the IDs are uninitialized (NULL), clk_get() will repeatedly return the
very first clock of the device. Does this mean these clk_set_rate() calls are
erroneously forcing a rate change on the wrong clock while leaving the PVTPLL
unconfigured?
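However core->clks ends up being filled, the hard-coded ROCKET_RK3568_SCMI_CLK index of 2 selects the SCMI clock only if the bulk array follows the clock-names order in the RK3568 node from patch 8 ("aclk", "hclk", "npu", "pclk"). The sketch below resolves the index by name instead. The struct and helper are hypothetical userspace stand-ins. In the driver, the equivalent would be to fill each .id before devm_clk_bulk_get(), or to use devm_clk_bulk_get_all(), and then look the clock up by id.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a clk_bulk_data-style array: one name per slot. */
struct clk_slot {
	const char *id;
};

/* clock-names order from the RK3568 NPU node in patch 8. */
static const struct clk_slot rk3568_npu_clks[] = {
	{ "aclk" }, { "hclk" }, { "npu" }, { "pclk" },
};

/* Return the index of the slot whose id matches name, or -1 if absent. */
static int clk_slot_index(const struct clk_slot *slots, size_t n,
			  const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (slots[i].id && strcmp(slots[i].id, name) == 0)
			return (int)i;
	return -1;
}
```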

[ ... ]

[Severity: High]
This is a pre-existing issue, but could probe failures (such as noc_init
returning an error) cause an unbounded memory leak?

In rocket_probe(), the global drm_dev is passed to rocket_device_init():

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
    ...
    rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
    ...
}

Inside rocket_device_init(), memory is allocated against this global
device's devres list:

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
    ...
    rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
    ...
}

If probe fails, rocket_probe() calls rocket_device_fini() to unregister
the DRM device. However, it does not free the devm-allocated memory
because the allocations are tied to the global drm_dev that persists until
module unload. Does this mean every deferred probe or bind/unbind action
allocates new memory on the global devres list?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=3

* Re: [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data
  2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
@ 2026-06-04 14:05   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Midgy BALON; +Cc: robh, dri-devel, devicetree, conor+dt

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Missing bounds check on `rdev->num_cores` against `soc_data->num_cores` causes an out-of-bounds array write.

Pre-existing issues:
- [Critical] Early registration of the DRM device creates a race window leading to a NULL pointer dereference in `rocket_open`.
- [High] Array indexing mismatch on unbind/rebind causes state corruption and failed lookups.
- [High] Race condition during global device initialization and array modification due to lack of locking.
- [High] Failure in `rocket_device_init()` leaves an `ERR_PTR` in the global `rdev` variable, causing a panic on the next probe.
--

commit d76917aa771a1a958c02a6121211647bc25a0160
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Derive DMA width and core count from match data

The probe already has the per-SoC match data, which now records the core
count and DMA width.  Use it for the cores array allocation and the
device DMA mask instead of re-scanning the device tree for available core
nodes.

> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
[ ... ]
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: Critical]
This is a pre-existing issue, but does rocket_device_init() register the
DRM device prematurely?

The first core to probe invokes rocket_device_init(), which immediately calls
drm_dev_register() at the end of the function. This exposes the DRM device to
userspace before the caller can initialize the core array elements.

If userspace quickly opens the DRM device node:

rocket_open()
  rocket_iommu_domain_create(rdev->cores[0].dev)

Since the device pointer in the array has not yet been assigned in
rocket_probe() and is still NULL, this results in a NULL pointer dereference.

>  	if (!rdev->cores)
>  		return ERR_PTR(-ENOMEM);
>  
>  	dma_set_max_seg_size(dev, UINT_MAX);
>  
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> +	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(soc_data->dma_bits));
>  	if (err)
>  		return ERR_PTR(err);
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 384c38e13acce..c18840e5aff76 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
>  
>  static int rocket_probe(struct platform_device *pdev)
>  {
> +	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
>  	int ret;
>  
> +	if (!soc_data)
> +		return -EINVAL;
> +
>  	if (rdev == NULL) {

[Severity: High]
This isn't a bug introduced by this patch, but does this lack locking?

The kernel driver core can probe multiple matching rockchip,rk3588-rknn-core
devices in parallel. Concurrent probes could observe rdev == NULL
simultaneously, instantiating the DRM device multiple times and leaking
memory. Additionally, concurrent reads and increments of rdev->num_cores can
cause multiple threads to overwrite the same array slot in rdev->cores[]
while leaving others uninitialized.

>  		/* First core probing, initialize DRM device. */
> -		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
> +		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);

[Severity: High]
This is a pre-existing issue, but what happens if rocket_device_init() fails?

If rocket_device_init() returns an error pointer, it is directly assigned to
the global rdev variable. The error path correctly returns the error but
fails to reset the static global rdev back to NULL. When the driver core
invokes rocket_probe() for the next matching device tree node, the
rdev == NULL check evaluates to false because rdev contains an error pointer.
The probe will then attempt to access rdev->num_cores, causing an invalid
pointer dereference.
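A common fix is to keep the result in a local variable and publish it to the global only on success. The userspace sketch below uses hypothetical stand-ins for rocket_device_init() and for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. Real driver code would use <linux/err.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO	4095
static inline void *err_ptr(long err) { return (void *)err; }
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline long ptr_err(const void *p) { return (long)p; }

struct fake_rdev { int num_cores; };

static struct fake_rdev *rdev;		/* global, like the driver's rdev */
static struct fake_rdev storage;
static int init_should_fail;		/* knob to simulate an init failure */

/* Hypothetical stand-in for rocket_device_init(). */
static struct fake_rdev *fake_device_init(void)
{
	if (init_should_fail)
		return err_ptr(-ENOMEM);
	return &storage;
}

static int fake_probe(void)
{
	if (rdev == NULL) {
		struct fake_rdev *tmp = fake_device_init();

		if (is_err(tmp))
			return (int)ptr_err(tmp);	/* rdev stays NULL */
		rdev = tmp;
	}
	rdev->num_cores++;
	return 0;
}
```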

>  		if (IS_ERR(rdev)) {
>  			dev_err(&pdev->dev, "failed to initialize rocket device\n");
>  			return PTR_ERR(rdev);
>  		}
>  	}

[Severity: High]
Can this cause an out-of-bounds array write?

The allocation of the rdev->cores array now uses a fixed size defined by
soc_data->num_cores instead of dynamically counting the device tree nodes.
However, rocket_probe() is called by the driver core for every matching DT
node, and it unconditionally uses the auto-incrementing rdev->num_cores as
the array index later in the function.

If a malformed or custom device tree contains more core nodes than
soc_data->num_cores, rocket_probe() will be invoked more times than the array
can hold, writing past the end of the rdev->cores buffer.

[ ... ]
> @@ -214,6 +218,7 @@ static void rocket_remove(struct platform_device *pdev)
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the array indexing handle
unbind and rebind correctly?

When a core is unbound, rocket_remove() decrements rdev->num_cores without
compacting the remaining active cores in the rdev->cores array to fill the
empty slot. Consequently, find_core_for_dev() (which relies on
rdev->num_cores for its loop bound) fails to find any remaining cores
located at indices >= the new rdev->num_cores, preventing their cleanup.

Furthermore, if a core is re-bound, rocket_probe() will use the decremented
rdev->num_cores as the index and overwrite an existing active core, causing
resource leaks and state corruption.
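This concern and the out-of-bounds one above share a cause: rdev->num_cores is used as the next free index. One way to address both is to claim the first free slot, bounded by the array size, and to release a core by clearing its own slot on remove. The sketch below uses a hypothetical, simplified core struct. It leaves out the locking raised in an earlier comment, which a real fix would still need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rocket_core. */
struct core_slot {
	void *dev;		/* NULL means the slot is free */
	unsigned int index;
};

/*
 * Claim the first free slot for dev without writing past max_cores.
 * Returns the slot index, or -ENOSPC if every slot is taken.
 */
static int core_slot_claim(struct core_slot *cores, size_t max_cores, void *dev)
{
	for (size_t i = 0; i < max_cores; i++) {
		if (!cores[i].dev) {
			cores[i].dev = dev;
			cores[i].index = (unsigned int)i;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Release the slot owned by dev; other cores keep their indices. */
static void core_slot_release(struct core_slot *cores, size_t max_cores, void *dev)
{
	for (size_t i = 0; i < max_cores; i++)
		if (cores[i].dev == dev)
			cores[i].dev = NULL;
}
```

With this scheme, lookups such as find_core_for_dev() would scan all soc_data->num_cores slots and skip the free ones, instead of stopping at a running count.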

>  
>  static const struct rocket_soc_data rk3588_soc_data = {
> +	.num_cores = 3,
>  	.dma_bits = 40,
>  };

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=2

* Re: [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs
  2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, robh, devicetree, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The IOMMU hardware registers are left uninitialized after a job timeout and hardware reset, causing subsequent jobs to immediately fault.

Pre-existing issues:
- [High] Incorrect teardown ordering in `rocket_core_fini()` destroys IOMMU and PM resources before stopping the DRM scheduler, leading to data races, use-after-free, and potential hardware faults.
- [High] PM runtime usage counts and unsignaled fences are leaked on error paths in `rocket_job_run()`.
--

commit 49c877f1ba7baae4a28268850b55e2ac3ab4a8cc
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Keep the IOMMU domain attached across jobs
    
rocket attached the job's IOMMU domain in rocket_job_run() and
detached it again on every completion and reset. Each attach/detach
toggles the rk_iommu stall/force-reset/paging handshake, and on
RK3568 the NPU MMU is idle between jobs, so that handshake times out
and logs a burst of "stall/paging request timed out" errors for
every job.

Attach the per-context domain once and keep it: track the attached
domain in the core, swap it only when a job from a different context
runs, and detach it at core teardown. A reference on the attached
domain is held so it outlives the job that first attached it and is
released on swap/teardown.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index a8de876365873..634f78dfe2887 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c

[ ... ]

> @@ -112,6 +113,11 @@ void rocket_core_fini(struct rocket_core *core)
>  {
>  	pm_runtime_dont_use_autosuspend(core->dev);
>  	pm_runtime_disable(core->dev);
> +	if (core->attached_domain) {
> +		iommu_detach_group(NULL, core->iommu_group);
> +		rocket_iommu_domain_put(core->attached_domain);
> +		core->attached_domain = NULL;
> +	}
>  	iommu_group_put(core->iommu_group);
>  	core->iommu_group = NULL;

[Severity: High]
This is a pre-existing issue, but IOMMU resources are being torn down before
the DRM scheduler is stopped in rocket_job_fini(core).

Because the scheduler might still be active, couldn't an in-flight job be
executing concurrently? This could lead to a data race on core->attached_domain,
the IOMMU domain being forcefully unmapped from a running job causing
AXI bus faults, or a use-after-free when rocket_job_run() accesses
core->iommu_group.

Should rocket_job_fini(core) be called before tearing down the IOMMU state to
ensure the hardware is idle and the scheduler is stopped?

>  	rocket_job_fini(core);
>  }

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index e25234261536b..b248371be8a1e 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c

[ ... ]

> @@ -314,9 +315,26 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
>  	if (ret < 0)
>  		return fence;

[Severity: High]
This isn't a bug introduced by this patch, but if pm_runtime_get_sync()
fails, it returns a negative error code but still increments the PM usage counter.

The error path returns without calling pm_runtime_put_noidle() or signaling
the fence, which leaks the PM counter and causes the scheduler to wait indefinitely.

> -	ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> -	if (ret < 0)
> -		return fence;
> +	/*
> +	 * Attach the job's IOMMU domain only when it differs from the one
> +	 * already attached. Re-attaching per job toggles the rk_iommu
> +	 * stall/reset handshake on an idle NPU MMU, which is slow and
> +	 * noisy; keep the domain attached across jobs instead.
> +	 */
> +	if (core->attached_domain != job->domain) {
> +		if (core->attached_domain) {
> +			iommu_detach_group(NULL, core->iommu_group);
> +			rocket_iommu_domain_put(core->attached_domain);
> +			core->attached_domain = NULL;
> +		}
> +
> +		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> +		if (ret < 0)
> +			return fence;

[Severity: High]
Also a pre-existing issue, but if iommu_attach_group() fails, it returns
the fence early without dropping the successfully acquired PM reference.
Should this path include a pm_runtime_put_noidle() call to avoid leaking
the reference?
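Both comments come down to the same rule. pm_runtime_get_sync() takes a usage count even when the resume fails, which is why the kernel also provides pm_runtime_resume_and_get(), which drops the count on failure. Any later failure must then release what was already taken. The sketch below is a userspace model of the counter only, with hypothetical names. It is not the runtime-PM API and does not model the fence.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of a runtime-PM usage counter (not the kernel API). */
struct pm_model {
	int usage;
	int resume_err;		/* error the next resume returns, 0 = ok */
};

/* Like pm_runtime_get_sync(): the count is taken even if resume fails. */
static int model_get_sync(struct pm_model *pm)
{
	pm->usage++;
	return pm->resume_err;
}

/* Like pm_runtime_put_noidle(): drop the count without idling. */
static void model_put_noidle(struct pm_model *pm)
{
	pm->usage--;
}

/* Like pm_runtime_resume_and_get(): no count is left behind on failure. */
static int model_resume_and_get(struct pm_model *pm)
{
	int ret = model_get_sync(pm);

	if (ret < 0)
		model_put_noidle(pm);
	return ret;
}

/* Shape of the job-run path: every failure after the get drops it again. */
static int model_job_run(struct pm_model *pm, int attach_err)
{
	int ret = model_resume_and_get(pm);

	if (ret < 0)
		return ret;
	if (attach_err < 0) {
		model_put_noidle(pm);
		return attach_err;
	}
	return 0;	/* reference held until the job completes */
}
```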

> +
> +		kref_get(&job->domain->kref);
> +		core->attached_domain = job->domain;
> +	}
>  
>  	scoped_guard(mutex, &core->job_lock) {
>  		core->in_flight_job = job;

[ ... ]

> @@ -376,8 +393,6 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
>  	 */
>  	rocket_core_reset(core);
>  
> -	iommu_detach_group(NULL, core->iommu_group);
> -
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);

[Severity: High]
When the NPU is hard-reset by rocket_core_reset(), the IOMMU's hardware registers
(like page table base addresses) are wiped.

Since core->attached_domain is not cleared here, when the scheduler restarts
and executes the next job, rocket_job_run() will see core->attached_domain
matches the job's domain and skip calling iommu_attach_group().

Will this leave the IOMMU hardware unconfigured and cause subsequent jobs to
immediately throw DMA faults? Should core->attached_domain be cleared here
to force a re-attach and reprogramming of the IOMMU?
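The attached-domain cache in this patch stays valid only while the hardware still holds the state the cache describes. The userspace model below captures that invariant. It assumes, as this comment does, that a core reset wipes the IOMMU state, and so drops the cached pointer on reset, which forces the next job to re-attach. Names are hypothetical. Real code would also release the domain reference when clearing the cache, as the swap path does with rocket_iommu_domain_put().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the per-core attached-domain cache (hypothetical names). */
struct domain_model { int id; };

struct core_model {
	const struct domain_model *attached;	/* cached attached domain */
	int hw_programmed;			/* does the IOMMU hold tables? */
	int attach_calls;			/* how often we (re)attached */
};

/* Job-run path: attach only when the cached domain differs. */
static void model_run_job(struct core_model *c, const struct domain_model *d)
{
	if (c->attached != d) {
		c->attached = d;
		c->hw_programmed = 1;
		c->attach_calls++;
	}
}

/* Reset path: the reset wipes IOMMU state, so drop the cache as well. */
static void model_reset(struct core_model *c)
{
	c->hw_programmed = 0;
	c->attached = NULL;	/* forces a re-attach on the next job */
}
```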

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=5

* Re: [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
  2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  2026-06-04 16:55     ` Conor Dooley
  0 siblings, 1 reply; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, dri-devel, robh, devicetree

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] The value written to `RK_MMU_AUTO_GATING` sets bit 1 instead of clearing it, contradicting the inline comment and hardware workaround logic.
- [High] The error handling path for `noc_init()` failure leaks the IOMMU group reference and leaves the device in an unbalanced PM runtime state.
- [High] A malformed device tree with excessive NPU core nodes triggers an out-of-bounds write in `rocket_probe()` array access.
- [High] The NOC bus de-idle sequence via the PMU is missing from the runtime PM resume path.
- [Medium] The `rockchip,pmu` property is functionally mandatory for RK3568 but is not added to the schema's conditional `required` list.

Pre-existing issues:
- [High] The clock names in the `core->clks` array are uninitialized before being passed to `devm_clk_bulk_get()`, causing the framework to fetch the wrong clocks.
--

commit 6c61da961bed3360a99f625d4a3b1599e7bfe685
Author: Midgy BALON <midgy971@gmail.com>

    dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
    
    The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
    RK3588. Add its compatible. On RK3568 the NPU NOC bus-idle and power
    gating are controlled through the system PMU rather than a dedicated
    register block.

> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> index caca2a4903cd1..af9936b32e9fe 100644
> --- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
[ ... ]
> @@ -75,7 +83,15 @@ required:
>    - resets
>    - reset-names
>    - npu-supply
> -  - sram-supply
> +
> +if:
> +  properties:
> +    compatible:
> +      contains:
> +        const: rockchip,rk3588-rknn-core
> +then:
> +  required:
> +    - sram-supply

[Severity: Medium]
Does the missing rockchip,pmu property from the conditional required block
cause incomplete device trees to pass validation?

The rockchip,pmu property is functionally mandatory for the
rockchip,rk3568-rknn-core variant as the driver unconditionally fails to probe
without it. Missing this mandatory property here could allow invalid device
trees to go undetected.

>  additionalProperties: false
>  
> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index b3b2fa9ba645a..634f78dfe2887 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
[ ... ]
> @@ -83,6 +89,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);
> +			rocket_job_fini(core);
> +			return err;
> +		}
> +	}

[Severity: High]
Does this error path leak the IOMMU group reference and leave the device in an
unbalanced PM runtime state?

By returning directly after pm_runtime_put_sync() and rocket_job_fini(), the
code misses calling rocket_core_fini() which handles iommu_group_put(),
pm_runtime_disable(), and pm_runtime_dont_use_autosuspend().

> +
>  	version = rocket_pc_readl(core, VERSION);
>  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
>  
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
[ ... ]
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: High]
Can this code overflow rdev->cores[] if the device tree contains more NPU
nodes than soc_data->num_cores?

With this change, the array size is fixed to soc_data->num_cores, but
rocket_probe() in drivers/accel/rocket/rocket_drv.c is still called for every
matching node. If there are too many nodes in the device tree, it appears
rocket_probe() will write out of bounds here:

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
    ...
	unsigned int core = rdev->num_cores;
    ...
	rdev->cores[core].rdev = rdev;
	rdev->cores[core].dev = &pdev->dev;
	rdev->cores[core].index = core;
    ...
}
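
A sketch of the guard this implies, as a userspace model. fake_rdev and fake_probe_core() are hypothetical; max_cores plays the role of soc_data->num_cores. In the driver, the equivalent check would sit in rocket_probe() before the rdev->cores[core] writes, returning an error such as -EINVAL.

```c
#include <errno.h>
#include <stddef.h>

struct fake_core_slot {
	const void *dev;
	unsigned int index;
};

struct fake_rdev {
	unsigned int num_cores;       /* cores probed so far */
	unsigned int max_cores;       /* stands in for soc_data->num_cores */
	struct fake_core_slot *cores; /* array sized by max_cores */
};

/* Refuse a core beyond the array size instead of writing past its end. */
static int fake_probe_core(struct fake_rdev *rdev, const void *dev)
{
	unsigned int core = rdev->num_cores;

	if (core >= rdev->max_cores)
		return -EINVAL;

	rdev->cores[core].dev = dev;
	rdev->cores[core].index = core;
	rdev->num_cores++;
	return 0;
}
```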

>  	if (!rdev->cores)
>  		return ERR_PTR(-ENOMEM);
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 8bbbce594883e..5a72d0b5f4dff 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
[ ... ]
> @@ -213,8 +219,64 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> + * matching lower 16 bits.
> + *
> + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> + * rate change (an intermediate rate defeats the clock framework's
> + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> + */
> +#define ROCKET_RK3568_SCMI_CLK	2
> +
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct regmap *pmu;
> +	unsigned int val;
> +	int ret;
> +
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);

[Severity: High]
This is a pre-existing issue, but does this unintentionally alter the system
aclk rate?

Looking at rocket_core_init(), the core->clks array is passed to
devm_clk_bulk_get() without initializing the id strings:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
    ...
	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
    ...
}

Because the id fields are NULL, the framework fetches the first DT clock
(aclk) for all elements. Calling clk_set_rate() here seems to alter the
system aclk frequency instead of the intended SCMI clock.
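
The lookup behaviour described above, modelled in userspace under a simplifying assumption: a NULL id resolves to the node's first clock, and a named id resolves to its position in clock-names. The model_* names are hypothetical. The clock-names order is taken from the proposed DT node, where index 2 ("npu") is the SCMI clock that ROCKET_RK3568_SCMI_CLK refers to. In the driver, the fix would be to set each core->clks[i].id before devm_clk_bulk_get(), or to switch to devm_clk_bulk_get_all().

```c
#include <stddef.h>
#include <string.h>

/* clock-names from the proposed DT node, in DT order. */
static const char *const dt_clock_names[] = { "aclk", "hclk", "npu", "pclk" };

/* Simplified model: NULL id -> first clock, named id -> clock-names index. */
static int model_clk_lookup(const char *id)
{
	if (!id)
		return 0;
	for (size_t i = 0; i < sizeof(dt_clock_names) / sizeof(dt_clock_names[0]); i++)
		if (strcmp(dt_clock_names[i], id) == 0)
			return (int)i;
	return -1; /* -ENOENT in the kernel */
}

struct model_bulk {
	const char *id; /* connection id, as in struct clk_bulk_data */
	int resolved;   /* index of the DT clock this entry ended up with */
};

static void model_bulk_get(struct model_bulk *clks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		clks[i].resolved = model_clk_lookup(clks[i].id);
}
```

With the ids left NULL, every entry resolves to index 0 (aclk). With the ids filled in, entry 2 resolves to the "npu" clock.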

> +
> +	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
> +	if (IS_ERR(pmu))
> +		return dev_err_probe(core->dev, PTR_ERR(pmu),
> +				     "failed to get PMU regmap\n");
> +
> +	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
> +	regmap_write(pmu, 0xa0, BIT(1 + 16));
> +
> +	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
> +	regmap_write(pmu, 0x70, BIT(2 + 16));
> +
> +	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
> +	regmap_write(pmu, 0x50, BIT(2 + 16));
> +
> +	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
> +	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
> +	if (ret)
> +		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
> +
> +	return ret;
> +}

[Severity: High]
Is a NOC bus de-idle sequence required in the runtime PM resume path as well?

If the generic power domain (genpd) powers off the NPU during runtime suspend,
the NOC bus idle state would reset. Currently, rocket_device_runtime_resume()
enables clocks but doesn't repeat this PMU de-idle sequence, which could leave
the AXI bus gated and cause subsequent hardware accesses to hang or crash.

> +
>  static const struct rocket_soc_data rk3568_soc_data = {
>  	.num_cores = 1,
>  	.dma_bits = 32,
>  	.noc_init = rk3568_noc_init,
>  };
>  
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 0013cf196c573..7d40ecb26123f 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>  
>  	ret = rk_iommu_enable_paging(iommu);
>  
> +	if (!ret) {
> +		/*
> +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +		 * auto clock-gating; the walker's AXI master then loses its clock
> +		 * between transactions and a TLB-miss page walk never completes,
> +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +		 */
> +		for (i = 0; i < iommu->num_mmu; i++)
> +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);

[Severity: High]
Will this write operation inadvertently set bit 1 instead of clearing it?

Writing 0x2 (binary 10) sets bit 1 and clears bit 0, which contradicts the
inline comment indicating that bit 1 should be cleared and bit 0 kept. This
appears to enable the buggy page-walker auto clock-gating and disable the
slave-port gate.
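
Assuming the inline comment's description of the bits is accurate (bit 0 is the slave-port gate, bit 1 is the page-walker gate, reset value 0x3), the intended write clears bit 1 and keeps bit 0. That yields 0x1, not 0x2. The macro names below are hypothetical. In the driver this could be a read-modify-write built from the existing rk_iommu_read()/rk_iommu_write() helpers.

```c
#include <stdint.h>

#define MODEL_BIT(n)            (1u << (n))
#define AUTO_GATING_SLAVE_GATE  MODEL_BIT(0) /* per the patch comment */
#define AUTO_GATING_WALKER_GATE MODEL_BIT(1) /* per the patch comment */

/* Read-modify-write: clear only the page-walker gate bit. */
static uint32_t auto_gating_clear_walker(uint32_t cur)
{
	return cur & ~AUTO_GATING_WALKER_GATE;
}
```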

> +	}
> +
>  out_disable_stall:
>  	rk_iommu_disable_stall(iommu);
>  out_disable_clocks:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=7

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data
  2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
@ 2026-06-04 14:08   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:08 UTC (permalink / raw)
  To: Midgy BALON; +Cc: devicetree, robh, dri-devel, conor+dt

Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:

Pre-existing issues:
- [Critical] kref_put() dereferences a NULL pointer when job initialization fails early or a BO is imported.
- [High] Error path in BO creation frees an object that is already owned by the handle table.
- [High] Device is permanently left in an active power state upon unbind.
- [High] Array mismanagement during core removal prevents multi-core unbind.
- [High] Memory and DRM device leak on probe failure or deferral.
- [High] Asynchronous reset work is canceled after the DRM scheduler is destroyed.
- [High] The DRM device is registered and exposed to userspace before the NPU cores are initialized.
- [High] The global `rdev` pointer and `rdev->num_cores` are accessed and modified locklessly during probe and removal.
- [High] The shared interrupt handler accesses hardware registers without checking the PM state, and suspend executes without synchronizing IRQs.
- [High] A failure in `rocket_job_run` leaks the PM usage counter and returns an unsignaled hardware fence, hanging the DRM scheduler.
- [High] Registers are accessed during initialization before resets are deasserted.
--

commit e6cc2e46cba8fcba41d70a8b1e9422607b444db8
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Introduce per-SoC rocket_soc_data

Add a per-SoC data structure carried in the OF match table, currently
holding only the NPU AXI address width, and use it for the per-core DMA
mask instead of a hardcoded 40-bit value.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index b3b2fa9ba645a..09c445af7de73 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -21,6 +22,10 @@ int rocket_core_init(struct rocket_core *core)
>  	u32 version;
>  	int err = 0;
>  
> +	core->soc_data = of_device_get_match_data(dev);
> +	if (!core->soc_data)
> +		return dev_err_probe(dev, -EINVAL, "missing SoC match data\n");
> +

[Severity: High]
This is a pre-existing issue, but I noticed that later in this function,
registers are accessed before resets are deasserted:

drivers/accel/rocket/rocket_core.c:rocket_core_init() {
	...
	core->resets[0].id = "srst_a";
	core->resets[1].id = "srst_h";
	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
						    core->resets);
	...
	version = rocket_pc_readl(core, VERSION);
	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
	...
}

Will this cause a bus stall and trigger an asynchronous external abort if the
hardware or bootloader leaves these resets asserted?

[Severity: High]
This is a pre-existing issue, but does rocket_core_fini() leave the device
permanently powered on with its clocks running after the driver has been
detached?

drivers/accel/rocket/rocket_core.c:rocket_core_fini() {
	pm_runtime_dont_use_autosuspend(core->dev);
	pm_runtime_disable(core->dev);
	...
}

Since rocket_core_init() drops its Runtime PM reference asynchronously via
pm_runtime_put_autosuspend(dev), if the driver is unbound before the
autosuspend delay elapses, pm_runtime_disable() forcefully cancels the
pending asynchronous suspend work without first synchronously powering down
the hardware (e.g., via pm_runtime_put_sync()).

>  	core->resets[0].id = "srst_a";
>  	core->resets[1].id = "srst_h";
>  	err = devm_reset_control_bulk_get_exclusive(&pdev->dev, ARRAY_SIZE(core->resets),
[ ... ]
> @@ -52,7 +57,7 @@ int rocket_core_init(struct rocket_core *core)
>  
>  	dma_set_max_seg_size(dev, UINT_MAX);
>  
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
> +	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(core->soc_data->dma_bits));
>  	if (err)
>  		return err;
>  
> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 8bbbce594883e..384c38e13acce 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -213,8 +213,12 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  

[Severity: High]
This is a pre-existing issue, but can this array mismanagement during core
removal prevent multi-core unbind?

drivers/accel/rocket/rocket_drv.c:rocket_remove() {
	...
	rocket_core_fini(&rdev->cores[core]);
	rdev->num_cores--;
	...
}

drivers/accel/rocket/rocket_drv.c:find_core_for_dev() {
	for (unsigned int core = 0; core < rdev->num_cores; core++) {
		if (dev == rdev->cores[core].dev)
			return core;
	}
	...
}

When a core is unbound, rdev->num_cores is decremented, but the unbound core's
slot in the fixed-size rdev->cores array is left unmodified, and the remaining
elements are not shifted.

Because find_core_for_dev() unconditionally loops only rdev->num_cores times,
any core residing at an array index greater than or equal to the decremented
num_cores becomes invisible.

Will subsequent attempts to unbind those remaining cores fail, silently
exiting early and completely leaking the cores' resources and power state?
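
One possible direction, sketched as a userspace model with hypothetical names: free the slot in place and have the lookup scan every allocated slot rather than only the first num_cores entries. Moving entries to close the gap is avoided on purpose. A real struct rocket_core embeds a scheduler and work items that must not be copied by value.

```c
#include <stddef.h>

struct core_slot {
	const void *dev; /* NULL marks a free slot */
};

struct core_table {
	struct core_slot *cores;
	unsigned int capacity; /* allocated slots (soc_data->num_cores) */
	unsigned int live;     /* currently bound cores */
};

/* Scan every allocated slot, not just the first `live` ones. */
static int table_find(const struct core_table *t, const void *dev)
{
	for (unsigned int i = 0; i < t->capacity; i++)
		if (t->cores[i].dev == dev)
			return (int)i;
	return -1;
}

static int table_remove(struct core_table *t, const void *dev)
{
	int i = table_find(t, dev);

	if (i < 0)
		return -1;
	t->cores[i].dev = NULL; /* free the slot in place; nothing moves */
	t->live--;
	return 0;
}
```

After core 0 is removed, core 1 is still found. With the quoted loop bound of rdev->num_cores (now 1), it would not be.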

[Severity: High]
This is a pre-existing issue, but could rocket_probe() leak memory and the DRM
device on probe failure or deferral?

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	if (rdev == NULL) {
		/* First core probing, initialize DRM device. */
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
}

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
	...
	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores), GFP_KERNEL);
	...
}

The shared rdev is created by rocket_device_init() against the global drm_dev,
a virtual platform device. Allocations such as devm_drm_dev_alloc() and
devm_kcalloc() are therefore tied to the lifetime of drm_dev rather than to
the physical pdev currently being probed.

If rocket_core_init() subsequently fails (e.g., due to -EPROBE_DEFER),
rocket_probe() cleans up by calling rocket_device_fini() and resetting
rdev = NULL. However, rocket_device_fini() only unregisters the DRM device.

Because drm_dev remains globally alive until module unload, none of the devm_
allocations are freed. On the next probe attempt, a new rdev is allocated on
drm_dev. Can this create a permanent memory leak?

[Severity: High]
This is a pre-existing issue, but might the DRM device be registered and
exposed to userspace before the NPU cores are initialized?

drivers/accel/rocket/rocket_device.c:rocket_device_init() {
	...
	err = drm_dev_register(ddev, 0);
	if (err)
		return ERR_PTR(err);

	return rdev;
}

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	...
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
	unsigned int core = rdev->num_cores;
	dev_set_drvdata(&pdev->dev, rdev);

	rdev->cores[core].rdev = rdev;
	rdev->cores[core].dev = &pdev->dev;
	...
}

When the first core is probed, it calls rocket_device_init(), which ends with
drm_dev_register(). At this moment, the DRM character device becomes visible
and openable by userspace.

However, the core hardware, IRQs, and core pointers (e.g. rdev->cores[core].dev)
are not assigned or initialized until immediately afterward, when
rocket_core_init() runs.

If userspace opens the device in this narrow window, will rocket_open()
evaluate the uninitialized rdev->cores[0].dev (which is NULL) and pass it to
rocket_iommu_domain_create(), causing an immediate NULL pointer dereference?

[Severity: High]
This is a pre-existing issue, but is it possible that the global rdev pointer
and rdev->num_cores are accessed and modified locklessly during probe and
removal?

drivers/accel/rocket/rocket_drv.c:rocket_probe() {
	if (rdev == NULL) {
		/* First core probing, initialize DRM device. */
		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
	...
	rdev->num_cores++;
	...
}

If multiple cores are probed concurrently (e.g., via sysfs asynchronous
binding), multiple threads will evaluate `if (rdev == NULL)` simultaneously
without a lock.

Can this create a data race where concurrent threads allocate redundant DRM
devices, overwrite the rdev singleton (causing memory leaks), and corrupt
the shared rdev->num_cores counter and rdev->cores array?
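
A sketch of the usual pattern, modelled with pthreads under the assumption that a single lock covering both the check-and-create and the counter update is acceptable here. All names are hypothetical. In the kernel this would typically be a DEFINE_MUTEX() held across the same region.

```c
#include <pthread.h>
#include <stdlib.h>

struct model_rdev {
	unsigned int num_cores;
};

static struct model_rdev *g_rdev;
static pthread_mutex_t g_rdev_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_allocs; /* how many times the shared device was created */

/* Check-and-create plus the counter update, all under one lock. */
static struct model_rdev *model_probe_core(void)
{
	struct model_rdev *rdev;

	pthread_mutex_lock(&g_rdev_lock);
	if (!g_rdev) {
		g_rdev = calloc(1, sizeof(*g_rdev));
		if (!g_rdev) {
			pthread_mutex_unlock(&g_rdev_lock);
			return NULL;
		}
		g_allocs++;
	}
	rdev = g_rdev;
	rdev->num_cores++;
	pthread_mutex_unlock(&g_rdev_lock);
	return rdev;
}
```

However many threads call model_probe_core() concurrently, only the first allocates, and every increment of num_cores is serialized.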

[Severity: Critical]
This is a pre-existing issue, but will kref_put() dereference a NULL pointer
when job initialization fails early or a BO is imported?

drivers/accel/rocket/rocket_job.c:rocket_ioctl_submit_job() {
	...
	goto out_cleanup_job;
	...
out_cleanup_job:
	if (ret)
		drm_sched_job_cleanup(&rjob->base);
out_put_job:
	rocket_job_put(rjob);
}

drivers/accel/rocket/rocket_drv.c:rocket_iommu_domain_put() {
	kref_put(&domain->kref, rocket_iommu_domain_destroy);
}

If an early failure occurs in rocket_ioctl_submit_job() (e.g., in
rocket_copy_tasks() or drm_gem_objects_lookup()), the error path calls
rocket_job_put(), which invokes rocket_job_cleanup(). At this point,
job->domain is still NULL.

rocket_job_cleanup() unconditionally passes this to rocket_iommu_domain_put(),
which immediately evaluates &domain->kref and causes a NULL pointer
dereference.

The same panic appears to occur when a BO is imported via PRIME
(rocket_gem_create_object() leaves domain NULL), and rocket_gem_bo_free()
subsequently accesses bo->domain->domain and bo->domain.
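
A minimal model of a NULL-tolerant put, in the same spirit as kfree(NULL) being a no-op. The model_* names and the plain integer refcount are hypothetical stand-ins. In the driver, the equivalent guard would be an `if (domain)` around the kref_put() in rocket_iommu_domain_put().

```c
#include <stdlib.h>

struct model_domain {
	int refcount; /* stands in for struct kref */
};

static int destroyed; /* counts destroy callbacks, for illustration */

static void model_domain_destroy(struct model_domain *d)
{
	destroyed++;
	free(d);
}

/* Tolerate a job or BO that never acquired a domain. */
static void model_domain_put(struct model_domain *d)
{
	if (!d)
		return;
	if (--d->refcount == 0)
		model_domain_destroy(d);
}
```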

[Severity: High]
This is a pre-existing issue, but could the error path in BO creation free an
object that is already owned by the handle table?

drivers/accel/rocket/rocket_gem.c:rocket_ioctl_create_bo() {
	...
	ret = drm_gem_handle_create(file, gem_obj, &args->handle);
	drm_gem_object_put(gem_obj);
	if (ret)
		goto err;
	...
err:
	drm_gem_shmem_object_free(gem_obj);
}

drm_gem_handle_create() successfully exposes gem_obj to userspace and
increments its handle refcount, after which the local reference is correctly
dropped via drm_gem_object_put().

If a subsequent step fails (such as drm_gem_shmem_get_pages_sgt()), the error
path jumps to err: and manually calls drm_gem_shmem_object_free(gem_obj).

This explicitly frees the backing memory while the object is still actively
referenced by the handle table. When userspace later closes the handle, will
the GEM core double-free the object?

[Severity: High]
This is a pre-existing issue, but should the asynchronous reset work be
canceled before the DRM scheduler is destroyed?

drivers/accel/rocket/rocket_job.c:rocket_job_fini() {
	drm_sched_fini(&core->sched);

	cancel_work_sync(&core->reset.work);
	destroy_workqueue(core->reset.wq);
}

drm_sched_fini(&core->sched) is called before cancel_work_sync(&core->reset.work).

If the background reset task rocket_reset_work() is running concurrently,
it will execute rocket_reset(), which attempts to call drm_sched_stop() and
drm_sched_start() on the core->sched instance that has already been torn down
or freed.

[Severity: High]
This is a pre-existing issue, but could the shared interrupt handler access
hardware registers without checking the PM state, and does suspend execute
without synchronizing IRQs?

drivers/accel/rocket/rocket_job.c:rocket_job_irq_handler() {
	struct rocket_core *core = data;
	u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
	...
}

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
	...
	clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);

	return 0;
}

The driver requests its interrupt using IRQF_SHARED, meaning
rocket_job_irq_handler() can execute at any time if another device on the same
line interrupts the CPU.

When called, the handler immediately reads hardware registers
(INTERRUPT_RAW_STATUS) without using pm_runtime_get_if_active() to verify the
device is powered. If the device was suspended, will this read return garbage
or cause a bus fault?

Additionally, the rocket_device_runtime_suspend() callback disables the
device's clocks without invoking synchronize_irq(), which might allow
hardware to lose power while an interrupt handler is executing mid-flight.

[Severity: High]
This is a pre-existing issue, but can a failure in rocket_job_run() leak
the PM usage counter and return an unsignaled hardware fence, hanging the
DRM scheduler?

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
	...
	ret = pm_runtime_get_sync(core->dev);
	if (ret < 0)
		return fence;
	...
		ret = iommu_attach_group(job->domain->domain, core->iommu_group);
		if (ret < 0)
			return fence;
	...
}

drivers/accel/rocket/rocket_job.c:rocket_reset() {
	scoped_guard(mutex, &core->job_lock) {
		if (core->in_flight_job)
			pm_runtime_put_noidle(core->dev);

		core->in_flight_job = NULL;
	}
}

If pm_runtime_get_sync() returns an error, rocket_job_run() returns the
already-created fence directly. Since pm_runtime_get_sync() increments the
usage counter even on failure, this leaks the PM reference (the
pm_runtime_put_noidle() call is missing).

Similarly, if iommu_attach_group() fails shortly after, the function again
returns the fence without dropping the successfully acquired PM reference.

Because the hardware submission is skipped in both error paths, the returned
fence is never signaled. The DRM scheduler will block indefinitely waiting for
it, triggering a job timeout.

Will the timeout reset logic (rocket_reset()) fail to clean up these leaked
PM references because core->in_flight_job is evaluated as NULL (it was never
assigned before the early returns)?
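
A userspace model of balanced PM accounting on these error paths. It assumes the semantics stated above: pm_runtime_get_sync() bumps the usage count even on failure, and pm_runtime_resume_and_get() drops it again. All model_* names are hypothetical. Per the review of patch 4, the real run_job callback would return an ERR_PTR() rather than an unsignaled fence. Here that is reduced to a negative return value.

```c
#include <stdbool.h>

struct model_pm {
	int usage;         /* runtime PM usage count */
	bool resume_fails; /* inject a resume failure */
};

/* Models pm_runtime_get_sync(): bumps usage even when resume fails. */
static int model_get_sync(struct model_pm *pm)
{
	pm->usage++;
	return pm->resume_fails ? -5 : 0;
}

static void model_put_noidle(struct model_pm *pm)
{
	pm->usage--;
}

/* Models pm_runtime_resume_and_get(): undoes the bump on failure. */
static int model_resume_and_get(struct model_pm *pm)
{
	int ret = model_get_sync(pm);

	if (ret < 0) {
		model_put_noidle(pm);
		return ret;
	}
	return 0;
}

struct model_job {
	bool attach_fails; /* inject an iommu_attach_group() failure */
};

/* Every early return after a successful get undoes it. */
static int model_job_run(struct model_pm *pm, const struct model_job *job)
{
	int ret = model_resume_and_get(pm);

	if (ret < 0)
		return ret;

	if (job->attach_fails) {
		model_put_noidle(pm);
		return -22; /* -EINVAL */
	}
	return 0; /* reference is dropped when the job completes */
}
```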

> +static const struct rocket_soc_data rk3588_soc_data = {
> +	.dma_bits = 40,
> +};
> +
>  static const struct of_device_id dt_match[] = {
> -	{ .compatible = "rockchip,rk3588-rknn-core" },
> +	{ .compatible = "rockchip,rk3588-rknn-core", .data = &rk3588_soc_data },
>  	{}
>  };
>  MODULE_DEVICE_TABLE(of, dt_match);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=1

* Re: [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout
  2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
@ 2026-06-04 14:10   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:10 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, devicetree, robh, dri-devel

Thank you for your contribution! Sashiko AI review found 9 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Double free of rjob->tasks array when copy_from_user() fails.
- [High] The shared IRQ handler unconditionally accesses hardware registers without verifying the device's PM active state.
- [High] Missing PM reference drop and unsignaled fence return on error paths in `rocket_job_run()`.
- [High] Concurrent hardware reset and IRQ handler execution can cause an external abort.
- [High] Unbalanced teardown order in rocket_core_fini() causes UAF and data races.
- [High] NULL pointer dereference of job->domain in rocket_job_cleanup() on early error paths.
- [High] Missing `synchronize_irq()` in runtime suspend callback.
- [Medium] The TDR reset path permanently leaks power by preventing the device from entering runtime suspend.
- [Medium] The hardware reset is performed after dropping the PM runtime reference, potentially targeting an unpowered device.
--

commit 323afa3b038052ef6c02ee0548658a65126910a0
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Reset the NPU before detaching the IOMMU on timeout

On a job timeout the NPU AXI master can be left wedged with outstanding
transactions. rocket_reset() detached the IOMMU group before resetting
the hardware, causing the IOMMU core to stall. Assert the core reset
first to quiesce the master so the detach completes cleanly.

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -364,14 +364,20 @@ rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
>  		if (core->in_flight_job)
>  			pm_runtime_put_noidle(core->dev);

[Severity: Medium]
This is a pre-existing issue, but does this TDR reset path permanently leak
power by preventing the device from entering runtime suspend?

It appears pm_runtime_put_noidle() drops the reference without triggering an
idle power-down. If the timed-out job was the last active job, the usage count
reaches zero but the device remains in the RPM_ACTIVE state indefinitely,
staying powered on until another job is submitted and correctly triggers
autosuspend.

> -		iommu_detach_group(NULL, core->iommu_group);
> -
>  		core->in_flight_job = NULL;
>  	}
>  
> -	/* Proceed with reset now. */
> +	/*
> +	 * Reset the NPU hardware before detaching the IOMMU. A timed-out job
> +	 * leaves the NPU AXI master wedged; detaching the IOMMU then issues a
> +	 * stall request that never drains and times out (warning in the IOMMU
> +	 * core). Asserting the core reset first quiesces the master so the
> +	 * detach completes cleanly.
> +	 */
>  	rocket_core_reset(core);

[Severity: High]
This isn't a bug introduced by this patch, but can a concurrent hardware reset
and IRQ handler execution cause an external abort?

rocket_core_reset() asserts a hardware reset without first disabling device
interrupts. If an interrupt fires just before or during the reset,
rocket_job_irq_handler() will execute and perform MMIO reads and writes to the
bus while the hardware is actively in reset.

[Severity: Medium]
This is also a pre-existing issue, but is it safe to perform the hardware
reset after dropping the PM runtime reference?

Dropping the PM reference via pm_runtime_put_noidle() earlier decrements the
usage count. A concurrently scheduled autosuspend timer could expire precisely
in this window and power down the device before or during the reset operations.

Asserting resets on a suspended device can lead to system bus faults. Should
the PM reference be dropped after the hardware reset is complete?

> +
> +	iommu_detach_group(NULL, core->iommu_group);
> +
>  	/* NPU has been reset, we can clear the reset pending bit. */
>  	atomic_set(&core->reset.pending, 0);


[Severity: High]
This isn't a bug introduced by this patch, but does the shared IRQ handler
unconditionally access hardware registers without verifying the device's PM
active state?

drivers/accel/rocket/rocket_job.c:rocket_job_irq_handler() {
    ...
    u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
    ...
}

The driver registers rocket_job_irq_handler() with the IRQF_SHARED flag.
Without first calling pm_runtime_get_if_active(), if the NPU is runtime
suspended, this read accesses an unpowered register domain. This typically
returns 0xffffffff (causing spurious handling) or triggers a synchronous
external abort.


[Severity: High]
This is a pre-existing issue, but does rocket_job_run() leak the PM reference
and return an unsignaled fence on error paths?

drivers/accel/rocket/rocket_job.c:rocket_job_run() {
    ...
    ret = pm_runtime_get_sync(core->dev);
    if (ret < 0)
        return fence;
    ...
}

If pm_runtime_get_sync() or iommu_attach_group() fails, the function returns
a newly created, un-signaled fence instead of an error pointer, and leaves
the PM usage counter incremented. Because the hardware submission step is
skipped, the hardware will never signal this fence, causing the DRM scheduler
to hang indefinitely.


[Severity: High]
This is a pre-existing issue, but does the teardown order in rocket_core_fini()
cause a use-after-free and data races?

drivers/accel/rocket/rocket_core.c:rocket_core_fini() {
    ...
    iommu_group_put(core->iommu_group);
    core->iommu_group = NULL;
    rocket_job_fini(core);
    ...
}

Resources are torn down in the wrong order. The iommu_group reference and the
attached_domain member are released, and pm_runtime_disable() is called, before
rocket_job_fini().

If a scheduled job is processed by the DRM scheduler during this window,
rocket_job_run() will concurrently access the now-NULL iommu_group without
locking. Should rocket_job_fini() be called first to stop the scheduler and
wait for jobs?


[Severity: Critical]
This isn't a bug introduced by this patch, but is there a double free of the
rjob->tasks array when copy_from_user() fails?

drivers/accel/rocket/rocket_job.c:rocket_copy_tasks() {
    ...
fail:
    kvfree(rjob->tasks);
    return ret;
}

If copy_from_user() fails, rocket_copy_tasks() frees rjob->tasks but leaves
the pointer dangling.

The function returns an error, prompting rocket_ioctl_submit_job() to jump to
out_cleanup_job and call rocket_job_put(rjob). This invokes
rocket_job_cleanup(), which unconditionally calls kvfree(job->tasks),
resulting in a double-free.
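
The conventional fix is to clear the pointer after freeing it, so the later unconditional free in the cleanup path becomes a no-op (kvfree(NULL), like free(NULL), does nothing). A userspace model follows. task_job, model_copy() and the other names are hypothetical, and model_copy() stands in for copy_from_user().

```c
#include <stdlib.h>
#include <string.h>

struct task_job {
	void *tasks;
};

/* Hypothetical stand-in for a copy_from_user() that can fail. */
static int model_copy(void *dst, const void *src, size_t n, int fail)
{
	if (fail)
		return -14; /* -EFAULT */
	memcpy(dst, src, n);
	return 0;
}

static int model_copy_tasks(struct task_job *job, const void *user,
			    size_t n, int fail)
{
	int ret;

	job->tasks = calloc(1, n);
	if (!job->tasks)
		return -12; /* -ENOMEM */

	ret = model_copy(job->tasks, user, n, fail);
	if (ret)
		goto fail;
	return 0;

fail:
	free(job->tasks);
	job->tasks = NULL; /* leave nothing for the cleanup path to free again */
	return ret;
}

static void model_job_cleanup(struct task_job *job)
{
	free(job->tasks); /* safe when tasks is NULL */
	job->tasks = NULL;
}
```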


[Severity: High]
This is a pre-existing issue, but does rocket_job_cleanup() dereference a NULL
job->domain on early error paths?

drivers/accel/rocket/rocket_job.c:rocket_job_cleanup() {
    ...
    rocket_iommu_domain_put(job->domain);
    ...
}

The job->domain pointer is NULL initially and is only assigned late in
rocket_ioctl_submit_job(). If an earlier step fails (such as
rocket_copy_tasks()), the cleanup still calls
rocket_iommu_domain_put(job->domain) unconditionally.

This wrapper dereferences the domain struct to access its kref without a
NULL check:

drivers/accel/rocket/rocket_drv.c:rocket_iommu_domain_put() {
    kref_put(&domain->kref, rocket_iommu_domain_destroy);
}


[Severity: High]
This isn't a bug introduced by this patch, but is there a missing
synchronize_irq() in the runtime suspend callback?

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
    ...
    clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
                               rdev->cores[core].clks);
    ...
}

The driver registers a shared interrupt, but the runtime suspend callback does
not call synchronize_irq() before gating the clocks. Disabling the clocks while
the IRQ handler is running on another CPU could cause an AXI bus fault or a
hard system crash.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=4

* Re: [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
  2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
@ 2026-06-04 14:11   ` sashiko-bot
  0 siblings, 0 replies; 35+ messages in thread
From: sashiko-bot @ 2026-06-04 14:11 UTC (permalink / raw)
  To: Midgy BALON; +Cc: conor+dt, devicetree, robh, dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The NPU node simultaneously defines `power-domains` and `rockchip,pmu`, causing the generic power domain framework and the NPU driver to concurrently manipulate the same PMU power and idle registers.
- [Low] Device tree nodes `npu@fde40000` and `iommu@fde4b000` are not sorted by unit address.

Pre-existing issues:
- [Critical] The driver calls `devm_clk_bulk_get` without initializing the `.id` fields of the clock bulk data array, causing all clock requests to incorrectly default to the first clock in the device tree node.
--

commit dff3ba648a490ef1a1c6644d672ca0b9e1d68360
Author: Midgy BALON <midgy971@gmail.com>

arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU

This commit adds the NPU and its associated IOMMU device tree nodes to the 
RK3568 base devicetree. It also configures the necessary clocks, resets, and
power domain for the NPU.

> diff --git a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> index 64bdd8b7754b5..50ce5a5e4fc24 100644
> --- a/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk356x-base.dtsi

[ ... ]

> @@ -948,6 +955,37 @@ qos_rga_wr: qos@fe158300 {
>  		reg = <0x0 0xfe158300 0x0 0x20>;
>  	};
>  
> +	rknn_core_0: npu@fde40000 {
[Severity: Low]
This isn't a bug, but should the npu@fde40000 and iommu@fde4b000 nodes be
ordered alphanumerically by unit address? They appear to be inserted after
qos@fe158300 rather than being placed right before gpu@fde60000.

> +		compatible = "rockchip,rk3568-rknn-core";
> +		reg = <0x0 0xfde40000 0x0 0x1000>,
> +		      <0x0 0xfde41000 0x0 0x1000>,
> +		      <0x0 0xfde43000 0x0 0x1000>;
> +		reg-names = "pc", "cna", "core";
> +		interrupts = <GIC_SPI 151 IRQ_TYPE_LEVEL_HIGH>;
> +		clocks = <&cru ACLK_NPU>, <&cru HCLK_NPU>,
> +			 <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_PRE>;
[Severity: Critical]
This is a pre-existing issue, but enabling this node exposes a bug where
devm_clk_bulk_get() is called without initializing the .id fields in
rocket_core_init().

Since the connection ID is NULL, does clk_get() implicitly fall back to
of_clk_get(np, 0) for each loop iteration, mapping all four clock array
entries to ACLK_NPU?

Later, when rk3568_noc_init() dynamically scales the SCMI clock rate to
1 GHz:

    clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);

Does this mistakenly configure the rate of ACLK_NPU instead, which could
overclock the AXI bus and lead to immediate hardware instability?

> +		clock-names = "aclk", "hclk", "npu", "pclk";
> +		assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
> +		assigned-clock-rates = <200000000>;
> +		resets = <&cru SRST_A_NPU>, <&cru SRST_H_NPU>;
> +		reset-names = "srst_a", "srst_h";
> +		power-domains = <&power RK3568_PD_NPU>;
> +		rockchip,pmu = <&pmu>;
[Severity: High]
Could setting both power-domains and rockchip,pmu cause a race with the
generic power domain (genpd) framework and corrupt hardware state?

The generic power domain framework inherently manages power gating and NoC
bus idle requests for this domain, and explicitly protects these PMU
transitions with pmu->mutex.

Concurrently, the rocket driver's rk3568_noc_init() issues direct, un-polled
regmap_write() operations to the exact same PMU offsets:

    regmap_write(pmu, 0xa0, BIT(1 + 16));
    ...
    regmap_write(pmu, 0x50, BIT(2 + 16));

Does bypassing GenPD's tracked state machine and the pmu->mutex lock create
a layering violation here?
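
To make the quoted writes concrete, here is a model of the write-mask protocol as the patch's own comment describes it. Bits [31:16] of a write select which of bits [15:0] may change. Under that assumption, writing BIT(n + 16) with data bit n clear forces bit n to 0 and leaves every other bit untouched. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Upper half selects which lower bits the write may change. */
static uint32_t wmask_apply(uint32_t cur, uint32_t written)
{
	uint32_t mask = written >> 16;
	uint32_t data = written & 0xffff;

	return ((cur & ~mask) | (data & mask)) & 0xffff;
}

#define WMASK_SET(bit)   ((1u << ((bit) + 16)) | (1u << (bit)))
#define WMASK_CLEAR(bit) (1u << ((bit) + 16)) /* e.g. BIT(1 + 16) above */
```

For example, applying WMASK_CLEAR(1) (the 0xa0 write) to a register holding 0x0006 leaves 0x0004.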

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=8

* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
From: Tomeu Vizoso @ 2026-06-04 14:20 UTC (permalink / raw)
  To: Midgy BALON
  Cc: ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will, robin.murphy,
	dri-devel, linux-rockchip, devicetree, linux-arm-kernel, iommu,
	linux-kernel

On Thu, Jun 4, 2026 at 3:53 PM Midgy BALON <midgy971@gmail.com> wrote:
>
> On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
> auto clock-gating of the page-table walker, so the walker's AXI master
> loses its clock between transactions; a TLB-miss page walk then never
> completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).
>
> Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
> so the walker keeps its clock. This is required for the RK3568 NPU MMU.

Hi,

I'm not able to review this patch myself, but maybe it can be
submitted separately while we work on the NPU bits?

Regards,

Tomeu

> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>  drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 4da80136933c4..e3d8b6e9ca12b 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>
>         ret = rk_iommu_enable_paging(iommu);
>
> +       if (!ret) {
> +               /*
> +                * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +                * auto clock-gating; the walker's AXI master then loses its clock
> +                * between transactions and a TLB-miss page walk never completes,
> +                * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +                * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +                */
> +               for (i = 0; i < iommu->num_mmu; i++)
> +                       rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> +       }
> +
>  out_disable_stall:
>         rk_iommu_disable_stall(iommu);
>  out_disable_clocks:
> --
> 2.39.5
>

* Re: [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
From: Conor Dooley @ 2026-06-04 16:55 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: Midgy BALON, conor+dt, dri-devel, robh, devicetree

On Thu, Jun 04, 2026 at 02:08:50PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> 
> New issues:
> - [High] The value written to `RK_MMU_AUTO_GATING` sets bit 1 instead of clearing it, contradicting the inline comment and hardware workaround logic.
> - [High] The error handling path for `noc_init()` failure leaks the IOMMU group reference and leaves the device in an unbalanced PM runtime state.
> - [High] A malformed device tree with excessive NPU core nodes triggers an out-of-bounds write in `rocket_probe()` array access.
> - [High] The NOC bus de-idle sequence via the PMU is missing from the runtime PM resume path.
> - [Medium] The `rockchip,pmu` property is functionally mandatory for RK3568 but is not added to the schema's conditional `required` list.
> 
> Pre-existing issues:
> - [High] The clock names in the `core->clks` array are uninitialized before being passed to `devm_clk_bulk_get()`, causing the framework to fetch the wrong clocks.
> --
> 
> commit 6c61da961bed3360a99f625d4a3b1599e7bfe685
> Author: Midgy BALON <midgy971@gmail.com>
> 
>     dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
>     
>     The RK3568 carries a single core of the same NVDLA-derived NPU IP as the
>     RK3588. Add its compatible. On RK3568 the NPU NOC bus-idle and power
>     gating are controlled through the system PMU rather than a dedicated
>     register block.
> 
> > diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > index caca2a4903cd1..af9936b32e9fe 100644
> > --- a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> [ ... ]
> > @@ -75,7 +83,15 @@ required:
> >    - resets
> >    - reset-names
> >    - npu-supply
> > -  - sram-supply
> > +
> > +if:
> > +  properties:
> > +    compatible:
> > +      contains:
> > +        const: rockchip,rk3588-rknn-core
> > +then:
> > +  required:
> > +    - sram-supply
> 
> [Severity: Medium]
> Does the missing rockchip,pmu property from the conditional required block
> cause incomplete device trees to pass validation?
> 
> The rockchip,pmu property is functionally mandatory for the
> rockchip,rk3568-rknn-core variant as the driver unconditionally fails to probe
> without it. Missing this mandatory property here could allow invalid device
> trees to go undetected.

This looks valid. Additionally, the condition should forbid this
property on rk3588.
pw-bot: changes-requested
> 
> >  additionalProperties: false
> >  
> > diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c

Why is the bot commenting on driver stuff here?

Cheers,
Conor.

> > index b3b2fa9ba645a..634f78dfe2887 100644
> > --- a/drivers/accel/rocket/rocket_core.c
> > +++ b/drivers/accel/rocket/rocket_core.c
> [ ... ]
> > @@ -83,6 +89,15 @@ int rocket_core_init(struct rocket_core *core)
> >  		return err;
> >  	}
> >  
> > +	if (core->soc_data->noc_init) {
> > +		err = core->soc_data->noc_init(core);
> > +		if (err) {
> > +			pm_runtime_put_sync(dev);
> > +			rocket_job_fini(core);
> > +			return err;
> > +		}
> > +	}
> 
> [Severity: High]
> Does this error path leak the IOMMU group reference and leave the device in an
> unbalanced PM runtime state?
> 
> By returning directly after pm_runtime_put_sync() and rocket_job_fini(), the
> code misses calling rocket_core_fini() which handles iommu_group_put(),
> pm_runtime_disable(), and pm_runtime_dont_use_autosuspend().
> 
> > +
> >  	version = rocket_pc_readl(core, VERSION);
> >  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
> >  
> > diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> > index 46e6ee1e72c5f..6186f4faa3a2a 100644
> > --- a/drivers/accel/rocket/rocket_device.c
> > +++ b/drivers/accel/rocket/rocket_device.c
> [ ... ]
> > @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
> >  	ddev = &rdev->ddev;
> >  	dev_set_drvdata(dev, rdev);
> >  
> > -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> > -		if (of_device_is_available(core_node))
> > -			num_cores++;
> > -
> > -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> > +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> > +				   GFP_KERNEL);
> 
> [Severity: High]
> Can this code overflow rdev->cores[] if the device tree contains more NPU
> nodes than soc_data->num_cores?
> 
> With this change, the array size is fixed to soc_data->num_cores, but
> rocket_probe() in drivers/accel/rocket/rocket_drv.c is still called for every
> matching node. If there are too many nodes in the device tree, it appears
> rocket_probe() will write out of bounds here:
> 
> drivers/accel/rocket/rocket_drv.c:rocket_probe() {
>     ...
> 	unsigned int core = rdev->num_cores;
>     ...
> 	rdev->cores[core].rdev = rdev;
> 	rdev->cores[core].dev = &pdev->dev;
> 	rdev->cores[core].index = core;
>     ...
> }
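
One way to address the out-of-bounds concern is to check the slot index against the size used for devm_kcalloc() and refuse extra core nodes. A userspace sketch follows, with hypothetical model_* types standing in for struct rocket_device and the probe step:

```c
#include <errno.h>

/* Hypothetical stand-ins for struct rocket_core / struct rocket_device. */
struct model_core {
	unsigned int index;
};

struct model_rdev {
	struct model_core *cores;   /* sized from match data */
	unsigned int max_cores;     /* soc_data->num_cores */
	unsigned int num_cores;     /* cores probed so far */
};

/* Model of the per-node probe step: claim the next slot, or fail with
 * -EINVAL when the device tree has more core nodes than the match data. */
static int model_probe_core(struct model_rdev *rdev)
{
	unsigned int core = rdev->num_cores;

	if (core >= rdev->max_cores)
		return -EINVAL;

	rdev->cores[core].index = core;
	rdev->num_cores++;
	return 0;
}
```

The error code is a policy choice; the point is only that the index is bounded by the same count used to size the array.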
> 
> >  	if (!rdev->cores)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> > index 8bbbce594883e..5a72d0b5f4dff 100644
> > --- a/drivers/accel/rocket/rocket_drv.c
> > +++ b/drivers/accel/rocket/rocket_drv.c
> [ ... ]
> > @@ -213,8 +219,64 @@ static void rocket_remove(struct platform_device *pdev)
> >  	}
> >  }
> >  
> > +/*
> > + * On RK3568 the NPU NOC bus is gated and idle out of reset and must be
> > + * powered on and de-idled through the PMU before the NPU is reachable.  PMU
> > + * registers use a write-mask protocol: the upper 16 bits enable writes to the
> > + * matching lower 16 bits.
> > + *
> > + * The NPU's high-speed clock is a PVTPLL managed by TF-A via SCMI and must be
> > + * running before the NOC acknowledges the de-idle request.  Force a real SCMI
> > + * rate change (an intermediate rate defeats the clock framework's
> > + * unchanged-rate shortcut) now that the power domain is on and clocks enabled.
> > + */
> > +#define ROCKET_RK3568_SCMI_CLK	2
> > +
> > +static int rk3568_noc_init(struct rocket_core *core)
> > +{
> > +	struct regmap *pmu;
> > +	unsigned int val;
> > +	int ret;
> > +
> > +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 600000000UL);
> > +	clk_set_rate(core->clks[ROCKET_RK3568_SCMI_CLK].clk, 1000000000UL);
> 
> [Severity: High]
> This is a pre-existing issue, but does this unintentionally alter the system
> aclk rate?
> 
> Looking at rocket_core_init(), the core->clks array is passed to
> devm_clk_bulk_get() without initializing the id strings:
> 
> drivers/accel/rocket/rocket_core.c:rocket_core_init() {
>     ...
> 	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
>     ...
> }
> 
> Because the id fields are NULL, the framework fetches the first DT clock
> (aclk) for all elements. Calling clk_set_rate() here seems to alter the
> system aclk frequency instead of the intended SCMI clock.
> 
> > +
> > +	pmu = syscon_regmap_lookup_by_phandle(core->dev->of_node, "rockchip,pmu");
> > +	if (IS_ERR(pmu))
> > +		return dev_err_probe(core->dev, PTR_ERR(pmu),
> > +				     "failed to get PMU regmap\n");
> > +
> > +	/* Power on the NPU power domain (PWR_GATE_SFTCON bit 1 = 0). */
> > +	regmap_write(pmu, 0xa0, BIT(1 + 16));
> > +
> > +	/* Disable NPU NOC auto-idle (NOC_AUTO_CON0 bit 2). */
> > +	regmap_write(pmu, 0x70, BIT(2 + 16));
> > +
> > +	/* Request NPU bus de-idle (BUS_IDLE_SFTCON0 bit 2 = 0). */
> > +	regmap_write(pmu, 0x50, BIT(2 + 16));
> > +
> > +	/* Wait for the bus to report active (BUS_IDLE_ST bit 2 = 0). */
> > +	ret = regmap_read_poll_timeout(pmu, 0x68, val, !(val & BIT(2)), 10, 1000);
> > +	if (ret)
> > +		dev_err(core->dev, "timed out waiting for NPU bus de-idle\n");
> > +
> > +	return ret;
> > +}
> 
> [Severity: High]
> Is a NOC bus de-idle sequence required in the runtime PM resume path as well?
> 
> If the generic power domain (genpd) powers off the NPU during runtime suspend,
> the NOC bus idle state would reset. Currently, rocket_device_runtime_resume()
> enables clocks but doesn't repeat this PMU de-idle sequence, which could leave
> the AXI bus gated and cause subsequent hardware accesses to hang or crash.
> 
> > +
> >  static const struct rocket_soc_data rk3568_soc_data = {
> >  	.num_cores = 1,
> >  	.dma_bits = 32,
> >  	.noc_init = rk3568_noc_init,
> >  };
> >  
> > diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> > index 0013cf196c573..7d40ecb26123f 100644
> > --- a/drivers/iommu/rockchip-iommu.c
> > +++ b/drivers/iommu/rockchip-iommu.c
> > @@ -952,6 +952,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
> >  
> >  	ret = rk_iommu_enable_paging(iommu);
> >  
> > +	if (!ret) {
> > +		/*
> > +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> > +		 * auto clock-gating; the walker's AXI master then loses its clock
> > +		 * between transactions and a TLB-miss page walk never completes,
> > +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> > +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> > +		 */
> > +		for (i = 0; i < iommu->num_mmu; i++)
> > +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> 
> [Severity: High]
> Will this write operation inadvertently set bit 1 instead of clearing it?
> 
> Writing 0x2 (binary 10) sets bit 1 and clears bit 0, which contradicts the
> inline comment indicating that bit 1 should be cleared and bit 0 kept. This
> appears to enable the buggy page-walker auto clock-gating and disable the
> slave-port gate.
> 
> > +	}
> > +
> >  out_disable_stall:
> >  	rk_iommu_disable_stall(iommu);
> >  out_disable_clocks:
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260604135255.62682-1-midgy971@gmail.com?part=7

* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
From: Chaoyi Chen @ 2026-06-05  1:36 UTC (permalink / raw)
  To: Midgy BALON
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Hello Midgy,

On 6/4/2026 9:52 PM, Midgy BALON wrote:
> RFC, not for merge. End-to-end inference does not produce correct output
> yet (see Status), so per the v2 discussion this is a request for design
> feedback. It now probes, attaches, and submits cleanly on a stock
> v7.1-rc6 tree; what remains is one hardware-internal issue.
> 
> The RK3568 has a single NVDLA-derived NPU core, the same IP family as the
> RK3588 NPU the driver already supports; the register layout matches. The
> RK3568 differences are a 32-bit NPU AXI/IOMMU (vs 40-bit) and explicit
> PVTPLL/PMU bring-up to power and de-idle the NPU before it is reachable.
> 
> Patches:
>   1-2  rocket: per-SoC data struct, then derive DMA width and core count
>        from match data (refactors, no functional change).
>   3    rocket: RK3568 SoC data + PVTPLL/PMU/NOC bring-up.
>   4    rocket: reset the NPU before detaching the IOMMU on a job timeout
>        (the detach otherwise stalls a wedged AXI master and WARNs).
>   5    rocket: keep the IOMMU domain attached across jobs instead of
>        re-attaching per job (the per-job rk_iommu handshake on the idle
>        NPU MMU is slow and noisy).
>   6    iommu/rockchip: clear AUTO_GATING bit 1 on the RK356x v1 IOMMU so
>        the page-walker keeps its clock (else a TLB-miss walk never
>        completes).
>   7    dt-bindings: add the RK3568 NPU compatible.
>   8-9  arm64 dts: add the NPU and its IOMMU, and enable them on ROCK 3B.
> 
> Dependency. The NPU MMU is rockchip-iommu v1 (32-bit) while the rest of
> the RK3568 uses v2 (40-bit). They cannot coexist until the driver carries
> per-device ops; this series is developed on top of Simon Xue's
> "iommu/rockchip: Drop global rk_ops in favor of per-device ops" [1].
> Without it the NPU IOMMU fails to probe on a full RK3568 boot.
>

Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than
v1, implying it should support 40-bit PAs. Nevertheless, please note that
the upper limit for DTE is 32 bits.

> Power bring-up. The NPU is brought up through the power-domain layer (no
> driver hack): the NPU power-domain keeps its clocks but drops the pm_qos
> phandle (qos_npu sits behind the gated NPU NoC, so genpd's power-off QoS
> save faults reading it), and vdd_npu is marked always-on so the rail is
> up before genpd de-idles the NoC at power-on. The PMU de-idle then ACKs
> without PVTPLL running; PVTPLL is only needed for compute.
>

Can these operations not be completed via the pmdomain driver?
If some operations are controlled by TF-A, are you using open
source TF-A? Thank you.

> Status. On v7.1-rc6 the driver probes, creates /dev/accel/accel0,
> attaches an IOMMU domain, and submits jobs; the program controller
> fetches and broadcasts the command list. Inference output is still wrong,
> and the cause is split across three layers:
>   - kernel (this series): the RK3568 differences appear handled;
>   - mesa/Teflon userspace: still emits RK3588-tuned config, wrong for
>     RK3568 (to be filed separately on mesa-dev);
>   - hardware: with corrected config the NPU's DMA reads the full input
>     and weight tensors (confirmed via its DMA bandwidth counters), but
>     the MAC/output stage never completes, the job times out, and the
>     output stays at the buffer's zero-point. I have not found the missing
>     step; it is not in the command list (replaying the vendor's
>     byte-exact command list behaves the same). Pointers welcome,
>     especially from anyone with RK3568 NPU experience.
> 
> Known residual. On the first IOMMU attach the NPU MMU is idle with paging
> already enabled; the rk_iommu stall/reset handshake does not complete in
> that state and logs one burst of timeouts before the (kept) domain
> settles. It is harmless here because the job times out regardless, but it
> points at an idle-MMU reconfiguration corner the rk_iommu code does not
> handle on this block.
> 
> [1] https://lore.kernel.org/linux-rockchip/20260310105303.128859-1-xxm@rock-chips.com/
> 
> Changes since v2:
>   - Tagged RFC; now tested on a stock v7.1-rc6 tree.
>   - Bring-up moved into the power-domain/DT layer (no initcall hack).
>   - Added the IOMMU detach-on-timeout and attach-once driver fixes.
>   - Split the driver patch (Heiko): soc_data / match-data / RK3568.
>   - Derive DMA width and core count from match data; drop the DT rescans.
>   - Binding describes the hardware; added the missing $ref on rockchip,pmu.
>   - Disclosed the per-device-ops IOMMU dependency.
> 
> Midgy BALON (9):
>   accel: rocket: Introduce per-SoC rocket_soc_data
>   accel: rocket: Derive DMA width and core count from match data
>   accel: rocket: Add RK3568 SoC support
>   accel: rocket: Reset the NPU before detaching the IOMMU on timeout
>   accel: rocket: Keep the IOMMU domain attached across jobs
>   iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
>   dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
>   arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
>   arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
> 
>  .../npu/rockchip,rk3588-rknn-core.yaml        | 18 ++++-
>  .../boot/dts/rockchip/rk3568-rock-3b.dts      | 14 +++-
>  arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++
>  drivers/accel/rocket/rocket_core.c            | 22 ++++++-
>  drivers/accel/rocket/rocket_core.h            | 19 ++++++
>  drivers/accel/rocket/rocket_device.c          | 15 ++---
>  drivers/accel/rocket/rocket_device.h          |  3 +-
>  drivers/accel/rocket/rocket_drv.c             | 66 ++++++++++++++++++-
>  drivers/accel/rocket/rocket_job.c             | 35 ++++++++--
>  drivers/iommu/rockchip-iommu.c                | 12 ++++
>  10 files changed, 219 insertions(+), 23 deletions(-)
> 
> 
> base-commit: 52c800fdcf11888ebeb50c3d707f782cc15b66eb

-- 
Best, 
Chaoyi

* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
From: Chaoyi Chen @ 2026-06-05  1:59 UTC (permalink / raw)
  To: Midgy BALON
  Cc: Simon Xue, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Hello Midgy,

On 6/4/2026 9:52 PM, Midgy BALON wrote:
> On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
> auto clock-gating of the page-table walker, so the walker's AXI master
> loses its clock between transactions; a TLB-miss page walk then never
> completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).
> 
> Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
> so the walker keeps its clock. This is required for the RK3568 NPU MMU.
> 
> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>  drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 4da80136933c4..e3d8b6e9ca12b 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>  
>  	ret = rk_iommu_enable_paging(iommu);
>  
> +	if (!ret) {
> +		/*
> +		 * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> +		 * auto clock-gating; the walker's AXI master then loses its clock
> +		 * between transactions and a TLB-miss page walk never completes,
> +		 * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> +		 * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> +		 */
> +		for (i = 0; i < iommu->num_mmu; i++)
> +			rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> +	}
> +
>  out_disable_stall:
>  	rk_iommu_disable_stall(iommu);
>  out_disable_clocks:

As I said, it is v2. Could you please try using the code below
instead and see if it works? Thank you.

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 0013cf196c57..89e3a83a0251 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -930,6 +930,7 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
        struct iommu_domain *domain = iommu->domain;
        struct rk_iommu_domain *rk_domain = to_rk_domain(domain);
        int ret, i;
+       u32 auto_gate;
 
        ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks);
        if (ret)
@@ -948,6 +949,10 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
                               rk_ops->mk_dtentries(rk_domain->dt_dma));
                rk_iommu_base_command(iommu->bases[i], RK_MMU_CMD_ZAP_CACHE);
                rk_iommu_write(iommu->bases[i], RK_MMU_INT_MASK, RK_MMU_IRQ_MASK);
+
+               auto_gate = rk_iommu_read(iommu->bases[i], RK_MMU_AUTO_GATING);
+               auto_gate |= BIT(31);
+               rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, auto_gate);
        }
 
        ret = rk_iommu_enable_paging(iommu);

-- 
Best, 
Chaoyi

* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
From: Midgy Balon @ 2026-06-07 21:03 UTC (permalink / raw)
  To: Chaoyi Chen
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Hi Chaoyi,

Thanks a lot for looking at this -- input from Rockchip is exactly what this
series needs.

> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
> implying it should support 40-bit PAs. Nevertheless, please note that the
> upper limit for DTE is 32 bits.

Understood, and that 32-bit-DTE note is the crux of the trouble I had, so let
me lay out what I see and ask how you'd prefer to solve it.

The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
The problem on this 8 GiB board: with the v2 ops the page-table allocations
(gfp_flags == 0) can land above 4 GiB, so the DTE address exceeds 32 bits and the
NPU's first translation faults with DMA_READ_ERROR. To work around that I had
switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops set
GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but because the
driver keeps a single global rk_ops, a v1 NPU MMU then trips
WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is why
I based the series on Simon's per-device-ops work.
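
The 32-bit DTE limit Chaoyi mentions can be expressed as a check on the directory-table physical address before it is programmed. A userspace sketch; the helper name is hypothetical and the addresses in the check are only illustrative of a board with more than 4 GiB of RAM:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: the NPU MMU's directory-table (DTE) address must
 * fit in 32 bits, so a table allocated above 4 GiB cannot be used. */
static bool model_dte_addr_ok(uint64_t dt_phys)
{
	return (dt_phys >> 32) == 0;
}
```

Allocating the table with GFP_DMA32, as the v1 ops do according to the paragraph above, makes this check hold by construction.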

So my question: with per-device ops in place, what's the intended way to keep
the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB of RAM?
A v2 ops variant carrying GFP_DMA32 for this device, or is there a register/
config bit that constrains the DTE address? I'd rather follow the Rockchip
intent here than carry the v1 workaround. (Simon, cc'd -- this is right next to
your per-device-ops series.)

> Can these operations not be completed via the pmdomain driver?
> If some operations are controlled by TF-A, are you using open source TF-A?

Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes the
PMU directly. Two things remain outside it:

 - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
   domain's domain-supply, because as a domain-supply it created a device-link
   to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
   reading the NPU QoS registers behind the (gated) NoC. If there's a clean way
   to let genpd own vdd_npu without that I2C ordering deadlock I'd much prefer
   that -- pointers welcome.

 - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
   needed for actual compute, not for bring-up.

One more pmdomain observation from testing, possibly relevant to how the NPU
domain should be modelled: the domain's power-off/on cycle doesn't reliably
re-de-idle the NoC. If the NPU is probed after genpd has already powered the
(unused) domain off, the power-on de-idle fails ("failed to set idle on domain
'npu'") and the NPU IOMMU then takes an external abort on its first MMIO access.
Probing the NPU before the unused-domain power-off, or marking the domain
always-on, both avoid it. Is the NoC de-idle expected to work on a genpd
re-power here, or should this domain effectively stay on?

On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
(github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing PSCI
and the SCMI clock service. The only closed blob in the boot chain is Rockchip's
DDR init (rkbin), which is the standard situation for mainline RK356x.

Kind regards,
Midgy

Le ven. 5 juin 2026 à 03:36, Chaoyi Chen <chaoyi.chen@rock-chips.com> a écrit :
>
> Hello Midgy,
>
> On 6/4/2026 9:52 PM, Midgy BALON wrote:
> > RFC, not for merge. End-to-end inference does not produce correct output
> > yet (see Status), so per the v2 discussion this is a request for design
> > feedback. It now probes, attaches, and submits cleanly on a stock
> > v7.1-rc6 tree; what remains is one hardware-internal issue.
> >
> > The RK3568 has a single NVDLA-derived NPU core, the same IP family as the
> > RK3588 NPU the driver already supports; the register layout matches. The
> > RK3568 differences are a 32-bit NPU AXI/IOMMU (vs 40-bit) and explicit
> > PVTPLL/PMU bring-up to power and de-idle the NPU before it is reachable.
> >
> > Patches:
> >   1-2  rocket: per-SoC data struct, then derive DMA width and core count
> >        from match data (refactors, no functional change).
> >   3    rocket: RK3568 SoC data + PVTPLL/PMU/NOC bring-up.
> >   4    rocket: reset the NPU before detaching the IOMMU on a job timeout
> >        (the detach otherwise stalls a wedged AXI master and WARNs).
> >   5    rocket: keep the IOMMU domain attached across jobs instead of
> >        re-attaching per job (the per-job rk_iommu handshake on the idle
> >        NPU MMU is slow and noisy).
> >   6    iommu/rockchip: clear AUTO_GATING bit 1 on the RK356x v1 IOMMU so
> >        the page-walker keeps its clock (else a TLB-miss walk never
> >        completes).
> >   7    dt-bindings: add the RK3568 NPU compatible.
> >   8-9  arm64 dts: add the NPU and its IOMMU, and enable them on ROCK 3B.
> >
> > Dependency. The NPU MMU is rockchip-iommu v1 (32-bit) while the rest of
> > the RK3568 uses v2 (40-bit). They cannot coexist until the driver carries
> > per-device ops; this series is developed on top of Simon Xue's
> > "iommu/rockchip: Drop global rk_ops in favor of per-device ops" [1].
> > Without it the NPU IOMMU fails to probe on a full RK3568 boot.
> >
>
> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than
> v1, implying it should support 40-bit PAs. Nevertheless, please note that
> the upper limit for DTE is 32 bits.
>
> > Power bring-up. The NPU is brought up through the power-domain layer (no
> > driver hack): the NPU power-domain keeps its clocks but drops the pm_qos
> > phandle (qos_npu sits behind the gated NPU NoC, so genpd's power-off QoS
> > save faults reading it), and vdd_npu is marked always-on so the rail is
> > up before genpd de-idles the NoC at power-on. The PMU de-idle then ACKs
> > without PVTPLL running; PVTPLL is only needed for compute.
> >
>
> Can these operations not be completed via the pmdomain driver?
> If some operations are controlled by TF-A, are you using open
> source TF-A? Thank you.
>
> > Status. On v7.1-rc6 the driver probes, creates /dev/accel/accel0,
> > attaches an IOMMU domain, and submits jobs; the program controller
> > fetches and broadcasts the command list. Inference output is still wrong,
> > and the cause is split across three layers:
> >   - kernel (this series): the RK3568 differences appear handled;
> >   - mesa/Teflon userspace: still emits RK3588-tuned config, wrong for
> >     RK3568 (to be filed separately on mesa-dev);
> >   - hardware: with corrected config the NPU's DMA reads the full input
> >     and weight tensors (confirmed via its DMA bandwidth counters), but
> >     the MAC/output stage never completes, the job times out, and the
> >     output stays at the buffer's zero-point. I have not found the missing
> >     step; it is not in the command list (replaying the vendor's
> >     byte-exact command list behaves the same). Pointers welcome,
> >     especially from anyone with RK3568 NPU experience.
> >
> > Known residual. On the first IOMMU attach the NPU MMU is idle with paging
> > already enabled; the rk_iommu stall/reset handshake does not complete in
> > that state and logs one burst of timeouts before the (kept) domain
> > settles. It is harmless here because the job times out regardless, but it
> > points at an idle-MMU reconfiguration corner the rk_iommu code does not
> > handle on this block.
> >
> > [1] https://lore.kernel.org/linux-rockchip/20260310105303.128859-1-xxm@rock-chips.com/
> >
> > Changes since v2:
> >   - Tagged RFC; now tested on a stock v7.1-rc6 tree.
> >   - Bring-up moved into the power-domain/DT layer (no initcall hack).
> >   - Added the IOMMU detach-on-timeout and attach-once driver fixes.
> >   - Split the driver patch (Heiko): soc_data / match-data / RK3568.
> >   - Derive DMA width and core count from match data; drop the DT rescans.
> >   - Binding describes the hardware; added the missing $ref on rockchip,pmu.
> >   - Disclosed the per-device-ops IOMMU dependency.
> >
> > Midgy BALON (9):
> >   accel: rocket: Introduce per-SoC rocket_soc_data
> >   accel: rocket: Derive DMA width and core count from match data
> >   accel: rocket: Add RK3568 SoC support
> >   accel: rocket: Reset the NPU before detaching the IOMMU on timeout
> >   accel: rocket: Keep the IOMMU domain attached across jobs
> >   iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
> >   dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
> >   arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU
> >   arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU
> >
> >  .../npu/rockchip,rk3588-rknn-core.yaml        | 18 ++++-
> >  .../boot/dts/rockchip/rk3568-rock-3b.dts      | 14 +++-
> >  arch/arm64/boot/dts/rockchip/rk356x-base.dtsi | 38 +++++++++++
> >  drivers/accel/rocket/rocket_core.c            | 22 ++++++-
> >  drivers/accel/rocket/rocket_core.h            | 19 ++++++
> >  drivers/accel/rocket/rocket_device.c          | 15 ++---
> >  drivers/accel/rocket/rocket_device.h          |  3 +-
> >  drivers/accel/rocket/rocket_drv.c             | 66 ++++++++++++++++++-
> >  drivers/accel/rocket/rocket_job.c             | 35 ++++++++--
> >  drivers/iommu/rockchip-iommu.c                | 12 ++++
> >  10 files changed, 219 insertions(+), 23 deletions(-)
> >
> >
> > base-commit: 52c800fdcf11888ebeb50c3d707f782cc15b66eb
>
> --
> Best,
> Chaoyi

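The "stall/reset handshake does not complete" symptom in the quoted cover letter follows a common pattern: issue a command, then poll a status bit until it flips or a bound expires. The host-side sketch below shows that poll-until-timeout loop against a simulated status register. All names (`poll_status`, `fake_mmu`, `STALL_ACTIVE`, `NEVER_READY`) are hypothetical. The bound is a read count rather than time, and mainline rockchip-iommu uses the kernel's readx_poll_timeout() helper for these waits instead.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical status bit; the real RK_MMU_STATUS layout is not modeled. */
#define STALL_ACTIVE (UINT32_C(1) << 2)
#define NEVER_READY  UINT_MAX

/* Callback standing in for a status-register read. */
typedef uint32_t (*read_fn)(void *ctx);

/*
 * Poll until (status & mask) == want, giving up after max_polls reads.
 * Returns 0 on success or -ETIMEDOUT; *last receives the final status read.
 * The kernel's readx_poll_timeout() sleeps between reads and bounds by time.
 */
static int poll_status(read_fn rd, void *ctx, uint32_t mask, uint32_t want,
		       unsigned int max_polls, uint32_t *last)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		*last = rd(ctx);
		if ((*last & mask) == want)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake MMU whose stall bit sets after ready_after reads (never, if NEVER_READY). */
struct fake_mmu {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_mmu *m = ctx;

	m->reads++;
	return m->reads > m->ready_after ? STALL_ACTIVE : 0;
}
```

A `fake_mmu` with `ready_after = 3` completes on the fourth read; one with `ready_after = NEVER_READY` models the idle-MMU corner, where every attempt ends in -ETIMEDOUT.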

* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-05  1:59   ` Chaoyi Chen
@ 2026-06-07 21:05     ` Midgy Balon
  2026-06-08  1:45       ` Chaoyi Chen
  0 siblings, 1 reply; 35+ messages in thread
From: Midgy Balon @ 2026-06-07 21:05 UTC (permalink / raw)
  To: Chaoyi Chen
  Cc: Simon Xue, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Hi Chaoyi,

> As I said, it is v2. Could you please try using the code below instead and
> see if it works?
> [ auto_gate = read(RK_MMU_AUTO_GATING); auto_gate |= BIT(31); write(...) ]

Thanks -- that's clearly the right shape (read-modify-write, before paging is
enabled, keeping the reset value instead of my clobbering 0x2).

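For reference, the arithmetic behind "read-modify-write instead of clobbering 0x2" can be checked on the host. The sketch below models RK_MMU_AUTO_GATING as a plain uint32_t using only values from this thread: the 0x3 reset value, bit 31 from the suggested change, and the 0x2 constant from v3. The helper names are hypothetical, and nothing here touches hardware or the rockchip-iommu driver.

```c
#include <assert.h>
#include <stdint.h>

/* Reset value of RK_MMU_AUTO_GATING as reported in this thread. */
#define AUTO_GATING_RESET UINT32_C(0x3)

/* Function stand-in for the kernel's BIT(); shifting by >= 32 would be UB. */
static uint32_t bit32(unsigned int n)
{
	assert(n < 32);
	return UINT32_C(1) << n;
}

/* Read-modify-write (the suggested form): keep every bit already set, add bit 31. */
static uint32_t auto_gate_rmw(uint32_t cur)
{
	return cur | bit32(31);
}

/* Clear only bit 1, as the v3 comment describes; bit 0 survives. */
static uint32_t auto_gate_clear_bit1(uint32_t cur)
{
	return cur & ~bit32(1);
}

/* Blind write (the v3 code): the current value is ignored entirely. */
static uint32_t auto_gate_clobber(uint32_t cur)
{
	(void)cur;
	return UINT32_C(0x2);
}
```

Applied to the 0x3 reset value, the read-modify-write gives 0x80000003 (both low bits kept), clearing only bit 1 gives 0x1, and the blind write gives 0x2, which drops bit 0 and keeps bit 1.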
I rebuilt v7.1-rc6 (with the rocket RK3568 series + your per-device-ops work)
using your bit-31 version and tested it on a ROCK 3B: the NPU IOMMU comes up and
services the NPU's DMA cleanly -- the NPU probes, attaches its domain, and runs
repeated conv submissions with no DMA_READ_ERROR and no page-walk stall. No
regression from the write.

To be precise about what I can and can't show: I tested both ways on v7.1-rc6 --
with your bit-31 write, and on the reset value (0x3) -- and the NPU IOMMU
services the NPU's reads with zero faults in both cases (no DMA_READ_ERROR, no
page-walk stall). So I don't have a failing baseline here that bit 31 visibly
fixes. Is the AUTO_GATING write needed on current mainline, or only under
conditions I'm not reproducing (a particular traffic pattern / silicon rev)?
I'll keep the patch in your form unless you'd prefer to drop it.

One question so I document it correctly: what does bit 31 of RK_MMU_AUTO_GATING
control on the v2 block -- is it a master "disable internal auto clock-gating"
for the page-table walker (i.e. so a TLB-miss walk's AXI master keeps its clock
to completion)? The RK3568 TRM I have doesn't cover the IOMMU registers, so a
one-line description would let me write an accurate comment.

Kind regards,
Midgy

On Fri, 5 Jun 2026 at 03:59, Chaoyi Chen <chaoyi.chen@rock-chips.com> wrote:
>
> Hello Midgy,
>
> On 6/4/2026 9:52 PM, Midgy BALON wrote:
> > On the RK356x v1 IOMMU, RK_MMU_AUTO_GATING resets to 0x3. Bit 1 enables
> > auto clock-gating of the page-table walker, so the walker's AXI master
> > loses its clock between transactions; a TLB-miss page walk then never
> > completes and the IOMMU is left stuck (PAGING_ENABLED, never IDLE).
> >
> > Clear bit 1 (keeping bit 0, the slave-port gate) once paging is enabled
> > so the walker keeps its clock. This is required for the RK3568 NPU MMU.
> >
> > Signed-off-by: Midgy BALON <midgy971@gmail.com>
> > ---
> >  drivers/iommu/rockchip-iommu.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> > index 4da80136933c4..e3d8b6e9ca12b 100644
> > --- a/drivers/iommu/rockchip-iommu.c
> > +++ b/drivers/iommu/rockchip-iommu.c
> > @@ -953,6 +953,18 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
> >
> >       ret = rk_iommu_enable_paging(iommu);
> >
> > +     if (!ret) {
> > +             /*
> > +              * RK356x v1 IOMMU: RK_MMU_AUTO_GATING bit 1 enables page-walker
> > +              * auto clock-gating; the walker's AXI master then loses its clock
> > +              * between transactions and a TLB-miss page walk never completes,
> > +              * leaving the IOMMU stuck (PAGING_ENABLED, never IDLE).  Clear
> > +              * bit 1 (keep bit 0, the slave-port gate) once paging is enabled.
> > +              */
> > +             for (i = 0; i < iommu->num_mmu; i++)
> > +                     rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, 0x2);
> > +     }
> > +
> >  out_disable_stall:
> >       rk_iommu_disable_stall(iommu);
> >  out_disable_clocks:
>
> As I said, it is v2. Could you please try using the code below
> instead and see if it works? Thank you.
>
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 0013cf196c57..89e3a83a0251 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -930,6 +930,7 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>         struct iommu_domain *domain = iommu->domain;
>         struct rk_iommu_domain *rk_domain = to_rk_domain(domain);
>         int ret, i;
> +       u32 auto_gate;
>
>         ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks);
>         if (ret)
> @@ -948,6 +949,10 @@ static int rk_iommu_enable(struct rk_iommu *iommu)
>                                rk_ops->mk_dtentries(rk_domain->dt_dma));
>                 rk_iommu_base_command(iommu->bases[i], RK_MMU_CMD_ZAP_CACHE);
>                 rk_iommu_write(iommu->bases[i], RK_MMU_INT_MASK, RK_MMU_IRQ_MASK);
> +
> +               auto_gate = rk_iommu_read(iommu->bases[i], RK_MMU_AUTO_GATING);
> +               auto_gate |= BIT(31);
> +               rk_iommu_write(iommu->bases[i], RK_MMU_AUTO_GATING, auto_gate);
>         }
>
>         ret = rk_iommu_enable_paging(iommu);
>
> --
> Best,
> Chaoyi


* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-07 21:03   ` Midgy Balon
@ 2026-06-08  1:40     ` Chaoyi Chen
  2026-06-08  8:05       ` Midgy Balon
  0 siblings, 1 reply; 35+ messages in thread
From: Chaoyi Chen @ 2026-06-08  1:40 UTC (permalink / raw)
  To: Midgy Balon
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

Hi Midgy,

On 6/8/2026 5:03 AM, Midgy Balon wrote:
> Hi Chaoyi,
> 
> Thanks a lot for looking at this -- input from Rockchip is exactly what this
> series needs.
> 
>> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
>> implying it should support 40-bit PAs. Nevertheless, please note that the
>> upper limit for DTE is 32 bits.
> 
> Understood, and that 32-bit-DTE note is the crux of the trouble I had, so let
> me lay out what I see and ask how you'd prefer to solve it.
> 
> The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
> The problem on this 8 GiB board: with the v2 ops the page-table allocations
> (gfp_flags == 0) can land above 4 GiB, so the DTE ends up > 32 bits and the
> NPU's first translation faults with DMA_READ_ERROR. To work around that I had
> switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops set
> GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but because the
> driver keeps a single global rk_ops, a v1 NPU MMU then trips
> WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is why
> I based the series on Simon's per-device-ops work.
> 
> So my question: with per-device ops in place, what's the intended way to keep
> the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB of RAM?
> A v2 ops variant carrying GFP_DMA32 for this device, or is there a register/
> config bit that constrains the DTE address? I'd rather follow the Rockchip
> intent here than carry the v1 workaround. (Simon, cc'd -- this is right next to
> your per-device-ops series.)
>

If Simon's method works, please use it :)

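The 32-bit DTE limit reduces to a range check on the physical address of the page-table page, and GFP_DMA32 acts as a constraint on where that page may come from. The host-side sketch below models both: `dte_addr_fits()` checks that a page lies entirely below 4 GiB, and `toy_alloc()` is a hypothetical stand-in for the page allocator in which the `dma32` flag plays the role of GFP_DMA32 (roughly; real zone boundaries depend on the platform). The addresses used with it are illustrative, not taken from the board.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest physical address a 32-bit DTE can hold: 4 GiB - 1. */
#define DTE_ADDR_MAX UINT64_C(0xFFFFFFFF)
#define PT_PAGE_SIZE UINT64_C(4096)
#define NO_PAGE      UINT64_MAX

/* True if [pa, pa + len) lies entirely below 4 GiB (overflow-safe, len > 0). */
static bool dte_addr_fits(uint64_t pa, uint64_t len)
{
	return len > 0 && pa <= DTE_ADDR_MAX && len - 1 <= DTE_ADDR_MAX - pa;
}

/*
 * Hypothetical stand-in for the page allocator: hands out the first free
 * page, or, with dma32 set (playing the role of GFP_DMA32), the first one
 * below 4 GiB. Returns NO_PAGE if nothing qualifies.
 */
static uint64_t toy_alloc(const uint64_t *free_pages, size_t n, bool dma32)
{
	for (size_t i = 0; i < n; i++)
		if (!dma32 || dte_addr_fits(free_pages[i], PT_PAGE_SIZE))
			return free_pages[i];
	return NO_PAGE;
}
```

With one free page at 0x1F0000000 and another at 0x80000000, `toy_alloc(pages, 2, false)` returns the first, which does not fit a 32-bit DTE, while `toy_alloc(pages, 2, true)` returns 0x80000000.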
>> Can these operations not be completed via the pmdomain driver?
>> If some operations are controlled by TF-A, are you using open source TF-A?
> 
> Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
> RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes the
> PMU directly. Two things remain outside it:
> 
>  - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
>    domain's domain-supply, because as a domain-supply it created a device-link
>    to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
>    reading the NPU QoS registers behind the (gated) NoC. If there's a clean way
>    to let genpd own vdd_npu without that I2C ordering deadlock I'd much prefer
>    that -- pointers welcome.
>

Please refer to the patch below regarding the RK3588 NPU pmdomain.
In short, you need to set "need_regulator" for the RK3568 NPU pmdomain.

https://lore.kernel.org/all/20251216055247.13150-1-rmxpzlb@gmail.com/

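What need_regulator buys is ordering: the supply must be up before the domain is powered and its NoC de-idled, and may only drop after QoS is saved, the NoC idled and the domain powered down. The sketch below records that assumed sequence in a log so the order can be checked. The struct, step names and exact order are assumptions for illustration, not the rockchip pm-domains driver; the RK3588 patch linked above is the authoritative reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a power domain; not the kernel's structure. */
struct pd_model {
	bool need_regulator; /* supply (e.g. vdd_npu) is switched with the domain */
	const char *log[8];  /* steps taken, oldest first */
	size_t nlog;
};

static void step(struct pd_model *pd, const char *what)
{
	assert(pd->nlog < sizeof(pd->log) / sizeof(pd->log[0]));
	pd->log[pd->nlog++] = what;
}

/* Assumed power-on order: supply, power, NoC de-idle, then QoS restore. */
static void pd_power_on(struct pd_model *pd)
{
	if (pd->need_regulator)
		step(pd, "regulator_enable");
	step(pd, "power_up");
	step(pd, "noc_deidle");
	step(pd, "qos_restore");
}

/* Assumed power-off order: save QoS while the NoC still passes traffic, then go down. */
static void pd_power_off(struct pd_model *pd)
{
	step(pd, "qos_save");
	step(pd, "noc_idle");
	step(pd, "power_down");
	if (pd->need_regulator)
		step(pd, "regulator_disable");
}

/* Position of a step in the log, or -1 if it never happened. */
static int step_index(const struct pd_model *pd, const char *what)
{
	for (size_t i = 0; i < pd->nlog; i++)
		if (strcmp(pd->log[i], what) == 0)
			return (int)i;
	return -1;
}
```

With `need_regulator = true`, `step_index()` places "regulator_enable" before "power_up" on the way up and "regulator_disable" last on the way down; with it false, neither regulator step appears.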
>  - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
>    needed for actual compute, not for bring-up.
> 
> One more pmdomain observation from testing, possibly relevant to how the NPU
> domain should be modelled: the domain's power-off/on cycle doesn't reliably
> re-de-idle the NoC. If the NPU is probed after genpd has already powered the
> (unused) domain off, the power-on de-idle fails ("failed to set idle on domain
> 'npu'") and the NPU IOMMU then takes an external abort on its first MMIO access.
> Probing the NPU before the unused-domain power-off, or marking the domain
> always-on, both avoid it. Is the NoC de-idle expected to work on a genpd
> re-power here, or should this domain effectively stay on?
>

I'm not quite sure what's going on with PVTPLL and the NoC.
Maybe @Finley knows about this?

> On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
> (github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing PSCI
> and the SCMI clock service. The only closed blob in the boot chain is Rockchip's
> DDR init (rkbin), which is the standard situation for mainline RK356x.

-- 
Best, 
Chaoyi


* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-07 21:05     ` Midgy Balon
@ 2026-06-08  1:45       ` Chaoyi Chen
  2026-06-08  3:40         ` Chaoyi Chen
  0 siblings, 1 reply; 35+ messages in thread
From: Chaoyi Chen @ 2026-06-08  1:45 UTC (permalink / raw)
  To: Midgy Balon
  Cc: Simon Xue, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

Hi Midgy,

On 6/8/2026 5:05 AM, Midgy Balon wrote:
> Hi Chaoyi,
> 
> [...]
> 
> One question so I document it correctly: what does bit 31 of RK_MMU_AUTO_GATING
> control on the v2 block -- is it a master "disable internal auto clock-gating"
> for the page-table walker (i.e. so a TLB-miss walk's AXI master keeps its clock
> to completion)? The RK3568 TRM I have doesn't cover the IOMMU registers, so a
> one-line description would let me write an accurate comment.
> 

Glad to hear this works. Please refer to the commit below.

[0]: https://github.com/rockchip-linux/kernel/commit/7f8158fb41b5cc8e738aaeebc3637c50ebd74cae
[1]: https://github.com/rockchip-linux/kernel/commit/6a355e5f9a2069a2309e240791bc3aad63b7324e

-- 
Best, 
Chaoyi


* Re: [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU
  2026-06-08  1:45       ` Chaoyi Chen
@ 2026-06-08  3:40         ` Chaoyi Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Chaoyi Chen @ 2026-06-08  3:40 UTC (permalink / raw)
  To: Midgy Balon
  Cc: Simon Xue, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel

On 6/8/2026 9:45 AM, Chaoyi Chen wrote:
> Hi Midgy,
> 
> [...]
> 
> Glad to hear this works. Please refer to the commit below.
> 
> [0]: https://github.com/rockchip-linux/kernel/commit/7f8158fb41b5cc8e738aaeebc3637c50ebd74cae
> [1]: https://github.com/rockchip-linux/kernel/commit/6a355e5f9a2069a2309e240791bc3aad63b7324e
> 

It looks like RGA needs this patch too, and it has already been merged :).

https://lore.kernel.org/all/20260428-spu-iommudtefix-v2-1-f592f579e508@pengutronix.de/

-- 
Best, 
Chaoyi


* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-08  1:40     ` Chaoyi Chen
@ 2026-06-08  8:05       ` Midgy Balon
  2026-06-08  9:14         ` Midgy Balon
  0 siblings, 1 reply; 35+ messages in thread
From: Midgy Balon @ 2026-06-08  8:05 UTC (permalink / raw)
  To: Chaoyi Chen
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

Hello Chaoyi,

Thanks -- this is exactly what I needed.

- v2/DTE: will do. I'll keep building on Simon's per-device-ops series -- with
  that in place the NPU MMU can use the 32-bit-DTE ops (the per-ops GFP_DMA32
  that's already in mainline) without the global rk_ops conflict. I'll list it
  as a stated dependency in the v4 cover letter.

- vdd_npu: I'll switch the RK3568 NPU power domain to need_regulator +
  domain-supply = <&vdd_npu> and drop the regulator-always-on workaround. I
  suspect that's also the right fix for the power-off/on de-idle issue I
  described -- the always-on was really just papering over the domain not being
  modelled with a regulator. I'll confirm on the board.

- AUTO_GATING: thanks for the commit references -- I'll keep the bit-31
  read-modify-write form with your Suggested-by and write the comment from
  those. For the record: on v7.1-rc6 the NPU MMU also completes translations on
  the reset value (I couldn't reproduce a page-walk stall without the write), so
  I'll note in the commit that it matches the vendor clock-gating handling
  rather than fixing a failure I can reproduce here -- happy to drop it if the
  iommu maintainers would prefer.

- PVTPLL/NoC: I'll follow up with Finley. First I'll check whether the
  need_regulator change resolves the NoC re-power de-idle on its own; if it
  doesn't, I'll bring him the details (the genpd power-on de-idle ack and the
  BUS_IDLE_ST state).

I'll send a v4 with these. Thanks again for the quick, detailed answers.

Kind regards,
Midgy



* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-08  8:05       ` Midgy Balon
@ 2026-06-08  9:14         ` Midgy Balon
  2026-06-08  9:38           ` Chaoyi Chen
  0 siblings, 1 reply; 35+ messages in thread
From: Midgy Balon @ 2026-06-08  9:14 UTC (permalink / raw)
  To: Chaoyi Chen
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

Hello Chaoyi,

Following up on the need_regulator suggestion -- I implemented and tested it on
the board, and unfortunately it doesn't avoid the deadlock on RK3568; it moves
it from boot to the NPU job submit.

What I did: gave the RK3568 NPU power domain a regulator (a DOMAIN_M_R variant
with need_regulator = true), wired domain-supply = <&vdd_npu>, and dropped the
regulator-always-on workaround.

Boot is now clean and the NPU probes, but there is a warning during boot:

  rockchip-pm-domain ...: Failed to create device link (0x180) with supplier
  0-0020 for .../power-domain@6

(0-0020 is the rk809 PMIC that supplies vdd_npu.) Then on the first NPU job
submit the board hard-hangs with an RCU stall:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     3-...!: (1 GPs behind) ...
  rcu: rcu_preempt kthread starved for 5115 jiffies! ... RCU_GP_WAIT_FQS(5)
  rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected

My reading: vdd_npu is on the rk809 *I2C* PMIC, so when genpd enables/disables
the regulator during the NPU's runtime-PM power transition, the I2C transfer
runs in a context that starves RCU and the box freezes. (I suspect
need_regulator is fine on the RK3588 NPU because its supply isn't behind an I2C
PMIC.) The always-on workaround avoids this precisely because genpd never
touches the I2C regulator in that path.

So: for an NPU domain whose supply is an I2C PMIC, is there a supported way to
let genpd own the regulator without performing the I2C op in the
power-transition path (a deferred/async regulator enable, or a flag), or should
RK3568 keep vdd_npu as regulator-always-on? For v4 I'll keep always-on unless
there's a cleaner path.


Thanks,
Midgy



* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-08  9:14         ` Midgy Balon
@ 2026-06-08  9:38           ` Chaoyi Chen
  2026-06-09 11:11             ` Midgy Balon
  0 siblings, 1 reply; 35+ messages in thread
From: Chaoyi Chen @ 2026-06-08  9:38 UTC (permalink / raw)
  To: Midgy Balon
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

Hi Midgy,

On 6/8/2026 5:14 PM, Midgy Balon wrote:
> Hello Chaoyi,
> 
> Following up on the need_regulator suggestion -- I implemented and tested it on
> the board, and unfortunately it doesn't avoid the deadlock on RK3568; it moves
> it from boot to the NPU job submit.
> 
> What I did: gave the RK3568 NPU power domain a regulator (a DOMAIN_M_R variant
> with need_regulator = true), wired domain-supply = <&vdd_npu>, and dropped the
> regulator-always-on workaround.
> 
> Boot is now clean and the NPU probes, but there is a warning during boot:
> 
>   rockchip-pm-domain ...: Failed to create device link (0x180) with supplier
>   0-0020 for .../power-domain@6
> 
> (0-0020 is the rk809 PMIC that supplies vdd_npu.) Then on the first NPU job
> submit the board hard-hangs with an RCU stall:
> 
>   rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>   rcu:     3-...!: (1 GPs behind) ...
>   rcu: rcu_preempt kthread starved for 5115 jiffies! ... RCU_GP_WAIT_FQS(5)
>   rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected
> 
> My reading: vdd_npu is on the rk809 *I2C* PMIC, so when genpd enables/disables
> the regulator during the NPU's runtime-PM power transition, the I2C transfer
> runs in a context that starves RCU and the box freezes. (I suspect
> need_regulator is fine on the RK3588 NPU because its supply isn't behind an I2C
> PMIC.) The always-on workaround avoids this precisely because genpd never
> touches the I2C regulator in that path.
>

No, they are all controlled by RK809.

And this looks weird. Is your rocket driver compiled as a module?
Please try compiling it as a module. When is the above error printed?
Please provide the complete boot log.

> So: for an NPU domain whose supply is an I2C PMIC, is there a
> supported way to let genpd own the regulator without performing the
> I2C op in the power-transition path (a deferred/async regulator
> enable, or a flag), or should RK3568 keep vdd_npu as
> regulator-always-on? For v4 I'll keep always-on unless there's a
> cleaner path.
> 

-- 
Best, 
Chaoyi


* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-08  9:38           ` Chaoyi Chen
@ 2026-06-09 11:11             ` Midgy Balon
  2026-06-10  1:14               ` Chaoyi Chen
  0 siblings, 1 reply; 35+ messages in thread
From: Midgy Balon @ 2026-06-09 11:11 UTC (permalink / raw)
  To: Chaoyi Chen
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

[-- Attachment #1: Type: text/plain, Size: 4983 bytes --]

Hello Chaoyi,

You were right: building rocket as a module fixes it. Thanks for the pointer.

I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
need_regulator on the RK3568 NPU power domain via a DOMAIN_M_R variant,
domain-supply = <&vdd_npu>, and the regulator-always-on workaround
dropped). The board now boots cleanly and, more importantly, an NPU job
submit no longer hangs: I ran the test workload five times with no RCU
stall and no freeze.

So with rocket=m the need_regulator approach works on RK3568, and I'll
keep it for v4 (domain-supply + need_regulator, instead of marking
vdd_npu always-on). rocket=m is the normal configuration anyway; my
earlier hang came from building it =y in a self-contained image, so it
probed in the initcalls (around 2 s) and the genpd -> I2C-PMIC
regulator transition ran before the system was ready. As a module it
loads from udev much later (~6.8 s here), after the I2C controller and
regulator core are fully up.

On your question of when the device-link error is printed: it is
printed at power-domain controller probe, not at rocket probe:

  [    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
  [    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller:
                 Failed to create device link (0x180) with supplier 0-0020 for
                 /power-management@fdd90000/power-controller/power-domain@6
  [    2.945955] platform fde40000.npu: Adding to iommu group 3
  ...
  [    6.840374] rocket: loading out-of-tree module taints kernel.
  [    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
  [    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0

So the device link to the rk809 PMIC (0-0020) fails to form at ~2.75 s,
well before rocket loads at ~6.8 s. It is non-fatal here: the vdd_npu
rail is brought up by the regulator core and all jobs run. There is
also no "failed to get ack on domain npu" NoC warning in this boot (the
always-on kernel had one). The complete boot log is attached.
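For reference, the same ordering can be pulled out of the attached log mechanically rather than by eye. This is a minimal Python sketch, assuming dmesg-style lines that each start with a `[ seconds ]` timestamp (systemd, U-Boot and DDR-training lines are simply skipped); `first_timestamps` and `happens_before` are hypothetical helper names, and the marker strings are copied from the excerpt above:

```python
import re
import sys

# Kernel log lines start with a "[    2.749637] " style timestamp (seconds since boot).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s")


def first_timestamps(lines, markers):
    """Map each marker substring to the timestamp of the first kernel line containing it.

    Lines without a kernel timestamp are ignored; markers never seen are omitted.
    """
    found = {}
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # not a timestamped kernel line
        for marker in markers:
            if marker not in found and marker in line:
                found[marker] = float(match.group(1))
    return found


def happens_before(lines, earlier, later):
    """True/False if both markers were seen, None if either one is missing."""
    times = first_timestamps(lines, [earlier, later])
    if earlier not in times or later not in times:
        return None
    return times[earlier] < times[later]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_order.py boot.log
    wanted = ["Failed to create device link", "Initialized rocket"]
    with open(sys.argv[1], errors="replace") as log:
        # Dict insertion order follows discovery order, i.e. log order.
        for marker, seconds in first_timestamps(log, wanted).items():
            print(f"{seconds:12.6f}  {marker}")
```

A marker is only found if it sits on the timestamped line itself, so this should be fed the raw log rather than a hand-wrapped excerpt like the one quoted above.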

Two notes / one question:
- This boot used fw_devlink=permissive on the command line. Is the
  "Failed to create device link ... supplier 0-0020" warning at pmdomain
  probe expected and benign, or is there a clean way to make it order
  correctly (so that it also works without permissive, and a =y build
  wouldn't deadlock in the initcalls)?
- (The convolution output is still uniform zero-point / the job times
  out. That is the separate NPU compute-completion issue, unrelated to
  the power-domain work. Finley, that is the one I flagged earlier re
  PVTPLL/NoC.)

Kind regards,
Midgy


[-- Attachment #2: 2026-06-09_rocket-m-needreg.log --]
[-- Type: text/x-log, Size: 70113 bytes --]

sudo reboot
Password: 

Login incorrect
rock-3b login: radxa
Password: 
Linux rock-3b 6.19.0-rc5-00003-gb16c04e5e619-dirty #65 SMP PREEMPT Thu May 21 11:16:54 CEST 2026 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Jun  8 14:14:44 CEST 2026 from 10.160.1.15 on pts/1
radxa@rock-3b:~$ [ 1800.807070] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.811885] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.813239] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.814408] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.815613] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.817175] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
[ 1800.821487] hdmi-audio-codec hdmi-audio-codec.11.auto: HDMI: Unknown ELD version 0
sudo reboot
         Stopping Session 42 of user radxa.
[  OK  ] Removed slice system-modprobe.slice.
[  OK  ] Removed slice system-ssh.slice.
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped target Login Prompts.
[  OK  ] Stopped target Remote Encrypted Volumes.
[  OK  ] Stopped target Sound Card.
[  OK  ] Stopped target Timers.
[  OK  ] Stopped Daily apt upgrade and clean activities.
[  OK  ] Stopped Daily apt download activities.
[  OK  ] Stopped Periodic ext4 Onli…ata Check for All Filesystems.
[  OK  ] Stopped Discard unused blocks once a week.
[  OK  ] Stopped Daily man-db regeneration.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped target System Time Set.
[  OK  ] Stopped target Hardware activated USB gadget.
[  OK  ] Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
         Unmounting /config...
         Stopping Save/Restore Sound Card State...
         Stopping Avahi mDNS/DNS-SD Stack...
         Stopping Getty on tty1...
         Stopping Netdata, X-Ray Vi…on for your infrastructure!...
         Stopping Enable adbd on supported Radxa products...
         Stopping Enable USB Ethern…on supported Radxa products...
         Stopping Serial Getty on ttyS2...
         Stopping LSB: Set sysfs variables from /etc/sysfs.conf...
         Stopping Journal Service for Namespace netdata...
         Stopping Load/Save Random Seed...
         Stopping Tailscale node agent...
[  OK  ] Stopped Avahi mDNS/DNS-SD Stack.
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Serial Getty on ttyS2.
[  OK  ] Stopped Journal Service for Namespace netdata.
[  OK  ] Unmounted /config.
[  OK  ] Stopped Save/Restore Sound Card State.
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Stopped Session 42 of user radxa.
[  OK  ] Stopped LSB: Set sysfs variables from /etc/sysfs.conf.
[  OK  ] Removed slice system-getty.slice.
[  OK  ] Removed slice system-serial\x2dgetty.slice.
         Stopping User Login Management...
         Stopping User Manager for UID 1000...
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Stopped User Login Management.
         Stopping User Runtime Directory /run/user/1000...
[  OK  ] Unmounted /run/user/1000.
[  OK  ] Stopped User Runtime Directory /run/user/1000.
[  OK  ] Removed slice User Slice of UID 1000.
         Stopping Permit User Sessions...
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped Tailscale node agent.
[  OK  ] Stopped Netdata, X-Ray Vision for your infrastructure!.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped target Host and Network Name Lookups.
[  OK  ] Stopped Network Manager Wait Online.
[  OK  ] Stopped Wait for Network to be Configured.
[ 1809.095948] dwc3 fcc00000.usb: request 00000000d5d2d033 was not queued to ep0out
[ 1809.143743] unloading
[  OK  ] Stopped Enable USB Ethernet on supported Radxa products.
[  OK  ] Stopped Enable adbd on supported Radxa products.
[  OK  ] Stopped target Network.
         Stopping Network Manager...
         Stopping Raise network interfaces...
         Stopping Network Name Resolution...
         Stopping WPA supplicant...
[  OK  ] Stopped Raise network interfaces.
[  OK  ] Stopped Network Name Resolution.
         Stopping Network Service...
[  OK  ] Stopped Network Service.
[ 1809.276260] wlp1s0: deauthenticating from 38:ff:36:bb:04:ac by local choice (Reason: 3=DEAUTH_LEAVING)
[  OK  ] Stopped WPA supplicant.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped target Network (Pre).
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Closed OpenBSD Secure Shell server socket.
[  OK  ] Closed Journal Varlink Socket for Namespace netdata.
[  OK  ] Removed slice system-syste…\x2djournald\x2dvarlink.slice.
[  OK  ] Closed Journal Socket for Namespace netdata.
[  OK  ] Removed slice system-systemd\x2djournald.slice.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped target Local Encrypted Volumes.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Stopped target Swap.
         Deactivating swap /mnt/nvme/swapfile...
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
         Stopping Network Time Synchronization...
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Network Time Synchronization.
[  OK  ] Deactivated swap /mnt/nvme/swapfile.
         Unmounting /mnt/nvme...
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped target Local File Systems.
[  OK  ] Unset automount boot-efi.automount.
[  OK  ] Unset automount config.automount.
[ 1811.430689] EXT4-fs (nvme0n1): unmounting filesystem 0d9000fd-1edf-455d-9058-56e0855a1edb.
[  OK  ] Unmounted /mnt/nvme.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped File System Check …d-1edf-455d-9058-56e0855a1edb.
[  OK  ] Removed slice system-systemd\x2dfsck.slice.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
[  OK  ] Finished Reboot.
[  OK  ] Reached target Reboot.
[ 1811.697989] watchdog: watchdog0: watchdog did not stop!
[ 1811.815097] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 1811.829482] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 1811.853568] systemd-journald[204]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 1811.899622] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 1811.913025] systemd-shutdown[1]: Using hardware watchdog 'Synopsys DesignWare Watchdog', version 0, device /dev/watchdog
[ 1811.917532] systemd-shutdown[1]: Unmounting file systems.
[ 1811.920424] [11189]: Remounting '/' read-only in with options '(null)'.
[ 1811.983886] EXT4-fs (mmcblk0p3): re-mounted d210e617-6a4c-4771-b955-ddd835a32d2b ro.
[ 1812.006902] systemd-shutdown[1]: All filesystems unmounted.
[ 1812.007480] systemd-shutdown[1]: Deactivating swaps.
[ 1812.008275] systemd-shutdown[1]: All swaps deactivated.
[ 1812.008802] systemd-shutdown[1]: Detaching loop devices.
[ 1812.015813] systemd-shutdown[1]: All loop devices detached.
[ 1812.016332] systemd-shutdown[1]: Stopping MD devices.
[ 1812.017106] systemd-shutdown[1]: All MD devices stopped.
[ 1812.017591] systemd-shutdown[1]: Detaching DM devices.
[ 1812.018320] systemd-shutdown[1]: All DM devices detached.
[ 1812.018817] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 1812.027209] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 1812.031042] systemd-shutdown[1]: Rebooting.
[ 1812.186088] kvm: exiting hardware virtualization
[ 1812.186550] reboot: Restarting system
DDR 03ea844c5d typ 24/09/03-10:42:57,fwver: v1.23
In
wdqs_if: 0x1010100
LP4/4x derate en, other dram:1x trefi
ddrconfig:7
MID:0xff
LPDDR4X, 324MHz
BW=32 Col=10 Bk=8 CS0 Row=17 CS1 Row=17 CS=2 Die BW=16 Size=8192MB
tdqss_lf: cs0 dqs0: -24ps, dqs1: -96ps, dqs2: -48ps, dqs3: -168ps, 
tdqss_lf: cs1 dqs0: 24ps, dqs1: -72ps, dqs2: -48ps, dqs3: -144ps, 
tdqss_hf: cs0 dqs0: -24ps, dqs1: -96ps, dqs2: -48ps, dqs3: -168ps, 
tdqss_hf: cs1 dqs0: 24ps, dqs1: -72ps, dqs2: -48ps, dqs3: -144ps, 

change to: 324MHz
PHY drv:clk:36,ca:36,DQ:29,odt:240
vrefinner:16%, vrefout:41%
dram drv:40,odt:0
clk skew:0x62

change to: 528MHz
PHY drv:clk:36,ca:36,DQ:29,odt:240
vrefinner:16%, vrefout:41%
dram drv:40,odt:0
clk skew:0x58

change to: 780MHz
PHY drv:clk:36,ca:36,DQ:29,odt:60
vrefinner:16%, vrefout:41%
dram drv:40,odt:0
clk skew:0x58
rx vref: 14.6%
tx vref: 34.0%

change to: 1560MHz(final freq)
PHY drv:clk:36,ca:36,DQ:29,odt:60
vrefinner:16%, vrefout:22%
dram drv:40,odt:80
vref_ca:00000071
clk skew:0x26
rx vref: 15.6%
tx vref: 22.8%
cs 0:
rdtrn RS:
DQS0:0x30, DQS1:0x31, DQS2:0x33, DQS3:0x2c, 
min  : 0xd  0xe 0x10  0xe  0x1  0x2  0x8  0x5 , 0xa  0x7  0x2  0x3  0xc  0xb  0xd  0xa ,
      0x11  0xf  0xa  0xa  0x2  0x2  0x2  0x6 , 0xc  0x7  0x6  0x3 0x10 0x11  0xd 0x11 ,
mid  :0x26 0x26 0x29 0x27 0x1b 0x1c 0x22 0x1f ,0x24 0x21 0x1b 0x1c 0x26 0x25 0x26 0x25 ,
      0x2c 0x2a 0x24 0x24 0x1c 0x1b 0x1b 0x20 ,0x24 0x20 0x1e 0x1a 0x27 0x29 0x25 0x29 ,
max  :0x3f 0x3f 0x43 0x40 0x35 0x36 0x3c 0x39 ,0x3e 0x3c 0x35 0x36 0x41 0x3f 0x3f 0x40 ,
      0x47 0x45 0x3e 0x3e 0x36 0x35 0x34 0x3b ,0x3d 0x39 0x37 0x32 0x3f 0x41 0x3e 0x42 ,
range:0x32 0x31 0x33 0x32 0x34 0x34 0x34 0x34 ,0x34 0x35 0x33 0x33 0x35 0x34 0x32 0x36 ,
      0x36 0x36 0x34 0x34 0x34 0x33 0x32 0x35 ,0x31 0x32 0x31 0x2f 0x2f 0x30 0x31 0x31 ,
wrtrn RS:
DQS0:0x22, DQS1:0x13, DQS2:0x1d, DQS3:0x5, 
min  :0x76 0x79 0x7c 0x78 0x6c 0x6f 0x73 0x72 0x72 ,0x63 0x5f 0x59 0x59 0x65 0x63 0x64 0x62 0x5e ,
      0x6a 0x6a 0x66 0x64 0x5d 0x5c 0x5c 0x60 0x63 ,0x55 0x51 0x51 0x4c 0x58 0x5a 0x55 0x5b 0x53 ,
mid  :0x91 0x93 0x95 0x92 0x84 0x88 0x8c 0x8b 0x8a ,0x7c 0x79 0x73 0x73 0x7d 0x7b 0x7c 0x7b 0x77 ,
      0x85 0x84 0x7f 0x7e 0x76 0x75 0x75 0x7a 0x7b ,0x70 0x6b 0x6b 0x66 0x73 0x73 0x6f 0x75 0x6c ,
max  :0xac 0xae 0xaf 0xad 0x9c 0xa1 0xa5 0xa5 0xa2 ,0x96 0x94 0x8e 0x8d 0x96 0x94 0x95 0x94 0x90 ,
      0xa1 0x9f 0x98 0x98 0x90 0x8e 0x8f 0x94 0x94 ,0x8b 0x85 0x86 0x80 0x8f 0x8d 0x89 0x8f 0x86 ,
range:0x36 0x35 0x33 0x35 0x30 0x32 0x32 0x33 0x30 ,0x33 0x35 0x35 0x34 0x31 0x31 0x31 0x32 0x32 ,
      0x37 0x35 0x32 0x34 0x33 0x32 0x33 0x34 0x31 ,0x36 0x34 0x35 0x34 0x37 0x33 0x34 0x34 0x33 ,
cs 1:
rdtrn RS:
DQS0:0x30, DQS1:0x31, DQS2:0x33, DQS3:0x2c, 
min  : 0xd  0xe 0x10  0xe  0x1  0x2  0x8  0x5 , 0xa  0x7  0x2  0x3  0xc  0xb  0xd  0xa ,
      0x11  0xf  0xa  0xa  0x2  0x2  0x2  0x6 , 0xc  0x7  0x6  0x3 0x10 0x11  0xd 0x11 ,
mid  :0x26 0x26 0x29 0x27 0x1b 0x1c 0x22 0x1f ,0x24 0x21 0x1b 0x1c 0x26 0x25 0x26 0x25 ,
      0x2c 0x2a 0x24 0x24 0x1c 0x1b 0x1b 0x20 ,0x24 0x20 0x1e 0x1a 0x27 0x29 0x25 0x29 ,
max  :0x3f 0x3f 0x43 0x40 0x35 0x36 0x3c 0x39 ,0x3e 0x3c 0x35 0x36 0x41 0x3f 0x3f 0x40 ,
      0x47 0x45 0x3e 0x3e 0x36 0x35 0x34 0x3b ,0x3d 0x39 0x37 0x32 0x3f 0x41 0x3e 0x42 ,
range:0x32 0x31 0x33 0x32 0x34 0x34 0x34 0x34 ,0x34 0x35 0x33 0x33 0x35 0x34 0x32 0x36 ,
      0x36 0x36 0x34 0x34 0x34 0x33 0x32 0x35 ,0x31 0x32 0x31 0x2f 0x2f 0x30 0x31 0x31 ,
wrtrn RS:
DQS0:0x22, DQS1:0x13, DQS2:0x1d, DQS3:0x5, 
min  :0x76 0x79 0x7c 0x78 0x6c 0x6f 0x73 0x72 0x72 ,0x63 0x5f 0x59 0x59 0x65 0x63 0x64 0x62 0x5e ,
      0x6a 0x6a 0x66 0x64 0x5d 0x5c 0x5c 0x60 0x63 ,0x55 0x51 0x51 0x4c 0x58 0x5a 0x55 0x5b 0x53 ,
mid  :0x91 0x93 0x95 0x92 0x84 0x88 0x8c 0x8b 0x8a ,0x7c 0x79 0x73 0x73 0x7d 0x7b 0x7c 0x7b 0x77 ,
      0x85 0x84 0x7f 0x7e 0x76 0x75 0x75 0x7a 0x7b ,0x70 0x6b 0x6b 0x66 0x73 0x73 0x6f 0x75 0x6c ,
max  :0xac 0xae 0xaf 0xad 0x9c 0xa1 0xa5 0xa5 0xa2 ,0x96 0x94 0x8e 0x8d 0x96 0x94 0x95 0x94 0x90 ,
      0xa1 0x9f 0x98 0x98 0x90 0x8e 0x8f 0x94 0x94 ,0x8b 0x85 0x86 0x80 0x8f 0x8d 0x89 0x8f 0x86 ,
range:0x36 0x35 0x33 0x35 0x30 0x32 0x32 0x33 0x30 ,0x33 0x35 0x35 0x34 0x31 0x31 0x31 0x32 0x32 ,
      0x37 0x35 0x32 0x34 0x33 0x32 0x33 0x34 0x31 ,0x36 0x34 0x35 0x34 0x37 0x33 0x34 0x34 0x33 ,
CBT RS:
cs:0 min  :0x43 0x39 0x38 0x2d 0x36 0x27 0x3c ,0x44 0x36 0x37 0x2a 0x35 0x2a 0x3d ,
cs:0 mid  :0x7d 0x7d 0x71 0x71 0x70 0x6c 0x69 ,0x7d 0x7a 0x70 0x6d 0x6e 0x6e 0x69 ,
cs:0 max  :0xb7 0xc1 0xab 0xb5 0xab 0xb1 0x97 ,0xb7 0xbf 0xa9 0xb0 0xa7 0xb2 0x96 ,
cs:0 range:0x74 0x88 0x73 0x88 0x75 0x8a 0x5b ,0x73 0x89 0x72 0x86 0x72 0x88 0x59 ,
cs:1 min  :0x42 0x3e 0x39 0x33 0x38 0x2f 0x40 ,0x43 0x3c 0x35 0x2f 0x37 0x30 0x3f ,
cs:1 mid  :0x7f 0x7f 0x75 0x72 0x74 0x6e 0x6e ,0x7f 0x7c 0x72 0x6e 0x73 0x6f 0x6e ,
cs:1 max  :0xbd 0xc0 0xb2 0xb2 0xb0 0xad 0x9c ,0xbc 0xbc 0xb0 0xae 0xaf 0xae 0x9d ,
cs:1 range:0x7b 0x82 0x79 0x7f 0x78 0x7e 0x5c ,0x79 0x80 0x7b 0x7f 0x78 0x7e 0x5e ,
out

<debug_uart>
dmc
pinctrl
serial@fe660000

U-Boot SPL 2025.07-rc4-dirty (Mar 25 2026 - 16:29:49 +0100)
mmc@fe310000
clock-controller@fdd20000
clock-controller@fdd00000
mmc@fe2b0000
Trying to boot from MMC1
## Checking hash(es) for config config-1 ... OK
## Checking hash(es) for Image atf-1 ... sha256+ OK
## Checking hash(es) for Image u-boot ... sha256+ OK
## Checking hash(es) for Image fdt-1 ... sha256+ OK
## Checking hash(es) for Image atf-2 ... sha256+ OK
## Checking hash(es) for Image atf-3 ... sha256+ OK
NOTICE:  BL31: v2.14.0(release):8dae086
NOTICE:  BL31: Built : 14:26:06, Mar 25 2026
NOTICE:  BL31: Rockchip release version: v1.0

<debug_uart>
A
B
C
D
E
F
G
H
pinctrl
serial@fe660000


U-Boot 2025.07-rc4-dirty (Mar 25 2026 - 16:29:49 +0100)

clock-controller@fdd20000
clock-controller@fdd00000
Model: Radxa ROCK 3B
nvmem@fe38c000
SoC:   RK3568
I
DRAM:  dmc
J
8 GiB (total 7.7 GiB)
io-domains
clock-controller@fdd00000
pinctrl
i2c@fdd40000
clock-controller@fdd20000
pmic@20
PMIC:  RK809 (on=0x02, off=0x00)
LDO_REG6
LDO_REG4
LDO_REG5
DCDC_REG5
SWITCH_REG1
DCDC_REG1
DCDC_REG2
DCDC_REG3
LDO_REG2
LDO_REG3
LDO_REG7
LDO_REG8
SWITCH_REG2
led-0
gpio@fdd60000
serial@fe660000
Core:  340 devices, 33 uclasses, devicetree: separate
MMC:   mmc@fe310000
mmc@fe2b0000
mmc@fe2b0000: 1, mmc@fe310000: 0
Loading Environment from nowhere... OK
In:    serial@fe660000
Out:   serial@fe660000
Err:   serial@fe660000
Model: Radxa ROCK 3B
nvmem@fe38c000
SoC:   RK3568
saradc@fe720000
reset
Net:   ethernet@fe010000
gpio@fe760000
ethernet@fe2a0000
eth1: ethernet@fe010000, eth0: ethernet@fe2a0000
Hit any key to stop autoboot:  0
bootstd
Scanning for bootflows in all bootdevs
Seq  Method       State   Uclass    Part  Name                      Filename
---  -----------  ------  --------  ----  ------------------------  ----------------
vbe_simple
vbe_simple
Scanning global bootmeth 'efi_mgr':
mmc@fe2b0000.blk
Card did not respond to voltage select! : -110
mmc@fe310000.blk
rng@fe388000
psci
  0  efi_mgr      ready   (none)       0  <NULL>                    
** Booting bootflow '<NULL>' with efi_mgr
Loading Boot0000 'mmc 0' failed
EFI boot manager: Cannot load any image
Boot failed (err=-14)
Scanning bootdev 'mmc@fe2b0000.bootdev':
mmc@fe2b0000.blk
Card did not respond to voltage select! : -110
Scanning bootdev 'mmc@fe310000.bootdev':
  1  extlinux     ready   mmc          3  mmc@fe310000.bootdev.part /boot/extlinux/extlinux.conf
** Booting bootflow 'mmc@fe310000.bootdev.part_3' with extlinux
U-Boot menu
1:	Mainline 6.19 NPU IOMMU (default)
2:	Mainline 6.19 NPU non-IOMMU (fallback)
3:	Mainline 6.19 Rocket accel driver (test)
4:	Mainline 7.1-rc6 Rocket NPU (test)
5:	Mainline 7.1-rc6 Rocket NPU + Chaoyi AUTO_GATING BIT(31) (test)
6:	Mainline 7.1-rc6 NPU need_regulator + rocket=m (test)
Enter choice: 6
6:	Mainline 7.1-rc6 NPU need_regulator + rocket=m (test)
Retrieving file: /boot/Image-7.1-needreg-m
Retrieving file: /boot/initrd.img-7.1.0-rc6-00007-g043be7a551c4
append: root=UUID=d210e617-6a4c-4771-b955-ddd835a32d2b rw rootwait earlycon console=ttyFIQ0,1500000n8 console=ttyS2,1500000n8 clk_ignore_unused cma=128M kernel.panic=5 fw_devlink=permissive
Retrieving file: /boot/rk3568-rock-3b-7.1-needreg-m.dtb
## Flattened Device Tree blob at 12000000
   Booting using the fdt blob at 0x12000000
Working FDT set to 12000000
   Loading Ramdisk to eb2c0000, end ecead483 ... OK
   Loading Device Tree to 00000000eb2ac000, end 00000000eb2bf1e3 ... OK
Working FDT set to eb2ac000

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x412fd050]
[    0.000000] Linux version 7.1.0-rc6-chaoyi-00011-ga31e2e6fae27 (radxa@rock-3b) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #5 SMP PREEMPT Tue Jun  9 12:40:22 CEST 2026
[    0.000000] KASLR enabled
[    0.000000] Machine model: Radxa ROCK 3B
[    0.000000] efi: UEFI not found.
[    0.000000] earlycon: uart0 at MMIO32 0x00000000fe660000 (options '1500000n8')
[    0.000000] printk: legacy bootconsole [uart0] enabled
[    0.000000] OF: reserved mem: 0x000000000010f000..0x000000000010f0ff (0 KiB) nomap non-reusable shmem@10f000
[    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000001ffffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x1ff01df80-0x1ff02067f]
[    0.000000] cma: Reserved 128 MiB at 0x00000000e3200000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.5
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000200000-0x00000000efffffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x00000001ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000001ffffffff]
[    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
[    0.000000] percpu: Embedded 26 pages/cpu s67288 r8192 d31016 u106496
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: GICv3 CPU interface
[    0.000000] CPU features: detected: Virtualization Host Extensions
[    0.000000] CPU features: kernel page table isolation forced ON by KASLR
[    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[    0.000000] alternatives: applying boot alternatives
[    0.000000] Kernel command line: root=UUID=d210e617-6a4c-4771-b955-ddd835a32d2b rw rootwait earlycon console=ttyFIQ0,1500000n8 console=ttyS2,1500000n8 clk_ignore_unused cma=128M kernel.panic=5 fw_devlink=permissive
[    0.000000] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes
[    0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] software IO TLB: area num 4.
[    0.000000] software IO TLB: mapped [mem 0x00000000df200000-0x00000000e3200000] (64MB)
[    0.000000] Fallback order for Node 0: 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2031104
[    0.000000] Policy zone: Normal
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU event tracing is enabled.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=4.
[    0.000000] 	Trampoline variant of Tasks RCU enabled.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] RCU Tasks: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=4.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GIC: enabling workaround for GICv3: non-coherent attribute
[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 320 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] GICv3: MBI range [296:319]
[    0.000000] GICv3: Using MBI frame 0x00000000fd410000
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: GICv3 features: 16 PPIs
[    0.000000] GICv3: GICD_CTLR.DS=0, SCR_EL3.FIQ=0
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000fd460000
[    0.000000] ITS [mem 0xfd440000-0xfd45ffff]
[    0.000000] GIC: enabling workaround for ITS: Rockchip erratum RK3568002
[    0.000000] GIC: enabling workaround for ITS: non-coherent attribute
[    0.000000] ITS@0x00000000fd440000: allocated 8192 Devices @410000 (indirect, esz 8, psz 64K, shr 0)
[    0.000000] ITS@0x00000000fd440000: allocated 32768 Interrupt Collections @420000 (flat, esz 2, psz 64K, shr 0)
[    0.000000] ITS: using cache flushing for cmd queue
[    0.000000] GICv3: using LPI property table @0x0000000000430000
[    0.000000] GIC: using cache flushing for LPI property table
[    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000000440000
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] arch_timer: cp15 timer running at 24.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x588fe9dc0, max_idle_ns: 440795202592 ns
[    0.000001] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns
[    0.004345] Console: colour dummy device 80x25
[    0.004930] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=96000)
[    0.005946] pid_max: default: 32768 minimum: 301
[    0.006823] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.007612] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.008917] VFS: Finished mounting rootfs on nullfs
[    0.012680] rcu: Hierarchical SRCU implementation.
[    0.013171] rcu: 	Max phase no-delay instances is 1000.
[    0.014123] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level
[    0.015214] fsl-mc MSI: msi-controller@fd440000 domain created
[    0.020987] EFI services will not be available.
[    0.021934] smp: Bringing up secondary CPUs ...
[    0.023391] Detected VIPT I-cache on CPU1
[    0.023553] GICv3: CPU1: found redistributor 100 region 0:0x00000000fd480000
[    0.023577] GICv3: CPU1: using allocated LPI pending table @0x0000000000450000
[    0.023640] CPU1: Booted secondary processor 0x0000000100 [0x412fd050]
[    0.024763] Detected VIPT I-cache on CPU2
[    0.024907] GICv3: CPU2: found redistributor 200 region 0:0x00000000fd4a0000
[    0.024929] GICv3: CPU2: using allocated LPI pending table @0x0000000000460000
[    0.024978] CPU2: Booted secondary processor 0x0000000200 [0x412fd050]
[    0.026138] Detected VIPT I-cache on CPU3
[    0.026279] GICv3: CPU3: found redistributor 300 region 0:0x00000000fd4c0000
[    0.026303] GICv3: CPU3: using allocated LPI pending table @0x0000000000470000
[    0.026352] CPU3: Booted secondary processor 0x0000000300 [0x412fd050]
[    0.026541] smp: Brought up 1 node, 4 CPUs
[    0.034241] SMP: Total of 4 processors activated.
[    0.034710] CPU: All CPU(s) started at EL2
[    0.035116] CPU features: detected: 32-bit EL0 Support
[    0.035620] CPU features: detected: 32-bit EL1 Support
[    0.036127] CPU features: detected: Data cache clean to the PoU not required for I/D coherence
[    0.036969] CPU features: detected: Common not Private translations
[    0.037582] CPU features: detected: CRC32 instructions
[    0.038125] CPU features: detected: RCpc load-acquire (LDAPR)
[    0.038694] CPU features: detected: LSE atomic instructions
[    0.039244] CPU features: detected: Privileged Access Never
[    0.039791] CPU features: detected: PMUv3
[    0.040186] CPU features: detected: RAS Extension Support
[    0.040719] CPU features: detected: XNX
[    0.041102] CPU features: detected: Speculative Store Bypassing Safe (SSBS)
[    0.041855] alternatives: applying system-wide alternatives
[    0.047418] CPU features: detected: ICV_DIR_EL1 trapping
[    0.048557] Memory: 7683552K/8124416K available (20736K kernel code, 5018K rwdata, 14920K rodata, 12480K init, 724K bss, 305152K reserved, 131072K cma-reserved)
[    0.053188] devtmpfs: initialized
[    0.076194] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.077193] posixtimers hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.077991] futex hash table entries: 1024 (65536 bytes on 1 NUMA nodes, total 64 KiB, linear).
[    0.080642] 2G module region forced by RANDOMIZE_MODULE_REGION_FULL
[    0.081302] 0 pages in range for non-PLT usage
[    0.081310] 510752 pages in range for PLT usage
[    0.087203] DMI: not present or invalid.
[    0.092012] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.094182] DMA: preallocated 1024 KiB GFP_KERNEL pool for atomic allocations
[    0.095295] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.096163] audit: initializing netlink subsys (disabled)
[    0.096954] audit: type=2000 audit(0.092:1): state=initialized audit_enabled=0 res=1
[    0.102019] thermal_sys: Registered thermal governor 'step_wise'
[    0.102040] thermal_sys: Registered thermal governor 'power_allocator'
[    0.102800] cpuidle: using governor menu
[    0.104380] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.105274] ASID allocator initialised with 32768 entries
[    0.111040] Serial: AMBA PL011 UART driver
[    0.140432] /vop@fe040000: Fixed dependency cycle(s) with /hdmi@fe0a0000
[    0.141206] /hdmi@fe0a0000: Fixed dependency cycle(s) with /vop@fe040000
[    0.157642] /pcie@fe260000: Fixed dependency cycle(s) with /pcie@fe260000/legacy-interrupt-controller
[    0.178928] rockchip-gpio fdd60000.gpio: probed /pinctrl/gpio@fdd60000
[    0.180729] rockchip-gpio fe740000.gpio: probed /pinctrl/gpio@fe740000
[    0.182380] rockchip-gpio fe750000.gpio: probed /pinctrl/gpio@fe750000
[    0.184237] rockchip-gpio fe760000.gpio: probed /pinctrl/gpio@fe760000
[    0.185880] rockchip-gpio fe770000.gpio: probed /pinctrl/gpio@fe770000
[    0.192768] /pcie@fe280000: Fixed dependency cycle(s) with /pcie@fe280000/legacy-interrupt-controller
[    0.196426] /hdmi@fe0a0000: Fixed dependency cycle(s) with /hdmi-con
[    0.197174] /hdmi-con: Fixed dependency cycle(s) with /hdmi@fe0a0000
[    0.207741] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    0.208425] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[    0.209043] HugeTLB: registered 32.0 MiB page size, pre-allocated 0 pages
[    0.209708] HugeTLB: 0 KiB vmemmap can be freed for a 32.0 MiB page
[    0.210324] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    0.210986] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[    0.211638] HugeTLB: registered 64.0 KiB page size, pre-allocated 0 pages
[    0.212304] HugeTLB: 0 KiB vmemmap can be freed for a 64.0 KiB page
[    0.216832] ACPI: Interpreter disabled.
[    0.226115] iommu: Default domain type: Translated
[    0.226612] iommu: DMA domain TLB invalidation policy: strict mode
[    0.229120] SCSI subsystem initialized
[    0.230219] usbcore: registered new interface driver usbfs
[    0.230824] usbcore: registered new interface driver hub
[    0.231419] usbcore: registered new device driver usb
[    0.234769] pps_core: LinuxPPS API ver. 1 registered
[    0.235266] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.236205] PTP clock support registered
[    0.236952] EDAC MC: Ver: 3.0.0
[    0.238218] scmi_core: SCMI protocol bus registered
[    0.241592] FPGA manager framework
[    0.244246] vgaarb: loaded
[    0.245545] clocksource: Switched to clocksource arch_sys_counter
[    0.246553] VFS: Disk quotas dquot_6.6.0
[    0.246977] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.248448] pnp: PnP ACPI: disabled
[    0.262673] NET: Registered PF_INET protocol family
[    0.263673] IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.271846] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.272763] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.273581] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.274889] TCP bind hash table entries: 65536 (order: 9, 2097152 bytes, linear)
[    0.277519] TCP: Hash tables configured (established 65536 bind 65536)
[    0.278369] UDP hash table entries: 4096 (order: 6, 262144 bytes, linear)
[    0.279544] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.280868] RPC: Registered named UNIX socket transport module.
[    0.281459] RPC: Registered udp transport module.
[    0.281967] RPC: Registered tcp transport module.
[    0.282433] RPC: Registered tcp-with-tls transport module.
[    0.282973] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.285066] PCI: CLS 0 bytes, default 64
[    0.285751] Unpacking initramfs...
[    0.294394] kvm [1]: nv: 570 coarse grained trap handlers
[    0.295328] kvm [1]: nv: 710 fine grained trap handlers
[    0.296530] kvm [1]: IPA Size Limit: 40 bits
[    0.296999] kvm [1]: GICv3: no GICV resource entry
[    0.297475] kvm [1]: disabling GICv2 emulation
[    0.297989] kvm [1]: GIC system register CPU interface enabled
[    0.298599] kvm [1]: vgic interrupt IRQ9
[    0.299039] kvm [1]: VHE mode initialized successfully
[    0.302206] Initialise system trusted keyrings
[    0.303030] workingset: timestamp_bits=42 (anon: 38) max_order=21 bucket_order=0 (anon: 0)
[    0.304530] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.305699] NFS: Registering the id_resolver key type
[    0.306250] Key type id_resolver registered
[    0.306668] Key type id_legacy registered
[    0.307105] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.307767] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    0.308845] 9p: Installing v9fs 9p2000 file system support
[    0.378356] Key type asymmetric registered
[    0.378791] Asymmetric key parser 'x509' registered
[    0.379447] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[    0.380179] io scheduler mq-deadline registered
[    0.380628] io scheduler kyber registered
[    0.381094] io scheduler bfq registered
[    2.017986] Freeing initrd memory: 28596K
[    2.025771] ledtrig-cpu: registered to indicate activity on CPUs
[    2.036309] phy phy-fe8c0000.phy.3: lane number 0, val 1
[    2.036934] rockchip-dw-pcie 3c0800000.pcie: host bridge /pcie@fe280000 ranges:
[    2.037737] rockchip-dw-pcie 3c0800000.pcie:       IO 0x00f0100000..0x00f01fffff -> 0x00f0100000
[    2.038623] rockchip-dw-pcie 3c0800000.pcie:      MEM 0x00f0200000..0x00f1ffffff -> 0x00f0200000
[    2.039516] rockchip-dw-pcie 3c0800000.pcie:      MEM 0x0380000000..0x03bfffffff -> 0x0380000000
[    2.048377] rockchip-dw-pcie 3c0800000.pcie: iATU: unroll T, 8 ob, 8 ib, align 64K, limit 8G
[    2.349576] rockchip-dw-pcie 3c0800000.pcie: PCIe Gen.3 x2 link up
[    2.350587] rockchip-dw-pcie 3c0800000.pcie: PCI host bridge to bus 0002:20
[    2.351289] pci_bus 0002:20: root bus resource [bus 20-2f]
[    2.351840] pci_bus 0002:20: root bus resource [io  0x0000-0xfffff] (bus address [0xf0100000-0xf01fffff])
[    2.352778] pci_bus 0002:20: root bus resource [mem 0xf0200000-0xf1ffffff]
[    2.353456] pci_bus 0002:20: root bus resource [mem 0x380000000-0x3bfffffff]
[    2.354237] pci 0002:20:00.0: [1d87:3566] type 01 class 0x060400 PCIe Root Port
[    2.354983] pci 0002:20:00.0: ROM [mem 0x00000000-0x0000ffff pref]
[    2.355597] pci 0002:20:00.0: PCI bridge to [bus 01-ff]
[    2.356120] pci 0002:20:00.0:   bridge window [io  0x0000-0x0fff]
[    2.356722] pci 0002:20:00.0:   bridge window [mem 0x00000000-0x000fffff]
[    2.357397] pci 0002:20:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[    2.358247] pci 0002:20:00.0: supports D1 D2
[    2.358677] pci 0002:20:00.0: PME# supported from D0 D1 D3hot
[    2.364773] pci 0002:20:00.0: Primary bus is hard wired to 0
[    2.365345] pci 0002:20:00.0: bridge configuration invalid ([bus 01-ff]), reconfiguring
[    2.366452] pci 0002:21:00.0: [10ec:5765] type 00 class 0x010802 PCIe Endpoint
[    2.367380] pci 0002:21:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
[    2.368045] pci 0002:21:00.0: BAR 5 [mem 0x00000000-0x00001fff]
[    2.369416] pci 0002:21:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0002:20:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    2.377603] pci 0002:21:00.0: ASPM: default states L1
[    2.378208] pci_bus 0002:21: busn_res: [bus 21-2f] end is updated to 21
[    2.378898] pci 0002:20:00.0: bridge window [mem 0xf0200000-0xf02fffff]: assigned
[    2.379643] pci 0002:20:00.0: ROM [mem 0xf0300000-0xf030ffff pref]: assigned
[    2.380348] pci 0002:21:00.0: BAR 0 [mem 0xf0200000-0xf0203fff 64bit]: assigned
[    2.381111] pci 0002:21:00.0: BAR 5 [mem 0xf0204000-0xf0205fff]: assigned
[    2.381824] pci 0002:20:00.0: PCI bridge to [bus 21]
[    2.382325] pci 0002:20:00.0:   bridge window [mem 0xf0200000-0xf02fffff]
[    2.383002] pci_bus 0002:20: resource 4 [io  0x0000-0xfffff]
[    2.383562] pci_bus 0002:20: resource 5 [mem 0xf0200000-0xf1ffffff]
[    2.384181] pci_bus 0002:20: resource 6 [mem 0x380000000-0x3bfffffff]
[    2.384817] pci_bus 0002:21: resource 1 [mem 0xf0200000-0xf02fffff]
[    2.390163] pcieport 0002:20:00.0: PME: Signaling with IRQ 31
[    2.391304] pcieport 0002:20:00.0: AER: enabled with IRQ 32
[    2.461045] dma-pl330 fe530000.dma-controller: Loaded driver for PL330 DMAC-241330
[    2.461857] dma-pl330 fe530000.dma-controller: 	DBUFF-128x8bytes Num_Chans-8 Num_Peri-32 Num_Events-16
[    2.465804] dma-pl330 fe550000.dma-controller: Loaded driver for PL330 DMAC-241330
[    2.466560] dma-pl330 fe550000.dma-controller: 	DBUFF-128x8bytes Num_Chans-8 Num_Peri-32 Num_Events-16
[    2.506402] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    2.512775] printk: legacy console [ttyS2] disabled
[    2.513873] fe660000.serial: ttyS2 at MMIO 0xfe660000 (irq = 37, base_baud = 1500000) is a 16550A
[    2.514882] printk: legacy console [ttyS2] enabled
[    2.515784] printk: legacy bootconsole [uart0] disabled
[    2.523685] msm_serial: driver initialized
[    2.525677] SuperH (H)SCI(F) driver initialized
[    2.526792] STM32 USART driver initialized
[    2.538263] random: crng init done
[    2.538643] platform fdea0000.video-codec: Adding to iommu group 0
[    2.541147] platform fdee0000.video-codec: Adding to iommu group 1
[    2.543525] platform fe040000.vop: Adding to iommu group 2
[    2.545348] Error: Driver 'efi-framebuffer' is already registered, aborting...
[    2.556173] loop: module loaded
[    2.560241] megasas: 07.734.00.00-rc1
[    2.563304] nvme nvme0: pci function 0002:21:00.0
[    2.563774] nvme 0002:21:00.0: enabling device (0000 -> 0002)
[    2.581994] tun: Universal TUN/TAP device driver, 1.6
[    2.585959] thunder_xcv, ver 1.0
[    2.586353] thunder_bgx, ver 1.0
[    2.586717] nicpf, ver 1.0
[    2.591410] e1000: Intel(R) PRO/1000 Network Driver
[    2.591864] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    2.592455] e1000e: Intel(R) PRO/1000 Network Driver
[    2.592904] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    2.593517] igb: Intel(R) Gigabit Ethernet Network Driver
[    2.594043] igb: Copyright (c) 2007-2014 Intel Corporation.
[    2.594615] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[    2.595175] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[    2.596815] sky2: driver version 1.30
[    2.601784] rk_gmac-dwmac fe010000.ethernet: IRQ sfty not found
[    2.604039] rk_gmac-dwmac fe2a0000.ethernet: IRQ sfty not found
[    2.606627] usbcore: registered new device driver r8152-cfgselector
[    2.607267] usbcore: registered new interface driver r8152
[    2.607825] usbcore: registered new interface driver asix
[    2.608408] usbcore: registered new interface driver ax88179_178a
[    2.609582] VFIO - User Level meta-driver version: 0.3
[    2.624316] ehci-platform fd800000.usb: EHCI Host Controller
[    2.624342] ohci-platform fd840000.usb: Generic Platform OHCI controller
[    2.624869] ehci-platform fd800000.usb: new USB bus registered, assigned bus number 1
[    2.625452] ohci-platform fd840000.usb: new USB bus registered, assigned bus number 2
[    2.625481] usbcore: registered new interface driver usb-storage
[    2.626353] ehci-platform fd800000.usb: irq 46, io mem 0xfd800000
[    2.627002] ohci-platform fd840000.usb: irq 47, io mem 0xfd840000
[    2.636410] i2c_dev: i2c /dev entries driver
[    2.637620] ehci-platform fd800000.usb: USB 2.0 started, EHCI 1.00
[    2.639265] hub 1-0:1.0: USB hub found
[    2.639666] hub 1-0:1.0: 1 port detected
[    2.643281] fan53555-regulator 0-001c: FAN53555 Option[12] Rev[15] Detected!
[    2.686641] hub 2-0:1.0: USB hub found
[    2.687081] hub 2-0:1.0: 1 port detected
[    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
[    2.701202] nvme nvme0: passthrough uses implicit buffer lengths
[    2.709720] vdda0v9_image: Bringing 600000uV into 900000-900000uV
[    2.714524] nvme nvme0: allocated 64 MiB host memory buffer (16 segments).
[    2.737292] vcca1v8_image: Bringing 600000uV into 1800000-1800000uV
[    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller: Failed to create device link (0x180) with supplier 0-0020 for /power-management@fdd90000/power-controller/power-domain@6
[    2.750502] nvme nvme0: 4/0/0 default/read/poll queues
[    2.773746] nvme nvme0: Ignoring bogus Namespace Identifiers
[    2.788100] dwmmc_rockchip fe2b0000.mmc: IDMAC supports 32-bit address mode.
[    2.788831] dwmmc_rockchip fe2b0000.mmc: Using internal DMA controller.
[    2.789432] dwmmc_rockchip fe2b0000.mmc: Version ID is 270a
[    2.790037] dwmmc_rockchip fe2b0000.mmc: DW MMC controller at irq 80,32 bit host data width,256 deep fifo
[    2.802482] arm-scmi arm-scmi.6.auto: Using scmi_smc_transport
[    2.803032] arm-scmi arm-scmi.6.auto: SCMI max-rx-timeout: 30ms / max-msg-size: 104bytes / max-msg: 20
[    2.804109] scmi_protocol scmi_dev.1: Enabled polling mode TX channel - prot_id:16
[    2.805142] arm-scmi arm-scmi.6.auto: SCMI Notifications - Core Enabled.
[    2.805740] mmc_host mmc1: Bus speed = 375000Hz (req 400000Hz, actual 375000HZ div = 0)
[    2.806552] arm-scmi arm-scmi.6.auto: SCMI Protocol v2.0 'rockchip:' Firmware version 0x0
[    2.807391] arm-scmi arm-scmi.6.auto: Enabling SCMI Quirk [quirk_clock_rates_triplet_out_of_spec]
[    2.810214] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
[    2.815909] usbcore: registered new interface driver usbhid
[    2.816421] usbhid: USB HID core driver
[    2.825563] mmc0: SDHCI controller on fe310000.mmc [fe310000.mmc] using ADMA
[    2.829265] hw perfevents: enabled with armv8_cortex_a55 PMU driver, 7 (0,8000003f) counters available
[    2.849585] NET: Registered PF_INET6 protocol family
[    2.851218] Segment Routing with IPv6
[    2.851605] In-situ OAM (IOAM) with IPv6
[    2.852047] NET: Registered PF_PACKET protocol family
[    2.852619] 9pnet: Installing 9P2000 support
[    2.853103] Key type dns_resolver registered
[    2.878282] registered taskstats version 1
[    2.878959] Loading compiled-in X.509 certificates
[    2.885591] usb 1-1: new high-speed USB device number 2 using ehci-platform
[    2.889469] sdhci-dwcmshc fe310000.mmc: Can't reduce the clock below 52MHz in HS200/HS400 mode
[    2.890328] sdhci-dwcmshc fe310000.mmc: Can't reduce the clock below 52MHz in HS200/HS400 mode
[    2.891109] sdhci-dwcmshc fe310000.mmc: Can't reduce the clock below 52MHz in HS200/HS400 mode
[    2.892871] mmc0: new HS200 MMC card at address 0001
[    2.894399] mmcblk0: mmc0:0001 BJTD4R 29.1 GiB
[    2.894880] Demotion targets for Node 0: null
[    2.899582]  mmcblk0: p1 p2 p3
[    2.901085] mmcblk0boot0: mmc0:0001 BJTD4R 4.00 MiB
[    2.904012] mmcblk0boot1: mmc0:0001 BJTD4R 4.00 MiB
[    2.907113] mmcblk0rpmb: mmc0:0001 BJTD4R 4.00 MiB, chardev (509:0)
[    2.945955] platform fde40000.npu: Adding to iommu group 3
[    2.950953] rk_gmac-dwmac fe010000.ethernet: IRQ sfty not found
[    2.952440] rk_gmac-dwmac fe010000.ethernet: clock input or output? (input).
[    2.953079] rk_gmac-dwmac fe010000.ethernet: Can not read property: tx_delay.
[    2.953754] rk_gmac-dwmac fe010000.ethernet: set tx_delay to 0x30
[    2.954304] rk_gmac-dwmac fe010000.ethernet: Can not read property: rx_delay.
[    2.954936] rk_gmac-dwmac fe010000.ethernet: set rx_delay to 0x10
[    2.955485] rk_gmac-dwmac fe010000.ethernet: integrated PHY? (no).
[    2.956079] rk_gmac-dwmac fe010000.ethernet: clock input from PHY
[    2.961643] rk_gmac-dwmac fe010000.ethernet: init for RGMII_ID
[    2.962534] rk_gmac-dwmac fe010000.ethernet: User ID: 0x30, Synopsys ID: 0x51
[    2.963181] rk_gmac-dwmac fe010000.ethernet: 	DWMAC4/5
[    2.963645] rk_gmac-dwmac fe010000.ethernet: DMA HW capability register supported
[    2.964309] rk_gmac-dwmac fe010000.ethernet: Active PHY interface: RGMII (1)
[    2.964933] rk_gmac-dwmac fe010000.ethernet: RX Checksum Offload Engine supported
[    2.965638] rk_gmac-dwmac fe010000.ethernet: TX Checksum insertion supported
[    2.966267] rk_gmac-dwmac fe010000.ethernet: Wake-Up On Lan supported
[    2.966926] rk_gmac-dwmac fe010000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    2.967641] rk_gmac-dwmac fe010000.ethernet: Enabled RFS Flow TC (entries=10)
[    2.968277] rk_gmac-dwmac fe010000.ethernet: TSO supported
[    2.968766] rk_gmac-dwmac fe010000.ethernet: TSO feature enabled
[    2.969301] rk_gmac-dwmac fe010000.ethernet: Using 32/32 bits DMA host/device width
[    3.035000] hub 1-1:1.0: USB hub found
[    3.035550] hub 1-1:1.0: 4 ports detected
[    3.073444] rk_gmac-dwmac fe2a0000.ethernet: IRQ sfty not found
[    3.075209] rk_gmac-dwmac fe2a0000.ethernet: clock input or output? (input).
[    3.075852] rk_gmac-dwmac fe2a0000.ethernet: Can not read property: tx_delay.
[    3.076485] rk_gmac-dwmac fe2a0000.ethernet: set tx_delay to 0x30
[    3.077027] rk_gmac-dwmac fe2a0000.ethernet: Can not read property: rx_delay.
[    3.077695] rk_gmac-dwmac fe2a0000.ethernet: set rx_delay to 0x10
[    3.078252] rk_gmac-dwmac fe2a0000.ethernet: integrated PHY? (no).
[    3.078851] rk_gmac-dwmac fe2a0000.ethernet: clock input from PHY
[    3.084415] rk_gmac-dwmac fe2a0000.ethernet: init for RGMII_ID
[    3.085289] rk_gmac-dwmac fe2a0000.ethernet: User ID: 0x30, Synopsys ID: 0x51
[    3.085968] rk_gmac-dwmac fe2a0000.ethernet: 	DWMAC4/5
[    3.086436] rk_gmac-dwmac fe2a0000.ethernet: DMA HW capability register supported
[    3.087099] rk_gmac-dwmac fe2a0000.ethernet: Active PHY interface: RGMII (1)
[    3.087724] rk_gmac-dwmac fe2a0000.ethernet: RX Checksum Offload Engine supported
[    3.088385] rk_gmac-dwmac fe2a0000.ethernet: TX Checksum insertion supported
[    3.089008] rk_gmac-dwmac fe2a0000.ethernet: Wake-Up On Lan supported
[    3.089683] rk_gmac-dwmac fe2a0000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    3.090398] rk_gmac-dwmac fe2a0000.ethernet: Enabled RFS Flow TC (entries=10)
[    3.091034] rk_gmac-dwmac fe2a0000.ethernet: TSO supported
[    3.091521] rk_gmac-dwmac fe2a0000.ethernet: TSO feature enabled
[    3.092056] rk_gmac-dwmac fe2a0000.ethernet: Using 32/32 bits DMA host/device width
[    3.228427] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    3.244306] Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    3.245444] Loaded X.509 cert 'wens: 61c038651aabdcf94bd0ac7ff06c7248db18c600'
[    3.246270] faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2
[    3.246879] clk: Not disabling unused clocks
[    3.247042] cfg80211: failed to load regulatory.db
[    3.247425] PM: genpd: Disabling unused power domains
[    3.248577] dw-apb-uart fe660000.serial: forbid DMA for kernel console
[    3.254849] Freeing unused kernel memory: 12480K
[    3.255399] Run /init as init process
Loading, please wait...
Starting version 247.3-7+deb11u4
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.36.1
[/sbin/fsck.ext4 (1) -- /dev/mmcblk0p3] fsck.ext4 -a -C0 /dev/mmcblk0p3 
rootfs: clean, 297753/1855392 files, 4946612/7548923 blocks
done.
[    4.081211] EXT4-fs (mmcblk0p3): mounted filesystem d210e617-6a4c-4771-b955-ddd835a32d2b r/w with ordered data mode. Quota mode: none.
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[    4.488149] systemd[1]: System time before build time, advancing clock.
[    4.545777] systemd[1]: systemd 247.3-7+deb11u4 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[    4.549900] systemd[1]: Detected architecture arm64.

Welcome to Debian GNU/Linux 11 (bullseye)!

[    4.563093] systemd[1]: Set hostname to <rock-3b>.
[    4.712335] block mmcblk0: the capability attribute has been deprecated.
[    5.090660] systemd[1]: Queued start job for default target Graphical Interface.
[    5.131927] systemd[1]: Created slice system-getty.slice.
[  OK  ] Created slice system-getty.slice.
[    5.134792] systemd[1]: Created slice system-modprobe.slice.
[  OK  ] Created slice system-modprobe.slice.
[    5.143177] systemd[1]: Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice system-serial\x2dgetty.slice.
[    5.151765] systemd[1]: Created slice system-systemd\x2dfsck.slice.
[  OK  ] Created slice system-systemd\x2dfsck.slice.
[    5.163903] systemd[1]: Created slice system-systemd\x2djournald.slice.
[  OK  ] Created slice system-systemd\x2djournald.slice.
[    5.175785] systemd[1]: Created slice system-systemd\x2djournald\x2dvarlink.slice.
[  OK  ] Created slice system-syste…\x2djournald\x2dvarlink.slice.
[    5.183362] systemd[1]: Created slice User and Session Slice.
[  OK  ] Created slice User and Session Slice.
[    5.190135] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[    5.202212] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[    5.210105] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[    5.211439] systemd[1]: Reached target Local Encrypted Volumes.
[  OK  ] Reached target Local Encrypted Volumes.
[    5.222078] systemd[1]: Reached target Network (Pre).
[  OK  ] Reached target Network (Pre).
[    5.229939] systemd[1]: Reached target Paths.
[  OK  ] Reached target Paths.
[    5.238057] systemd[1]: Reached target Remote Encrypted Volumes.
[  OK  ] Reached target Remote Encrypted Volumes.
[    5.245910] systemd[1]: Reached target Remote File Systems.
[  OK  ] Reached target Remote File Systems.
[    5.247592] systemd[1]: Reached target Slices.
[  OK  ] Reached target Slices.
[    5.250927] systemd[1]: Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on fsck to fsckd communication Socket.
[    5.258649] systemd[1]: Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[    5.267782] systemd[1]: Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Audit Socket.
[    5.270864] systemd[1]: systemd-journald-dev-log.socket: SO_PASSSEC failed: Operation not supported
[    5.271966] systemd[1]: Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket (/dev/log).
[    5.274918] systemd[1]: systemd-journald.socket: SO_PASSSEC failed: Operation not supported
[    5.276098] systemd[1]: systemd-journald.socket: SO_PASSSEC failed: Operation not supported
[    5.277020] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[    5.280707] systemd[1]: Listening on Network Service Netlink Socket.
[  OK  ] Listening on Network Service Netlink Socket.
[    5.291657] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[    5.294591] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[    5.321932] systemd[1]: Mounting Huge Pages File System...
         Mounting Huge Pages File System...
[    5.334172] systemd[1]: Mounting POSIX Message Queue File System...
         Mounting POSIX Message Queue File System...
[    5.339943] systemd[1]: Mounting Kernel Debug File System...
         Mounting Kernel Debug File System...
[    5.346080] systemd[1]: Condition check resulted in Kernel Trace File System being skipped.
[    5.351151] systemd[1]: Starting Wait for network to be configured by ifupdown...
         Starting Wait for network to be configured by ifupdown...
[    5.357911] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    5.363081] systemd[1]: Starting Load Kernel Module configfs...
         Starting Load Kernel Module configfs...
[    5.368299] systemd[1]: Starting Load Kernel Module drm...
         Starting Load Kernel Module drm...
[    5.376882] systemd[1]: Starting Load Kernel Module fuse...
         Starting Load Kernel Module fuse...
[    5.386967] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[    5.387937] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[    5.398865] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[    5.407719] systemd[1]: Starting Load Kernel Modules...
         Starting Load Kernel Modules...
[    5.412414] systemd[1]: Starting Remount Root and Kernel File Systems...
         Starting Remount Root and Kernel File Systems...
[    5.429850] systemd[1]: Starting Coldplug All udev Devices...
         Starting Coldplug All udev Devices...
[    5.447220] systemd[1]: Mounted Huge Pages File System.
[  OK  ] Mounted Huge Pages File System.
[    5.449061] systemd[1]: Mounted POSIX Message Queue File System.
[  OK  ] Mounted POSIX Message Queue File System.
[    5.453688] EXT4-fs (mmcblk0p3): re-mounted d210e617-6a4c-4771-b955-ddd835a32d2b.
[    5.459654] systemd[1]: Mounted Kernel Debug File System.
[  OK  ] Mounted Kernel Debug File System.
[    5.467229] systemd[1]: Finished Wait for network to be configured by ifupdown.
[  OK  ] Finished Wait for network to be configured by ifupdown.
[    5.474892] systemd[1]: modprobe@configfs.service: Succeeded.
[    5.476453] systemd[1]: Finished Load Kernel Module configfs.
[  OK  ] Finished Load Kernel Module configfs.
[    5.486629] systemd[1]: modprobe@drm.service: Succeeded.
[    5.488053] systemd[1]: Finished Load Kernel Module drm.
[  OK  ] Finished Load Kernel Module drm.
[    5.498681] systemd[1]: modprobe@fuse.service: Succeeded.
[    5.500190] systemd[1]: Finished Load Kernel Module fuse.
[  OK  ] Finished Load Kernel Module fuse.
[    5.511197] systemd[1]: Finished Load Kernel Modules.
[  OK  ] Finished Load Kernel Modules.
[    5.523376] systemd[1]: Finished Remount Root and Kernel File Systems.
[  OK  ] Finished Remount Root and Kernel File Systems.
[    5.534599] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[    5.550113] systemd[1]: Mounting Kernel Configuration File System...
         Mounting Kernel Configuration File System...
[    5.556704] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[    5.557696] systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
[    5.561425] systemd[1]: Starting Load/Save Random Seed...
         Starting Load/Save Random Seed...
[    5.572806] systemd[1]: Starting Apply Kernel Variables...
         Starting Apply Kernel Variables...
[    5.584810] systemd[1]: Starting Create System Users...
         Starting Create System Users...
[    5.596109] systemd[1]: Started Journal Service.
[  OK  ] Started Journal Service.
[  OK  ] Mounted Kernel Configuration File System.
[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Finished Apply Kernel Variables.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Finished Create System Users.
[    5.637155] systemd-journald[230]: Received client request to flush runtime journal.
[    5.640079] systemd-journald[230]: File /var/log/journal/f26be486655e4e559a1282889eb20124/system.journal corrupted or uncleanly shut down, renaming and replacing.
         Starting ^[[0;1;39mCreate Static Device Nodes in /dev^[[0m...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mCreate Static Device Nodes in /dev^[[0m.
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mLocal File Systems (Pre)^[[0m.
[^[[0;32m  OK  ^[[0m] Set up automount ^[[0;1;39mboot-efi.automount^[[0m.
[^[[0;32m  OK  ^[[0m] Set up automount ^[[0;1;39mconfig.automount^[[0m.
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mLocal File Systems^[[0m.
         Starting ^[[0;1;39mRule-based Manage…for Device Events and Files^[[0m...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mFlush Journal to Persistent Storage^[[0m.
         Starting ^[[0;1;39mCreate Volatile Files and Directories^[[0m...
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mRule-based Manager for Device Events and Files^[[0m.
         Starting ^[[0;1;39mNetwork Service^[[0m...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mCreate Volatile Files and Directories^[[0m.
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mEntropy Daemon based on the HAVEGE algorithm^[[0m.
         Starting ^[[0;1;39mNetwork Time Synchronization^[[0m...
         Starting ^[[0;1;39mUpdate UTMP about System Boot/Shutdown^[[0m...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mColdplug All udev Devices^[[0m.
         Starting ^[[0;1;39mHelper to synchronize boot up for ifupdown^[[0m...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mHelper to synchronize boot up for ifupdown^[[0m.
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mUpdate UTMP about System Boot/Shutdown^[[0m.
         Starting ^[[0;1;39mRaise network interfaces^[[0m...
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mNetwork Service^[[0m.
         Starting ^[[0;1;39mWait for Network to be Configured^[[0m...
         Starting ^[[0;1;39mNetwork Name Resolution^[[0m...
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mNetwork Time Synchronization^[[0m.
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mSystem Time Set^[[0m.
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mSystem Time Synchronized^[[0m.
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mRaise network interfaces^[[0m.
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mNetwork Name Resolution^[[0m.
[    6.840374] rocket: loading out-of-tree module taints kernel.
[    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
[    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
[  OK  ] Found device EDILOCA EN605 512GB.
         Starting File System Check…1edf-455d-9058-56e0855a1edb...
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Listening on Load/Save RF …itch Status /dev/rfkill Watch.
[  OK  ] Finished File System Check…d-1edf-455d-9058-56e0855a1edb.
[  OK  ] Found device /dev/ttyS2.
         Mounting /mnt/nvme...
[    7.209346] EXT4-fs (nvme0n1): mounted filesystem 0d9000fd-1edf-455d-9058-56e0855a1edb r/w with ordered data mode. Quota mode: none.
[  OK  ] Mounted /mnt/nvme.
         Activating swap /mnt/nvme/swapfile...
[    7.236341] Adding 10485756k swap on /mnt/nvme/swapfile.  Priority:-1 extents:10 across:19374076k SS
[  OK  ] Activated swap /mnt/nvme/swapfile.
[  OK  ] Reached target Swap.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Periodic ext4 Onli…ata Check for All Filesystems.
[  OK  ] Started Discard unused blocks once a week.
[  OK  ] Started Daily man-db regeneration.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Listening on OpenBSD Secure Shell server socket.
[  OK  ] Listening on Journal Varli… Socket for Namespace netdata.
[  OK  ] Listening on Journal Socket for Namespace netdata.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Avahi mDNS/DNS-SD Stack...
[  OK  ] Started D-Bus System Message Bus.
         Starting Network Manager...
         Starting Remove Stale Onli…t4 Metadata Check Snapshots...
[  OK  ] Started rsetup configuration service.
         Starting LSB: Set sysfs variables from /etc/sysfs.conf...
         Starting User Login Management...
         Starting WPA supplicant...
         Starting Linux zramswap setup...
[  OK  ] Started Avahi mDNS/DNS-SD Stack.
         Mounting /config...
[    7.640448] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[FAILED] Failed to start Linux zramswap setup.
See 'systemctl status zramswap.service' for details.
[  OK  ] Mounted /config.
[  OK  ] Started WPA supplicant.
[  OK  ] Started LSB: Set sysfs variables from /etc/sysfs.conf.
[  OK  ] Started User Login Management.
[  OK  ] Finished Remove Stale Onli…ext4 Metadata Check Snapshots.
[  OK  ] Started Network Manager.
[  OK  ] Reached target Network.
         Starting Network Manager Wait Online...
         Starting dnsmasq - A light…DHCP and caching DNS server...
[  OK  ] Started Enable adbd on supported Radxa products.
[  OK  ] Started Enable USB Ethernet on supported Radxa products.
         Starting Permit User Sessions...
         Starting Tailscale node agent...
[  OK  ] Finished Permit User Sessions.
[  OK  ] Started Getty on tty1.
[  OK  ] Started Serial Getty on ttyS2.
[  OK  ] Reached target Login Prompts.
         Starting Hostname Service...
[FAILED] Failed to start dnsmasq - …t DHCP and caching DNS server.
See 'systemctl status dnsmasq.service' for details.
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Started Hostname Service.
         Starting Network Manager Script Dispatcher Service...
[    8.192053] rk_gmac-dwmac fe010000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[  OK  ] Started Network Manager Script Dispatcher Service.
[    8.221724] rk_gmac-dwmac fe010000.ethernet eth0: PHY [stmmac-1:01] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[    8.224669] dwmac4: Master AXI performs any burst length
[    8.225174] rk_gmac-dwmac fe010000.ethernet eth0: No Safety Features support found
[    8.227524] rk_gmac-dwmac fe010000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[    8.229192] rk_gmac-dwmac fe010000.ethernet eth0: registered PTP clock
[    8.229923] rk_gmac-dwmac fe010000.ethernet eth0: configuring for phy/rgmii-id link mode
[    8.284289] rk_gmac-dwmac fe2a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
[  OK  ] Started Tailscale node agent.
[    8.313571] rk_gmac-dwmac fe2a0000.ethernet eth1: PHY [stmmac-0:01] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[    8.325571] dwmac4: Master AXI performs any burst length
[    8.326067] rk_gmac-dwmac fe2a0000.ethernet eth1: No Safety Features support found
[    8.327175] rk_gmac-dwmac fe2a0000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[    8.328451] rk_gmac-dwmac fe2a0000.ethernet eth1: registered PTP clock
[    8.329047] rk_gmac-dwmac fe2a0000.ethernet eth1: configuring for phy/rgmii-id link mode
[  OK  ] Finished Wait for Network to be Configured.
[   12.359432] rk_gmac-dwmac fe010000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Debian GNU/Linux 11 rock-3b ttyS2

rock-3b login: [   17.158640] platform sdio-pwrseq: deferred probe pending: pwrseq_simple: reset control not ready
[   17.159481] platform 3c0000000.pcie: deferred probe pending: rockchip-dw-pcie: failed to initialize the phy
[   17.160361] platform fcc00000.usb: deferred probe pending: dwc3: failed to initialize core
[   17.161108] platform fd000000.usb: deferred probe pending: dwc3: failed to initialize core
[   17.161993] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fd000000.usb
[   17.162945] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fcc00000.usb
[   17.163884] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fde60000.gpu
[   17.164821] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fdea0000.video-codec
[   17.165877] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fdeb0000.rga
[   17.166824] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fdee0000.video-codec
[   17.167821] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fe040000.vop
[   17.168757] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to fe0a0000.hdmi
[   17.169795] rockchip-pm-domain fdd90000.power-management:power-controller: sync_state() pending due to 3c0000000.pcie

rock-3b login: radxa
Password: 
Linux rock-3b 7.1.0-rc6-chaoyi-00011-ga31e2e6fae27 #5 SMP PREEMPT Tue Jun  9 12:40:22 CEST 2026 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Jun  9 12:55:51 CEST 2026 on ttyS2
radxa@rock-3b:~$ lsmod | grep rocket
rocket                 24576  0
radxa@rock-3b:~$ sudo dmesg -C; python3 ~/npu-debug/teflon_test.py
Input: [ 1 80 80 16] <class 'numpy.uint8'>
Output: [  1  40  40 128] <class 'numpy.uint8'>
[  136.899509] rocket fde40000.npu: NPU job timed out
Run 0: elapsed=529.6ms out_sum=26214400
[  137.443518] rocket fde40000.npu: NPU job timed out
Run 1: elapsed=538.0ms out_sum=26214400
[  137.987451] rocket fde40000.npu: NPU job timed out
Run 2: elapsed=543.3ms out_sum=26214400
[  138.531447] rocket fde40000.npu: NPU job timed out
Run 3: elapsed=540.3ms out_sum=26214400
[  139.075418] rocket fde40000.npu: NPU job timed out
Run 4: elapsed=541.1ms out_sum=26214400
Done
radxa@rock-3b:~$


* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-09 11:11             ` Midgy Balon
@ 2026-06-10  1:14               ` Chaoyi Chen
  2026-06-10 10:05                 ` Diederik de Haas
  0 siblings, 1 reply; 35+ messages in thread
From: Chaoyi Chen @ 2026-06-10  1:14 UTC
  To: Midgy Balon
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao

Hi Midgy,

On 6/9/2026 7:11 PM, Midgy Balon wrote:
> Hello Chaoyi,
>
> You were right - building rocket as a module fixes it. Thanks for the pointer.
>
> I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
> need_regulator on the RK3568 NPU power domain via a DOMAIN_M_R variant,
> domain-supply = <&vdd_npu>, and the regulator-always-on workaround
> dropped). The board now boots cleanly and, more importantly, an NPU job
> submit no longer hangs: I ran the test workload five times with no RCU
> stall and no freeze.
>
> So with rocket=m the need_regulator approach works on RK3568, and I'll
> keep it for v4 (domain-supply + need_regulator, instead of marking
> vdd_npu always-on). rocket=m is the normal configuration anyway; my
> earlier hang came from building it =y in a self-contained image, so it
> probed in the initcalls (around 2 s) and the genpd -> I2C-PMIC regulator
> transition ran before the system was ready. As a module it loads from
> udev much later (~6.8 s here), after the I2C controller and regulator
> core are fully up.
>
> On your question of when the device-link error is printed - it is at
> power-domain controller probe, not at the rocket probe:
>
>   [    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
>   [    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller:
>                  Failed to create device link (0x180) with supplier 0-0020 for
>                  /power-management@fdd90000/power-controller/power-domain@6
>   [    2.945955] platform fde40000.npu: Adding to iommu group 3
>   ...
>   [    6.840374] rocket: loading out-of-tree module taints kernel.
>   [    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
>   [    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
>
> So the device-link to the rk809 PMIC (0-0020) fails to form at ~2.75 s,
> well before rocket loads at ~6.8 s. It is non-fatal here - the vdd_npu
> rail is brought up by the regulator core and all jobs run - and there is
> no "failed to get ack on domain npu" NoC warning this boot (the
> always-on kernel had one). The complete boot log is attached.
>
> Two notes / one question:
> - This boot used fw_devlink=permissive on the command line. Is the
>   "Failed to create device link ... supplier 0-0020" at pmdomain probe
>   expected/benign, or is there a clean way to make it order correctly
>   (so it also works without permissive, and a =y build wouldn't
>   deadlock in the initcalls)?

We encountered the same issue on the RK3588 NPU before. And it was
resolved with the following patch at that time.

https://lore.kernel.org/all/20251216055247.13150-1-rmxpzlb@gmail.com/

Please compare the differences in NPU pmdomain and DTS configuration
between the RK3568 and RK3588.

> - (The convolution output is still uniform zero-point / the job times
>   out - that is the separate NPU compute-completion issue, unrelated to
>   the power-domain work. Finley, that is the one I flagged earlier re
>   PVTPLL/NoC.)
> 
> Kind regards,
> Midgy
> 

-- 
Best, 
Chaoyi

* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-10  1:14               ` Chaoyi Chen
@ 2026-06-10 10:05                 ` Diederik de Haas
  2026-06-10 13:38                   ` Midgy Balon
  0 siblings, 1 reply; 35+ messages in thread
From: Diederik de Haas @ 2026-06-10 10:05 UTC
  To: Chaoyi Chen, Midgy Balon
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro, will,
	robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao,
	Jonas Karlman

Hi,

On Wed Jun 10, 2026 at 3:14 AM CEST, Chaoyi Chen wrote:
> Hi Midgy,
>
> On 6/9/2026 7:11 PM, Midgy Balon wrote:
>> Hello Chaoyi,
>>
>> You were right - building rocket as a module fixes it. Thanks for the pointer.
>>
>> I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
>> need_regulator on the RK3568 NPU power domain via a DOMAIN_M_R variant,
>> domain-supply = <&vdd_npu>, and the regulator-always-on workaround
>> dropped). The board now boots cleanly and, more importantly, an NPU job
>> submit no longer hangs: I ran the test workload five times with no RCU
>> stall and no freeze.
>>
>> So with rocket=m the need_regulator approach works on RK3568, and I'll
>> keep it for v4 (domain-supply + need_regulator, instead of marking
>> vdd_npu always-on). rocket=m is the normal configuration anyway; my
>> earlier hang came from building it =y in a self-contained image, so it
>> probed in the initcalls (around 2 s) and the genpd -> I2C-PMIC regulator
>> transition ran before the system was ready. As a module it loads from
>> udev much later (~6.8 s here), after the I2C controller and regulator
>> core are fully up.
>>
>> On your question of when the device-link error is printed - it is at
>> power-domain controller probe, not at the rocket probe:
>>
>>   [    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
>>   [    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller:
>>                  Failed to create device link (0x180) with supplier 0-0020 for
>>                  /power-management@fdd90000/power-controller/power-domain@6
>>   [    2.945955] platform fde40000.npu: Adding to iommu group 3
>>   ...
>>   [    6.840374] rocket: loading out-of-tree module taints kernel.
>>   [    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
>>   [    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
>>
>> So the device-link to the rk809 PMIC (0-0020) fails to form at ~2.75 s,
>> well before rocket loads at ~6.8 s. It is non-fatal here - the vdd_npu
>> rail is brought up by the regulator core and all jobs run - and there is
>> no "failed to get ack on domain npu" NoC warning this boot (the
>> always-on kernel had one). The complete boot log is attached.
>>
>> Two notes / one question:
>> - This boot used fw_devlink=permissive on the command line. Is the
>>   "Failed to create device link ... supplier 0-0020" at pmdomain probe
>>   expected/benign, or is there a clean way to make it order correctly
>>   (so it also works without permissive, and a =y build wouldn't
>>   deadlock in the initcalls)?
>
> We encountered the same issue on the RK3588 NPU before. And it was
> resolved with the following patch at that time.
>
> https://lore.kernel.org/all/20251216055247.13150-1-rmxpzlb@gmail.com/
>
> Please compare the differences in NPU pmdomain and DTS configuration
> between the RK3568 and RK3588.

About a month ago on #linux-rockchip we were discussing PM 'stuff':
https://libera.catirclogs.org/linux-rockchip/2026-05-15#39939137;
which references this paste
https://paste.sr.ht/~diederik/89d9f84e22474e837b55286d213b67f03859ce2e
I've since removed the DCDC_REG2 for PineTab2 and the 'fix' should likely
be extended to cover all RK3566/RK3568 devices though.

It's what I made at the time hoping to fix a suspend/resume issue when
trying upstream TF-A. It didn't fix the issue at the time, but may still
be useful/needed and I think it's what Chaoyi hinted at.

Just yesterday, Jonas posted this patch which may be useful/needed too:
https://lore.kernel.org/linux-rockchip/20260609154124.445182-1-jonas@kwiboo.se/

HTH,
  Diederik

>> - (The convolution output is still uniform zero-point / the job times
>>   out - that is the separate NPU compute-completion issue, unrelated to
>>   the power-domain work. Finley, that is the one I flagged earlier re
>>   PVTPLL/NoC.)
>> 
>> Kind regards,
>> Midgy
>> 


* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-10 10:05                 ` Diederik de Haas
@ 2026-06-10 13:38                   ` Midgy Balon
  2026-06-10 14:28                     ` Diederik de Haas
  0 siblings, 1 reply; 35+ messages in thread
From: Midgy Balon @ 2026-06-10 13:38 UTC
  To: Diederik de Haas
  Cc: Chaoyi Chen, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao,
	Jonas Karlman

Hello Chaoyi & Diederik,

I compared the RK3568 and RK3588 NPU power-domain + DTS as you
suggested, and it lines up exactly with what you described.

The difference is the `need_regulator` capability. RK3588's NPU domain is
`DOMAIN_RK3588("npu", …, false, true)`; the trailing `true` is
`regulator`/`need_regulator`. The mainline RK3568 macro
`DOMAIN_RK3568(name, pwr, req, wakeup)` has no regulator parameter at
all, so `RK3568_PD_NPU` can't be marked need_regulator. My v4 adds that:
a regulator-capable RK3568 NPU domain (need_regulator = true) plus
`domain-supply = <&vdd_npu>` on the NPU node, i.e. the same shape as
RK3588.

And the fix you referenced (Frank Zhang's "pmdomain: rockchip: Fix init
genpd as GENPD_STATE_ON before regulator ready", plus "quiet regulator
error on -EPROBE_DEFER") is already in my base (v7.1-rc6), so the
`if (need_regulator) rockchip_pd_power(pd, false)` default-off path is in
effect. That's what resolves the actual problem for me: with rocket built
as a module (the normal config), need_regulator on the NPU domain, and
those pmdomain patches in place, the board boots cleanly and NPU jobs run
with no RCU stall / no deadlock. My earlier hang was an artifact of a
self-contained rocket=y image probing in the initcalls before the I2C
regulator core was up; as a module it loads ~6.8 s in, well after that,
so the hang is gone.

I also went back and checked the `fw_devlink=permissive` question myself,
and good news: it turns out it is NOT needed. I rebooted the exact same
kernel with permissive removed from the cmdline (strict fw_devlink, the
default), and the board boots cleanly, the NPU probes
(`rocket fde40000.npu: Rockchip NPU core 0 version: 0`), and NPU jobs
submit and run five times in a row with no deadlock and no RCU stall. So
strict fw_devlink resolves the NPU/PMIC ordering fine via deferred probe.

The one remaining thing is cosmetic: at power-domain-controller probe
(~2.94 s) I still get, in BOTH modes (with or without permissive):

  rockchip-pm-domain …: Failed to create device link (0x180) with supplier 0-0020 …power-domain@6

i.e. genpd can't form the link to the rk809 (the I2C PMIC supplying
vdd_npu) because the PMIC isn't registered yet at that point. It's
non-fatal: the domain defaults off (Frank's patch), the rail comes up via
the regulator core, the NPU probes a few seconds later, and all jobs run.

One question: on RK3588 with need_regulator, do you also see that
"Failed to create device link … supplier <pmic>" line at pmdomain probe,
or does it order cleanly? If RK3588 is clean, is there a DTS detail (e.g.
the regulator's bus/probe order) I should mirror on RK3568 to make the
link form in time, or is this line just expected/harmless and best left
as-is?

@Diederik: thanks; the DCDC_REG2 change and Jonas's USB-suspend series
look like generally useful RK356x robustness fixes, though for this
specific NPU device-link the need_regulator + Frank's pmdomain patches
seem to be the relevant piece. I'll keep them in mind for suspend.

The convolution-output / compute-completion issue is still separate and
open (@Finley: that's the PVTPLL/NoC one); the power-domain side is in
good shape for v4.

Thanks y'all for your help :)

Kind regards,
Midgy

* Re: [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
  2026-06-10 13:38                   ` Midgy Balon
@ 2026-06-10 14:28                     ` Diederik de Haas
  0 siblings, 0 replies; 35+ messages in thread
From: Diederik de Haas @ 2026-06-10 14:28 UTC
  To: Midgy Balon, Diederik de Haas
  Cc: Chaoyi Chen, tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, joro,
	will, robin.murphy, dri-devel, linux-rockchip, devicetree,
	linux-arm-kernel, iommu, linux-kernel, Simon Xue, Finley Xiao,
	Jonas Karlman

On Wed Jun 10, 2026 at 3:36 PM CEST, Midgy Balon wrote:
> Hello Chaoyi & Diederik,
>
> I compared the RK3568 and RK3588 NPU power-domain + DTS as you
> suggested, and it lines up exactly with what you described.
>
> The difference is the `need_regulator` capability. RK3588's NPU domain is
> `DOMAIN_RK3588("npu", …, false, true)`; the trailing `true` is
> `regulator`/`need_regulator`. The mainline RK3568 macro
> `DOMAIN_RK3568(name, pwr, req, wakeup)` has no regulator parameter at
> all, so `RK3568_PD_NPU` can't be marked need_regulator. My v4 adds that:
> a regulator-capable RK3568 NPU domain (need_regulator = true) plus
> `domain-supply = <&vdd_npu>` on the NPU node, i.e. the same shape as
> RK3588.
>
> And the fix you referenced (Frank Zhang's "pmdomain: rockchip: Fix init
> genpd as GENPD_STATE_ON before regulator ready", plus "quiet regulator
> error on -EPROBE_DEFER") is already in my base (v7.1-rc6), so the
> `if (need_regulator) rockchip_pd_power(pd, false)` default-off path is in
> effect. That's what resolves the actual problem for me: with rocket built
> as a module (the normal config), need_regulator on the NPU domain, and
> those pmdomain patches in place, the board boots cleanly and NPU jobs run
> with no RCU stall / no deadlock. My earlier hang was an artifact of a
> self-contained rocket=y image probing in the initcalls before the I2C
> regulator core was up; as a module it loads ~6.8 s in, well after that,
> so the hang is gone.
>
> I also went back and checked the `fw_devlink=permissive` question myself,
> and good news: it turns out it is NOT needed. I rebooted the exact same
> kernel with permissive removed from the cmdline (strict fw_devlink, the
> default), and the board boots cleanly, the NPU probes
> (`rocket fde40000.npu: Rockchip NPU core 0 version: 0`), and NPU jobs
> submit and run five times in a row with no deadlock and no RCU stall. So
> strict fw_devlink resolves the NPU/PMIC ordering fine via deferred probe.
>
> The one remaining thing is cosmetic: at power-domain-controller probe
> (~2.94 s) I still get, in BOTH modes (with or without permissive):
>
>   rockchip-pm-domain …: Failed to create device link (0x180) with supplier 0-0020 …power-domain@6
>
> i.e. genpd can't form the link to the rk809 (the I2C PMIC supplying
> vdd_npu) because the PMIC isn't registered yet at that point. It's
> non-fatal: the domain defaults off (Frank's patch), the rail comes up via
> the regulator core, the NPU probes a few seconds later, and all jobs run.
>
> One question: on RK3588 with need_regulator, do you also see that
> "Failed to create device link … supplier <pmic>" line at pmdomain probe,
> or does it order cleanly? If RK3588 is clean, is there a DTS detail (e.g.
> the regulator's bus/probe order) I should mirror on RK3568 to make the
> link form in time, or is this line just expected/harmless and best left
> as-is?

[    2.110935] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier 2-0042 for /power-management@fd8d8000/power-controller/power-domain@8
[    2.557459] sdhci-dwcmshc fe2e0000.mmc: Can't reduce the clock below 52MHz in HS200/HS400 mode
[    2.647174] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier 2-0042 for /power-management@fd8d8000/power-controller/power-domain@8
[    2.945089] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier spi2.0 for /power-management@fd8d8000/power-controller/power-domain@12

8 = NPU; 12 = GPU

This is on both the nanopc-t6-lts and the nanopc-t6-plus (both RK3588).
In a 6.18 dmesg I have from a Rock 5B, I see roughly the same, except
that the supplier is 1-0042 instead of 2-0042.

I don't know if it's bad or harmless, but it is consistent.

HTH,
  Diederik

> @Diederik, thanks. The DCDC_REG2 change and Jonas's USB-suspend
> series look like generally useful RK356x robustness fixes, though for
> this specific NPU device link the need_regulator + Frank's pmdomain
> patches seem to be the relevant pieces. I'll keep them in mind for
> suspend.
>
> The convolution-output / compute-completion issue is still separate
> and open (@Finley, that's the PVTPLL/NoC one); the power-domain side
> is in good shape for v4.
>
> Thanks y'all for your help :)
>
> Kind regards,
> Midgy
>
> On Wed, 10 Jun 2026 at 12:05, Diederik de Haas
> <diederik@cknow-tech.com> wrote:
>>
>> Hi,
>>
>> On Wed Jun 10, 2026 at 3:14 AM CEST, Chaoyi Chen wrote:
>> > Hi Midgy,
>> >
>> > On 6/9/2026 7:11 PM, Midgy Balon wrote:
>> >> Hello Chaoyi,
>> >>
>> >> You were right - building rocket as a module fixes it. Thanks for the pointer.
>> >>
>> >> I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
>> >> need_regulator on the RK3568 NPU power domain via a DOMAIN_M_R
>> >> variant, domain-supply = <&vdd_npu>, and the regulator-always-on
>> >> workaround dropped). The board now boots cleanly and, more
>> >> importantly, an NPU job submit no longer hangs: I ran the test
>> >> workload five times with no RCU stall and no freeze.
>> >>
>> >> So with rocket=m the need_regulator approach works on RK3568, and
>> >> I'll keep it for v4 (domain-supply + need_regulator, instead of
>> >> marking vdd_npu always-on). rocket=m is the normal configuration
>> >> anyway; my earlier hang came from building it =y in a self-contained
>> >> image, so it probed in the initcalls (around 2 s) and the genpd ->
>> >> I2C-PMIC regulator transition ran before the system was ready. As a
>> >> module it loads from udev much later (~6.8 s here), after the I2C
>> >> controller and regulator core are fully up.
>> >>
>> >> As for when the device-link error is printed: it appears at
>> >> power-domain controller probe, not at rocket probe.
>> >>
>> >>   [    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
>> >>   [    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller:
>> >>                  Failed to create device link (0x180) with supplier 0-0020 for
>> >>                  /power-management@fdd90000/power-controller/power-domain@6
>> >>   [    2.945955] platform fde40000.npu: Adding to iommu group 3
>> >>   ...
>> >>   [    6.840374] rocket: loading out-of-tree module taints kernel.
>> >>   [    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
>> >>   [    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
>> >>
>> >> So the device link to the rk809 PMIC (0-0020) fails to form at
>> >> ~2.75 s, well before rocket loads at ~6.8 s. It is non-fatal here:
>> >> the vdd_npu rail is brought up by the regulator core and all jobs
>> >> run. There is also no "failed to get ack on domain npu" NoC warning
>> >> on this boot (the always-on kernel had one). The complete boot log
>> >> is attached.
>> >>
>> >> Two notes / one question:
>> >> - This boot used fw_devlink=permissive on the command line. Is the
>> >>   "Failed to create device link ... supplier 0-0020" line at pmdomain
>> >>   probe expected/benign, or is there a clean way to make it order
>> >>   correctly (so it also works without permissive, and a =y build
>> >>   wouldn't deadlock in the initcalls)?
>> >
>> > We encountered the same issue on the RK3588 NPU before, and at that
>> > time it was resolved with the following patch.
>> >
>> > https://lore.kernel.org/all/20251216055247.13150-1-rmxpzlb@gmail.com/
>> >
>> > Please compare the differences in NPU pmdomain and DTS configuration
>> > between the RK3568 and RK3588.
>>
>> About a month ago on #linux-rockchip we were discussing PM 'stuff'
>> (https://libera.catirclogs.org/linux-rockchip/2026-05-15#39939137),
>> which references this paste:
>> https://paste.sr.ht/~diederik/89d9f84e22474e837b55286d213b67f03859ce2e
>> I've since removed the DCDC_REG2 for PineTab2; the 'fix' should likely
>> be extended to cover all RK3566/RK3568 devices, though.
>>
>> It's what I made at the time hoping to fix a suspend/resume issue when
>> trying upstream TF-A. It didn't fix the issue at the time, but may still
>> be useful/needed and I think it's what Chaoyi hinted at.
>>
>> Just yesterday, Jonas posted this patch which may be useful/needed too:
>> https://lore.kernel.org/linux-rockchip/20260609154124.445182-1-jonas@kwiboo.se/
>>
>> HTH,
>>   Diederik
>>
>> >> - (The convolution output is still uniform zero-point / the job
>> >>   times out; that is the separate NPU compute-completion issue,
>> >>   unrelated to the power-domain work. Finley, that is the one I
>> >>   flagged earlier re PVTPLL/NoC.)
>> >>
>> >> Kind regards,
>> >> Midgy
>> >>
>>
>
> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip



End of thread (latest message: 2026-06-10 14:28 UTC).

Thread overview: 35+ messages
2026-06-04 13:52 [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Midgy BALON
2026-06-04 13:52 ` [RFC PATCH v3 1/9] accel: rocket: Introduce per-SoC rocket_soc_data Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 2/9] accel: rocket: Derive DMA width and core count from match data Midgy BALON
2026-06-04 14:05   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 3/9] accel: rocket: Add RK3568 SoC support Midgy BALON
2026-06-04 14:05   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout Midgy BALON
2026-06-04 14:10   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 5/9] accel: rocket: Keep the IOMMU domain attached across jobs Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 6/9] iommu/rockchip: Clear AUTO_GATING bit 1 on the RK356x v1 IOMMU Midgy BALON
2026-06-04 14:04   ` sashiko-bot
2026-06-04 14:20   ` Tomeu Vizoso
2026-06-05  1:59   ` Chaoyi Chen
2026-06-07 21:05     ` Midgy Balon
2026-06-08  1:45       ` Chaoyi Chen
2026-06-08  3:40         ` Chaoyi Chen
2026-06-04 13:52 ` [RFC PATCH v3 7/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 Midgy BALON
2026-06-04 14:08   ` sashiko-bot
2026-06-04 16:55     ` Conor Dooley
2026-06-04 13:52 ` [RFC PATCH v3 8/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU Midgy BALON
2026-06-04 14:11   ` sashiko-bot
2026-06-04 13:52 ` [RFC PATCH v3 9/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU Midgy BALON
2026-06-05  1:36 ` [RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support Chaoyi Chen
2026-06-07 21:03   ` Midgy Balon
2026-06-08  1:40     ` Chaoyi Chen
2026-06-08  8:05       ` Midgy Balon
2026-06-08  9:14         ` Midgy Balon
2026-06-08  9:38           ` Chaoyi Chen
2026-06-09 11:11             ` Midgy Balon
2026-06-10  1:14               ` Chaoyi Chen
2026-06-10 10:05                 ` Diederik de Haas
2026-06-10 13:38                   ` Midgy Balon
2026-06-10 14:28                     ` Diederik de Haas
