Re: [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: MidG971 <midgy971@gmail.com>
Cc: conor+dt@kernel.org, robh@kernel.org,
	dri-devel@lists.freedesktop.org, devicetree@vger.kernel.org
Subject: Re: [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support
Date: Sat, 13 Jun 2026 07:11:50 +0000	[thread overview]
Message-ID: <20260613071151.0553B1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260613070116.438906-4-midgy971@gmail.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] `pm_runtime_put_sync()` fails to suspend the device synchronously due to active autosuspend, leaking runtime PM state when `noc_init` fails.
- [High] `rk3568_noc_init()` ignores `clk_set_rate()` failures and incorrectly uses short-circuiting logic, allowing initialization to proceed with an inactive compute clock.
- [High] The RK3568 NPU compute clock (PVTPLL) is not restarted during runtime resume, leading to a hardware hang.

Pre-existing issues:
- [High] Uninitialized bulk clock IDs cause the driver to silently fail at managing the NPU's actual clocks, fetching the identical default clock repeatedly.
- [High] `devm_drm_dev_alloc()` memory is leaked on probe deferral because it is allocated on a global facade device.
--

commit 18b5e0abd4d8cdab9122360b8a9bd630b4573645
Author: Midgy BALON <midgy971@gmail.com>

accel: rocket: Add RK3568 SoC support

The RK3568 has a single core of the same NVDLA-derived NPU IP as the
RK3588, with a 32-bit AXI master.  Add rk3568_soc_data and its
compatible.

Unlike the RK3588, the RK3568 NPU's compute clock is a PVTPLL managed by
TF-A via SCMI; start it from an noc_init callback with a real rate change
(an intermediate rate defeats the clock framework's unchanged-rate
shortcut).  Powering on and de-idling the NPU NoC are left to the power
domain (genpd), which performs them when the IOMMU supplier is resumed,
so the driver does not poke the PMU directly.

If noc_init fails, unwind through rocket_core_fini() so the core is torn
down completely rather than leaking the runtime-PM and IOMMU state.

> diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
> index 09c445af7de73..779e951596a15 100644
> --- a/drivers/accel/rocket/rocket_core.c
> +++ b/drivers/accel/rocket/rocket_core.c
> @@ -88,6 +88,15 @@ int rocket_core_init(struct rocket_core *core)
>  		return err;
>  	}
>  
> +	if (core->soc_data->noc_init) {
> +		err = core->soc_data->noc_init(core);
> +		if (err) {
> +			pm_runtime_put_sync(dev);

[Severity: High]
Does this error path leak the runtime PM state?

Since rocket_core_init() enables autosuspend via pm_runtime_use_autosuspend(),
calling pm_runtime_put_sync() behaves asynchronously and schedules an
autosuspend timer instead of suspending immediately.

Then rocket_core_fini() calls pm_runtime_disable() which cancels pending
timers without suspending the device. Does this leave the device permanently
in the RPM_ACTIVE state and keep the power domain powered on?

> +			rocket_core_fini(core);
> +			return err;
> +		}
> +	}
> +
>  	version = rocket_pc_readl(core, VERSION);
>  	version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index f0beed2d522c7..86484110ad6f0 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c

[ ... ]

> @@ -223,12 +224,42 @@ static void rocket_remove(struct platform_device *pdev)
>  	}
>  }
>  
> +/*
> + * The NPU compute clock is a PVTPLL managed by TF-A via SCMI; spin it up
> + * with a real rate change (an intermediate rate defeats the clock
> + * framework's unchanged-rate shortcut).  Powering on and de-idling the NPU
> + * NoC are handled by the power domain (genpd) before the NPU is accessed.
> + */
> +static int rk3568_noc_init(struct rocket_core *core)
> +{
> +	struct clk *npu_clk;
> +
> +	npu_clk = of_clk_get_by_name(core->dev->of_node, "npu");
> +	if (IS_ERR(npu_clk))
> +		return dev_err_probe(core->dev, PTR_ERR(npu_clk),
> +				     "failed to get the NPU SCMI clock\n");
> +
> +	if (clk_set_rate(npu_clk, 600000000UL) ||
> +	    clk_set_rate(npu_clk, 1000000000UL))

[Severity: High]
Does this logical OR prevent the second rate change from executing if the
first one fails?

If clk_set_rate() fails for the first call, the short-circuiting logic will
prevent the second clk_set_rate() from running.

> +		dev_warn(core->dev, "failed to set the NPU compute clock rate\n");
> +	clk_put(npu_clk);
> +
> +	return 0;

[Severity: High]
Should this return an error code if the clock rate fails to set?

Returning 0 here allows initialization to proceed even if the compute clock
is inactive. When rocket_core_init() subsequently reads the NPU VERSION
register over the AXI bus, will this cause a system hang due to the
inactive clock?

> +}

[Severity: High]
This isn't a bug introduced by this patch, but does rocket_core_init() fail
to properly initialize the bulk clock IDs?

When rdev->cores is allocated with devm_kcalloc(), the core->clks array is
initialized with NULL pointers for the .id fields. 

When rocket_core_init() later fetches the clocks:

	err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);

With NULL .id strings, clk_get() will repeatedly fetch the first clock listed
in the device tree's clocks property for all elements. Does this leave the
other NPU clocks unmanaged and potentially disabled, leading to AXI bus
hangs?

[Severity: High]
This isn't a bug introduced by this patch, but is there a memory leak in the
probe deferral path for devm_drm_dev_alloc()?

In rocket_device_init(), rdev is allocated on the global drm_dev:

    rdev = devm_drm_dev_alloc(drm_dev->dev, ...);

If rocket_core_init() fails (e.g. returning -EPROBE_DEFER), rocket_probe()
clears the rdev pointer but the devm resources are tied to the static global
drm_dev which is never removed. 

When probe is retried, does this result in allocating a new struct
rocket_device and leaking the previous allocation permanently?

[Severity: High]
Does the driver fail to restart the NPU compute clock during runtime resume?

During runtime suspend, the NPU power domain (DOMAIN_RK3568) is powered off
and the PVTPLL loses state. 

On runtime resume, rocket_device_runtime_resume() enables the bulk clocks:

    err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks),
                                  rdev->cores[core].clks);

But it doesn't re-apply the rate-change sequence used in rk3568_noc_init() to
force TF-A to spin the clock up. Will this cause the hardware to hang on the
next NPU access because the compute clock remains off?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=3

next prev parent reply	other threads:[~2026-06-13  7:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
2026-06-13  7:10   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
2026-06-13  7:09   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
2026-06-13  7:11   ` sashiko-bot [this message]
2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
2026-06-13  7:15   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
2026-06-13  7:12   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
2026-06-13  7:11   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
2026-06-13  7:09   ` sashiko-bot
2026-06-13  8:18   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
2026-06-13  7:40   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
2026-06-13  7:21   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260613071151.0553B1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=midgy971@gmail.com \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox