Re: [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: MidG971 <midgy971@gmail.com>
Cc: conor+dt@kernel.org, dri-devel@lists.freedesktop.org,
	devicetree@vger.kernel.org, robh@kernel.org
Subject: Re: [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data
Date: Sat, 13 Jun 2026 07:09:56 +0000	[thread overview]
Message-ID: <20260613070956.90D1D1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260613070116.438906-3-midgy971@gmail.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] The bounds check for the 'cores' array uses the current node's match data rather than the allocated capacity, allowing an out-of-bounds write if the device tree has mixed compatible strings.

Pre-existing issues:
- [High] If `rocket_device_init()` fails, the global `rdev` pointer is left containing an `ERR_PTR`, causing a kernel panic on subsequent core probes.
- [High] Global DRM device and core arrays are allocated using `devm_` helpers tied to the first probed core's lifecycle, causing a Use-After-Free when that core is unbound.
- [High] Removing and re-probing NPU cores out of order corrupts the 'cores' array because 'num_cores' is incorrectly used as both an active count and the array index.
- [High] Missing locking around the global `rdev` initialization and `rdev->num_cores` access allows concurrent probes to corrupt device tracking and leak allocations.
- [High] The DRM device is registered before its hardware cores are fully initialized, allowing userspace to trigger a NULL pointer dereference.
--

commit a4289515ae9af5aaa001a280a8b52c1288f26e24
Author: Midgy BALON <midgy971@gmail.com>
One line subject: accel: rocket: Derive DMA width and core count from match data

The probe already has the per-SoC match data, which now records the core
count and DMA width. This patch uses it for the cores array allocation and
the device DMA mask instead of re-scanning the device tree for available
core nodes. It also rejects a device tree that declares more NPU core nodes
than the SoC has to prevent overrun.

> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> index 46e6ee1e72c5f..6186f4faa3a2a 100644
> --- a/drivers/accel/rocket/rocket_device.c
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -6,18 +6,16 @@
>  #include <linux/clk.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/platform_device.h>
> -#include <linux/of.h>
>  
>  #include "rocket_device.h"
>  
>  struct rocket_device *rocket_device_init(struct platform_device *pdev,
> -					 const struct drm_driver *rocket_drm_driver)
> +					 const struct drm_driver *rocket_drm_driver,
> +					 const struct rocket_soc_data *soc_data)
>  {
>  	struct device *dev = &pdev->dev;
> -	struct device_node *core_node;
>  	struct rocket_device *rdev;
>  	struct drm_device *ddev;
> -	unsigned int num_cores = 0;
>  	int err;
>  
>  	rdev = devm_drm_dev_alloc(dev, rocket_drm_driver, struct rocket_device, ddev);
> @@ -27,17 +25,14 @@ struct rocket_device *rocket_device_init(struct platform_device *pdev,
>  	ddev = &rdev->ddev;
>  	dev_set_drvdata(dev, rdev);
>  
> -	for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
> -		if (of_device_is_available(core_node))
> -			num_cores++;
> -
> -	rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
> +	rdev->cores = devm_kcalloc(dev, soc_data->num_cores, sizeof(*rdev->cores),
> +				   GFP_KERNEL);

[Severity: High]
This is a pre-existing issue, but does using devres tied to the first
probed core's lifecycle cause a use-after-free when that core is unbound?

If userspace unbinds this first core via sysfs, rocket_remove() decrements
num_cores but does not explicitly clean up the DRM device because other
cores are still bound. Upon returning from remove, the driver core
automatically frees all devres tied to the first core's device. This
would destroy the global rdev memory, causing a use-after-free when any
remaining cores attempt to access it.

[Severity: High]
This is a pre-existing issue, but does registering the DRM device
before its hardware cores are fully initialized allow userspace to
trigger a NULL pointer dereference?

In rocket_device_init(), the driver calls drm_dev_register() to expose
the character device to userspace. This occurs before the caller,
rocket_probe(), populates rdev->cores[0].dev and initializes the hardware.

If userspace opens the device during this small timing window,
rocket_open() is called and invokes:

    rocket_iommu_domain_create(rdev->cores[0].dev);

Since the device pointer is still NULL, this would attempt to dereference
a NULL device pointer in the IOMMU subsystem, crashing the system.

[ ... ]

> diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
> index 384c38e13acce..f0beed2d522c7 100644
> --- a/drivers/accel/rocket/rocket_drv.c
> +++ b/drivers/accel/rocket/rocket_drv.c
> @@ -159,11 +159,15 @@ static const struct drm_driver rocket_drm_driver = {
>  
>  static int rocket_probe(struct platform_device *pdev)
>  {
> +	const struct rocket_soc_data *soc_data = of_device_get_match_data(&pdev->dev);
>  	int ret;
>  
> +	if (!soc_data)
> +		return -EINVAL;
> +
>  	if (rdev == NULL) {

[Severity: High]
This is a pre-existing issue, but does the lack of locking around the global
rdev initialization allow concurrent probes to corrupt device tracking
and leak memory?

If concurrent probes occur (for example, via asynchronous probing or
concurrent sysfs binds), multiple threads can pass this if check,
causing multiple DRM device allocations. They can also read the same
num_cores value, leading both threads to overwrite the same index in the
rdev->cores array.

>  		/* First core probing, initialize DRM device. */
> -		rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
> +		rdev = rocket_device_init(drm_dev, &rocket_drm_driver, soc_data);
>  		if (IS_ERR(rdev)) {
>  			dev_err(&pdev->dev, "failed to initialize rocket device\n");
>  			return PTR_ERR(rdev);

[Severity: High]
This is a pre-existing issue, but does this error path leave the global
rdev pointer containing an ERR_PTR?

If rocket_device_init() fails, it returns an error pointer which is
assigned to the global rdev. The function then returns the error but fails
to reset rdev to NULL.

When the next core probes, the (rdev == NULL) check is bypassed, and the
driver immediately dereferences the error pointer when reading
rdev->num_cores:

    unsigned int core = rdev->num_cores;

Could this lead to a reproducible kernel panic?

> @@ -172,6 +176,12 @@ static int rocket_probe(struct platform_device *pdev)
>  
>  	unsigned int core = rdev->num_cores;
>  
> +	if (core >= soc_data->num_cores) {

[Severity: High]
Does this bounds check use the current node's capacity rather than the
array's actual allocated size?

The array rdev->cores is permanently allocated using soc_data->num_cores
from the first probed device in rocket_device_init(). However, this bounds
check evaluates against the soc_data of the current probing device.

If a device tree or overlay declares a node with a smaller num_cores
followed by one with a larger num_cores, the check will pass for an index
that exceeds the initial allocation size. Will this cause a direct
out-of-bounds write into the rdev->cores array, corrupting kernel memory?

> +		dev_err(&pdev->dev, "too many NPU core nodes (max %u)\n",
> +			soc_data->num_cores);
> +		return -EINVAL;
> +	}
> +
>  	dev_set_drvdata(&pdev->dev, rdev);
>  
>  	rdev->cores[core].rdev = rdev;

[Severity: High]
This is a pre-existing issue, but does removing and re-probing NPU cores
out of order corrupt the cores array?

When a core is removed, rocket_remove() decrements rdev->num_cores
without shifting the remaining cores in the array to close the gap:

    rdev->num_cores--;

If cores are removed out of order, active cores are left at indices equal
to or greater than the new num_cores. The driver later uses num_cores as its
search bound, making those trailing active cores unreachable.

Furthermore, if a new core is probed, it assigns itself to the index
rdev->num_cores, blindly overwriting a still-active core:

    rdev->cores[core].rdev = rdev;

Could this cause state corruption and resource leaks?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613070116.438906-1-midgy971@gmail.com?part=2

next prev parent reply	other threads:[~2026-06-13  7:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-13  7:01 [RFC PATCH v4 0/9] accel: rocket: Add RK3568 NPU support MidG971
2026-06-13  7:01 ` [RFC PATCH v4 1/9] accel: rocket: Introduce per-SoC rocket_soc_data MidG971
2026-06-13  7:10   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 2/9] accel: rocket: Derive DMA width and core count from match data MidG971
2026-06-13  7:09   ` sashiko-bot [this message]
2026-06-13  7:01 ` [RFC PATCH v4 3/9] accel: rocket: Add RK3568 SoC support MidG971
2026-06-13  7:11   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 4/9] accel: rocket: Reset the NPU before detaching the IOMMU on timeout MidG971
2026-06-13  7:15   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 5/9] accel: rocket: Keep the IOMMU domain attached across jobs MidG971
2026-06-13  7:12   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568 MidG971
2026-06-13  7:11   ` sashiko-bot
2026-06-13  7:01 ` [RFC PATCH v4 7/9] arm64: dts: rockchip: rk356x: Add the NPU and its IOMMU MidG971
2026-06-13  7:09   ` sashiko-bot
2026-06-13  8:18   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 8/9] arm64: dts: rockchip: rk3568-rock-3b: Enable the NPU MidG971
2026-06-13  7:40   ` Jonas Karlman
2026-06-13  7:01 ` [RFC PATCH v4 9/9] pmdomain: rockchip: Add a regulator to the RK3568 NPU power domain MidG971
2026-06-13  7:21   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260613070956.90D1D1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=midgy971@gmail.com \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox