* [PATCH 0/2] accel/rocket: fix unwinding in error paths of two functions
@ 2025-12-15 16:36 Quentin Schulz
2025-12-15 16:36 ` [PATCH 1/2] accel/rocket: fix unwinding in error path in rocket_core_init Quentin Schulz
2025-12-15 16:36 ` [PATCH 2/2] accel/rocket: fix unwinding in error path in rocket_probe Quentin Schulz
0 siblings, 2 replies; 3+ messages in thread
From: Quentin Schulz @ 2025-12-15 16:36 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay, Jeff Hugo
Cc: dri-devel, linux-kernel, linux-rockchip, Quentin Schulz, stable
As reported[1], in the current state of master (that is, *without*
that[2] patch, yet unmerged), it is possible to trigger
Oops/out-of-bounds errors/unbalanced runtime PM by simply compiling
DRM_ACCEL_ROCKET built-in (=y) instead of as a module (=m).
This fixes points 1 and 2 reported here[1] by fixing the unwinding in
two functions to properly undo everything done in the same function
prior to the error.
Note that this doesn't mean the Rocket device is usable if one core is
missing. In fact, it seems it doesn't as I'm hit with many
rocket fdac0000.npu: NPU job timed out
followed by one
rocket fdad0000.npu: NPU job timed out
(and that, five times) whenever core0 (fdab0000.npu) fails to probe and
I'm running the example from
https://docs.mesa3d.org/teflon.html#do-some-inference-with-mobilenetv1
so something else probably needs some additional love.
[1] https://lore.kernel.org/linux-rockchip/0b20d760-ad4f-41c0-b733-39db10d6cc41@cherry.de/
[2] https://lore.kernel.org/linux-rockchip/20251205064739.20270-1-rmxpzlb@gmail.com/
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
---
Quentin Schulz (2):
accel/rocket: fix unwinding in error path in rocket_core_init
accel/rocket: fix unwinding in error path in rocket_probe
drivers/accel/rocket/rocket_core.c | 7 +++++--
drivers/accel/rocket/rocket_drv.c | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 3 deletions(-)
---
base-commit: a619746d25c8adafe294777cc98c47a09759b3ed
change-id: 20251212-rocket-error-path-f9784c46a503
Best regards,
--
Quentin Schulz <quentin.schulz@cherry.de>
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] accel/rocket: fix unwinding in error path in rocket_core_init
2025-12-15 16:36 [PATCH 0/2] accel/rocket: fix unwinding in error paths of two functions Quentin Schulz
@ 2025-12-15 16:36 ` Quentin Schulz
2025-12-15 16:36 ` [PATCH 2/2] accel/rocket: fix unwinding in error path in rocket_probe Quentin Schulz
1 sibling, 0 replies; 3+ messages in thread
From: Quentin Schulz @ 2025-12-15 16:36 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay, Jeff Hugo
Cc: dri-devel, linux-kernel, linux-rockchip, Quentin Schulz, stable
From: Quentin Schulz <quentin.schulz@cherry.de>
When rocket_job_init() is called, iommu_group_get() has already been
called, therefore we should call iommu_group_put() and make the
iommu_group pointer NULL. This aligns with what's done in
rocket_core_fini().
If pm_runtime_resume_and_get() somehow fails, not only should
rocket_job_fini() be called but we should also unwind everything done
before that, that is, disable PM, put the iommu_group, NULLify it and
then call rocket_job_fini(). This is exactly what's done in
rocket_core_fini() so let's call that function instead of duplicating
the code.
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
---
drivers/accel/rocket/rocket_core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index abe7719c1db46..b3b2fa9ba645a 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -59,8 +59,11 @@ int rocket_core_init(struct rocket_core *core)
core->iommu_group = iommu_group_get(dev);
err = rocket_job_init(core);
- if (err)
+ if (err) {
+ iommu_group_put(core->iommu_group);
+ core->iommu_group = NULL;
return err;
+ }
pm_runtime_use_autosuspend(dev);
@@ -76,7 +79,7 @@ int rocket_core_init(struct rocket_core *core)
err = pm_runtime_resume_and_get(dev);
if (err) {
- rocket_job_fini(core);
+ rocket_core_fini(core);
return err;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] accel/rocket: fix unwinding in error path in rocket_probe
2025-12-15 16:36 [PATCH 0/2] accel/rocket: fix unwinding in error paths of two functions Quentin Schulz
2025-12-15 16:36 ` [PATCH 1/2] accel/rocket: fix unwinding in error path in rocket_core_init Quentin Schulz
@ 2025-12-15 16:36 ` Quentin Schulz
1 sibling, 0 replies; 3+ messages in thread
From: Quentin Schulz @ 2025-12-15 16:36 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay, Jeff Hugo
Cc: dri-devel, linux-kernel, linux-rockchip, Quentin Schulz, stable
From: Quentin Schulz <quentin.schulz@cherry.de>
When rocket_core_init() fails (as could be the case with EPROBE_DEFER),
we need to properly unwind by decrementing the counter we just
incremented and if this is the first core we failed to probe, remove the
rocket DRM device with rocket_device_fini() as well. This matches the
logic in rocket_remove(). Failing to properly unwind results in
out-of-bounds accesses.
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
---
Note that this means that, technically, the first core (in HW) may not
be the first core in the kernel (if EPROBE_DEFER is returned by the
first core's probe for example). This may be a real issue if we rely on
this. E.g. I see the iommu domain is set in in rocket_open() with
rocket_iommu_domain_create(rdev->cores[0].dev) which could be expecting
the main core (but it isn't clear from the commit logs, so maybe all
cores may be totally independent). In any case, this patch is keeping
the status quo.
---
drivers/accel/rocket/rocket_drv.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 5c0b63f0a8f00..f6ef4c7aeef11 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -13,6 +13,7 @@
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>
+#include "rocket_device.h"
#include "rocket_drv.h"
#include "rocket_gem.h"
#include "rocket_job.h"
@@ -158,6 +159,8 @@ static const struct drm_driver rocket_drm_driver = {
static int rocket_probe(struct platform_device *pdev)
{
+ int ret;
+
if (rdev == NULL) {
/* First core probing, initialize DRM device. */
rdev = rocket_device_init(drm_dev, &rocket_drm_driver);
@@ -177,7 +180,17 @@ static int rocket_probe(struct platform_device *pdev)
rdev->num_cores++;
- return rocket_core_init(&rdev->cores[core]);
+ ret = rocket_core_init(&rdev->cores[core]);
+ if (ret) {
+ rdev->num_cores--;
+
+ if (rdev->num_cores == 0) {
+ rocket_device_fini(rdev);
+ rdev = NULL;
+ }
+ }
+
+ return ret;
}
static void rocket_remove(struct platform_device *pdev)
--
2.52.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-12-15 16:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-15 16:36 [PATCH 0/2] accel/rocket: fix unwinding in error paths of two functions Quentin Schulz
2025-12-15 16:36 ` [PATCH 1/2] accel/rocket: fix unwinding in error path in rocket_core_init Quentin Schulz
2025-12-15 16:36 ` [PATCH 2/2] accel/rocket: fix unwinding in error path in rocket_probe Quentin Schulz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).