* [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases
@ 2026-04-09 8:46 Jingyi Wang
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
0 siblings, 2 replies; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw)
To: Bjorn Andersson, Mathieu Poirier
Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang
Failure in rproc_attach() path can cause issues like NULL pointer
dereference when doing recovery and rproc_add() fail, improve the
robustness for rproc_attach fail cases.
Previous discussion:
https://lore.kernel.org/all/9bdc6b6d-ddf0-47af-b1ed-8d1e75bf30c2@oss.qualcomm.com/
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
Jingyi Wang (2):
remoteproc: core: Attach rproc asynchronously in rproc_add() path
remoteproc: qcom: Check glink->edge in glink_subdev_stop()
drivers/remoteproc/qcom_common.c | 3 +++
drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
include/linux/remoteproc.h | 1 +
3 files changed, 16 insertions(+), 8 deletions(-)
---
base-commit: db7efce4ae23ad5e42f5f55428f529ff62b86fab
change-id: 20260409-rproc-attach-issue-2e3290d6d013
Best regards,
--
Jingyi Wang <jingyi.wang@oss.qualcomm.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path
2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
@ 2026-04-09 8:46 ` Jingyi Wang
2026-04-10 14:28 ` Stephan Gerhold
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
1 sibling, 1 reply; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw)
To: Bjorn Andersson, Mathieu Poirier
Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang
For rproc with state RPROC_DETACHED and auto_boot enabled, the attach
callback will be called in the rproc_add()->rproc_trigger_auto_boot()->
rproc_boot() path, the failure in this path will cause the rproc_add()
fail and the resource release, which will cause issue like rproc recovery
or falling back to firmware load fail. Add attach_work for rproc and call
it asynchronously in rproc_add() path like what rproc_start() do.
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
include/linux/remoteproc.h | 1 +
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index b087ed21858a..f02db1113fae 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
release_firmware(fw);
}
+static void rproc_attach_work(struct work_struct *work)
+{
+ struct rproc *rproc = container_of(work, struct rproc, attach_work);
+
+ rproc_boot(rproc);
+}
+
static int rproc_trigger_auto_boot(struct rproc *rproc)
{
int ret;
- /*
- * Since the remote processor is in a detached state, it has already
- * been booted by another entity. As such there is no point in waiting
- * for a firmware image to be loaded, we can simply initiate the process
- * of attaching to it immediately.
- */
- if (rproc->state == RPROC_DETACHED)
- return rproc_boot(rproc);
+ if (rproc->state == RPROC_DETACHED) {
+ schedule_work(&rproc->attach_work);
+ return 0;
+ }
/*
* We're initiating an asynchronous firmware loading, so we can
@@ -2512,6 +2515,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
INIT_LIST_HEAD(&rproc->dump_segments);
INIT_WORK(&rproc->crash_handler, rproc_crash_handler_work);
+ INIT_WORK(&rproc->attach_work, rproc_attach_work);
rproc->state = RPROC_OFFLINE;
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index b4795698d8c2..9b3f748e9eca 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -568,6 +568,7 @@ struct rproc {
struct list_head subdevs;
struct idr notifyids;
int index;
+ struct work_struct attach_work;
struct work_struct crash_handler;
unsigned int crash_cnt;
bool recovery_disabled;
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
@ 2026-04-09 8:46 ` Jingyi Wang
2026-04-10 14:15 ` Stephan Gerhold
1 sibling, 1 reply; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw)
To: Bjorn Andersson, Mathieu Poirier
Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang
For rproc that doing attach, glink_subdev_start() is called only when
attach successfully. If rproc_report_crash() is called in the attach
function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
be called and cause NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
Mem abort info:
...
pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
...
Call trace:
qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
glink_subdev_stop+0x1c/0x30 [qcom_common]
rproc_stop+0x58/0x17c
rproc_trigger_recovery+0xb0/0x150
rproc_crash_handler_work+0xa4/0xc4
process_scheduled_works+0x18c/0x2d8
worker_thread+0x144/0x280
kthread+0x124/0x138
ret_from_fork+0x10/0x20
Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
---[ end trace 0000000000000000 ]---
Add NULL pointer check in the glink_subdev_stop() to make sure
qcom_glink_smem_unregister() will not be called if glink_subdev_start()
is not called.
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
drivers/remoteproc/qcom_common.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index fd2b6824ad26..79d9d45e0b81 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -220,6 +220,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed)
{
struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
+ if (!glink->edge)
+ return;
+
qcom_glink_smem_unregister(glink->edge);
glink->edge = NULL;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
@ 2026-04-10 14:15 ` Stephan Gerhold
0 siblings, 0 replies; 5+ messages in thread
From: Stephan Gerhold @ 2026-04-10 14:15 UTC (permalink / raw)
To: Jingyi Wang
Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang,
trilok.soni, yijie.yang, linux-remoteproc, linux-kernel,
linux-arm-msm
On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote:
> For rproc that doing attach, glink_subdev_start() is called only when
> attach successfully. If rproc_report_crash() is called in the attach
> function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
> be called and cause NULL pointer dereference:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
> Mem abort info:
> ...
> pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
> lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
> ...
> Call trace:
> qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
> glink_subdev_stop+0x1c/0x30 [qcom_common]
> rproc_stop+0x58/0x17c
> rproc_trigger_recovery+0xb0/0x150
> rproc_crash_handler_work+0xa4/0xc4
> process_scheduled_works+0x18c/0x2d8
> worker_thread+0x144/0x280
> kthread+0x124/0x138
> ret_from_fork+0x10/0x20
> Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
> ---[ end trace 0000000000000000 ]---
>
> Add NULL pointer check in the glink_subdev_stop() to make sure
> qcom_glink_smem_unregister() will not be called if glink_subdev_start()
> is not called.
>
You mention the actual root problem here: Why is glink_subdev_stop()
called if glink_subdev_start() wasn't called?
The call to rproc_start_subdevices() in __rproc_attach() makes sure that
all subdevices are in consistent state when exiting the function (either
prepared+started or stopped+unprepared). Only if all subdevices were
started successfully, the rproc->state is changed to RPROC_ATTACHED.
In your case, attaching the rproc failed so the rproc->state should be
still RPROC_DETACHED. All subdevices should be stopped+unprepared. We
shouldn't stop/unprepare any subdevices again in this state, they all
might crash like glink does here.
We know that subdevices are already stopped+unprepared in RPROC_DETACHED
state, so I think you just need to skip rproc_stop_subdevices() and
rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff
below.
Thanks,
Stephan
@@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
if (!rproc->ops->stop)
return -EINVAL;
- /* Stop any subdevices for the remote processor */
- rproc_stop_subdevices(rproc, crashed);
+ /* Stop any subdevices for the remote processor if it was attached */
+ if (rproc->state != RPROC_DETACHED)
+ rproc_stop_subdevices(rproc, crashed);
/* the installed resource table is no longer accessible */
ret = rproc_reset_rsc_table_on_stop(rproc);
@@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
return ret;
}
- rproc_unprepare_subdevices(rproc);
+ if (rproc->state != RPROC_DETACHED)
+ rproc_unprepare_subdevices(rproc);
rproc->state = RPROC_OFFLINE;
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
@ 2026-04-10 14:28 ` Stephan Gerhold
0 siblings, 0 replies; 5+ messages in thread
From: Stephan Gerhold @ 2026-04-10 14:28 UTC (permalink / raw)
To: Jingyi Wang
Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang,
trilok.soni, yijie.yang, linux-remoteproc, linux-kernel,
linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov
+Cc Bartosz, Dmitry
On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote:
> For rproc with state RPROC_DETACHED and auto_boot enabled, the attach
> callback will be called in the rproc_add()->rproc_trigger_auto_boot()->
> rproc_boot() path, the failure in this path will cause the rproc_add()
> fail and the resource release, which will cause issue like rproc recovery
> or falling back to firmware load fail. Add attach_work for rproc and call
> it asynchronously in rproc_add() path like what rproc_start() do.
>
> Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
> ---
> drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
> include/linux/remoteproc.h | 1 +
> 2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index b087ed21858a..f02db1113fae 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
> release_firmware(fw);
> }
>
> +static void rproc_attach_work(struct work_struct *work)
> +{
> + struct rproc *rproc = container_of(work, struct rproc, attach_work);
> +
> + rproc_boot(rproc);
> +}
> +
> static int rproc_trigger_auto_boot(struct rproc *rproc)
> {
> int ret;
>
> - /*
> - * Since the remote processor is in a detached state, it has already
> - * been booted by another entity. As such there is no point in waiting
> - * for a firmware image to be loaded, we can simply initiate the process
> - * of attaching to it immediately.
> - */
> - if (rproc->state == RPROC_DETACHED)
> - return rproc_boot(rproc);
> + if (rproc->state == RPROC_DETACHED) {
> + schedule_work(&rproc->attach_work);
> + return 0;
> + }
I think the change itself is reasonable to make "auto-attach" behavior
consistent with "auto-boot". The commit message is a bit misleading
though:
- You're really doing two separate functional changes here:
(1) Ignore the return value of rproc_boot() during auto-boot attach,
to keep the remoteproc registered and available in sysfs even if
attaching fails.
(2) Run the rproc_boot() in the background using schedule_work().
[To improve boot performance? To work around some locking issues?]
- The actual issue you are seeing sounds like a use-after-free in the
remoteproc core error cleanup path. I think this one is still
present, we should really have a call to
cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the
previous discussion [1].
Thanks,
Stephan
[1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-10 14:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-10 14:28 ` Stephan Gerhold
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
2026-04-10 14:15 ` Stephan Gerhold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox