* [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases
@ 2026-04-09 8:46 Jingyi Wang
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
0 siblings, 2 replies; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw)
To: Bjorn Andersson, Mathieu Poirier
Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang
Failure in rproc_attach() path can cause issues like NULL pointer
dereference when doing recovery and rproc_add() fail, improve the
robustness for rproc_attach fail cases.
Previous discussion:
https://lore.kernel.org/all/9bdc6b6d-ddf0-47af-b1ed-8d1e75bf30c2@oss.qualcomm.com/
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
Jingyi Wang (2):
remoteproc: core: Attach rproc asynchronously in rproc_add() path
remoteproc: qcom: Check glink->edge in glink_subdev_stop()
drivers/remoteproc/qcom_common.c | 3 +++
drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
include/linux/remoteproc.h | 1 +
3 files changed, 16 insertions(+), 8 deletions(-)
---
base-commit: db7efce4ae23ad5e42f5f55428f529ff62b86fab
change-id: 20260409-rproc-attach-issue-2e3290d6d013
Best regards,
--
Jingyi Wang <jingyi.wang@oss.qualcomm.com>
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang @ 2026-04-09 8:46 ` Jingyi Wang 2026-04-10 14:28 ` Stephan Gerhold 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang 1 sibling, 1 reply; 5+ messages in thread From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw) To: Bjorn Andersson, Mathieu Poirier Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang For rproc with state RPROC_DETACHED and auto_boot enabled, the attach callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> rproc_boot() path, the failure in this path will cause the rproc_add() fail and the resource release, which will cause issue like rproc recovery or falling back to firmware load fail. Add attach_work for rproc and call it asynchronously in rproc_add() path like what rproc_start() do. Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> --- drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- include/linux/remoteproc.h | 1 + 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index b087ed21858a..f02db1113fae 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) release_firmware(fw); } +static void rproc_attach_work(struct work_struct *work) +{ + struct rproc *rproc = container_of(work, struct rproc, attach_work); + + rproc_boot(rproc); +} + static int rproc_trigger_auto_boot(struct rproc *rproc) { int ret; - /* - * Since the remote processor is in a detached state, it has already - * been booted by another entity. As such there is no point in waiting - * for a firmware image to be loaded, we can simply initiate the process - * of attaching to it immediately. - */ - if (rproc->state == RPROC_DETACHED) - return rproc_boot(rproc); + if (rproc->state == RPROC_DETACHED) { + schedule_work(&rproc->attach_work); + return 0; + } /* * We're initiating an asynchronous firmware loading, so we can @@ -2512,6 +2515,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name, INIT_LIST_HEAD(&rproc->dump_segments); INIT_WORK(&rproc->crash_handler, rproc_crash_handler_work); + INIT_WORK(&rproc->attach_work, rproc_attach_work); rproc->state = RPROC_OFFLINE; diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index b4795698d8c2..9b3f748e9eca 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -568,6 +568,7 @@ struct rproc { struct list_head subdevs; struct idr notifyids; int index; + struct work_struct attach_work; struct work_struct crash_handler; unsigned int crash_cnt; bool recovery_disabled; -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang @ 2026-04-10 14:28 ` Stephan Gerhold 0 siblings, 0 replies; 5+ messages in thread From: Stephan Gerhold @ 2026-04-10 14:28 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov +Cc Bartosz, Dmitry On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote: > For rproc with state RPROC_DETACHED and auto_boot enabled, the attach > callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> > rproc_boot() path, the failure in this path will cause the rproc_add() > fail and the resource release, which will cause issue like rproc recovery > or falling back to firmware load fail. Add attach_work for rproc and call > it asynchronously in rproc_add() path like what rproc_start() do. > > Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> > --- > drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- > include/linux/remoteproc.h | 1 + > 2 files changed, 13 insertions(+), 8 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index b087ed21858a..f02db1113fae 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) > release_firmware(fw); > } > > +static void rproc_attach_work(struct work_struct *work) > +{ > + struct rproc *rproc = container_of(work, struct rproc, attach_work); > + > + rproc_boot(rproc); > +} > + > static int rproc_trigger_auto_boot(struct rproc *rproc) > { > int ret; > > - /* > - * Since the remote processor is in a detached state, it has already > - * been booted by another entity. As such there is no point in waiting > - * for a firmware image to be loaded, we can simply initiate the process > - * of attaching to it immediately. > - */ > - if (rproc->state == RPROC_DETACHED) > - return rproc_boot(rproc); > + if (rproc->state == RPROC_DETACHED) { > + schedule_work(&rproc->attach_work); > + return 0; > + } I think the change itself is reasonable to make "auto-attach" behavior consistent with "auto-boot". The commit message is a bit misleading though: - You're really doing two separate functional changes here: (1) Ignore the return value of rproc_boot() during auto-boot attach, to keep the remoteproc registered and available in sysfs even if attaching fails. (2) Run the rproc_boot() in the background using schedule_work(). [To improve boot performance? To work around some locking issues?] - The actual issue you are seeing sounds like a use-after-free in the remoteproc core error cleanup path. I think this one is still present, we should really have a call to cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the previous discussion [1]. Thanks, Stephan [1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang @ 2026-04-09 8:46 ` Jingyi Wang 2026-04-10 14:15 ` Stephan Gerhold 1 sibling, 1 reply; 5+ messages in thread From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw) To: Bjorn Andersson, Mathieu Poirier Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang For rproc that doing attach, glink_subdev_start() is called only when attach successfully. If rproc_report_crash() is called in the attach function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could be called and cause NULL pointer dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 Mem abort info: ... pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] lr : glink_subdev_stop+0x1c/0x30 [qcom_common] ... Call trace: qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) glink_subdev_stop+0x1c/0x30 [qcom_common] rproc_stop+0x58/0x17c rproc_trigger_recovery+0xb0/0x150 rproc_crash_handler_work+0xa4/0xc4 process_scheduled_works+0x18c/0x2d8 worker_thread+0x144/0x280 kthread+0x124/0x138 ret_from_fork+0x10/0x20 Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) ---[ end trace 0000000000000000 ]--- Add NULL pointer check in the glink_subdev_stop() to make sure qcom_glink_smem_unregister() will not be called if glink_subdev_start() is not called. Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> --- drivers/remoteproc/qcom_common.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c index fd2b6824ad26..79d9d45e0b81 100644 --- a/drivers/remoteproc/qcom_common.c +++ b/drivers/remoteproc/qcom_common.c @@ -220,6 +220,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed) { struct qcom_rproc_glink *glink = to_glink_subdev(subdev); + if (!glink->edge) + return; + qcom_glink_smem_unregister(glink->edge); glink->edge = NULL; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang @ 2026-04-10 14:15 ` Stephan Gerhold 0 siblings, 0 replies; 5+ messages in thread From: Stephan Gerhold @ 2026-04-10 14:15 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote: > For rproc that doing attach, glink_subdev_start() is called only when > attach successfully. If rproc_report_crash() is called in the attach > function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could > be called and cause NULL pointer dereference: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 > Mem abort info: > ... > pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] > lr : glink_subdev_stop+0x1c/0x30 [qcom_common] > ... > Call trace: > qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) > glink_subdev_stop+0x1c/0x30 [qcom_common] > rproc_stop+0x58/0x17c > rproc_trigger_recovery+0xb0/0x150 > rproc_crash_handler_work+0xa4/0xc4 > process_scheduled_works+0x18c/0x2d8 > worker_thread+0x144/0x280 > kthread+0x124/0x138 > ret_from_fork+0x10/0x20 > Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) > ---[ end trace 0000000000000000 ]--- > > Add NULL pointer check in the glink_subdev_stop() to make sure > qcom_glink_smem_unregister() will not be called if glink_subdev_start() > is not called. > You mention the actual root problem here: Why is glink_subdev_stop() called if glink_subdev_start() wasn't called? The call to rproc_start_subdevices() in __rproc_attach() makes sure that all subdevices are in consistent state when exiting the function (either prepared+started or stopped+unprepared). Only if all subdevices were started successfully, the rproc->state is changed to RPROC_ATTACHED. In your case, attaching the rproc failed so the rproc->state should be still RPROC_DETACHED. All subdevices should be stopped+unprepared. We shouldn't stop/unprepare any subdevices again in this state, they all might crash like glink does here. We know that subdevices are already stopped+unprepared in RPROC_DETACHED state, so I think you just need to skip rproc_stop_subdevices() and rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff below. Thanks, Stephan @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed) if (!rproc->ops->stop) return -EINVAL; - /* Stop any subdevices for the remote processor */ - rproc_stop_subdevices(rproc, crashed); + /* Stop any subdevices for the remote processor if it was attached */ + if (rproc->state != RPROC_DETACHED) + rproc_stop_subdevices(rproc, crashed); /* the installed resource table is no longer accessible */ ret = rproc_reset_rsc_table_on_stop(rproc); @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed) return ret; } - rproc_unprepare_subdevices(rproc); + if (rproc->state != RPROC_DETACHED) + rproc_unprepare_subdevices(rproc); rproc->state = RPROC_OFFLINE; ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-10 14:28 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang 2026-04-10 14:28 ` Stephan Gerhold 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang 2026-04-10 14:15 ` Stephan Gerhold
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox