* [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases
@ 2026-04-09 8:46 Jingyi Wang
2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
0 siblings, 2 replies; 9+ messages in thread
From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw)
To: Bjorn Andersson, Mathieu Poirier
Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang
Failure in rproc_attach() path can cause issues like NULL pointer
dereference when doing recovery and rproc_add() fail, improve the
robustness for rproc_attach fail cases.
Previous discussion:
https://lore.kernel.org/all/9bdc6b6d-ddf0-47af-b1ed-8d1e75bf30c2@oss.qualcomm.com/
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
Jingyi Wang (2):
remoteproc: core: Attach rproc asynchronously in rproc_add() path
remoteproc: qcom: Check glink->edge in glink_subdev_stop()
drivers/remoteproc/qcom_common.c | 3 +++
drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
include/linux/remoteproc.h | 1 +
3 files changed, 16 insertions(+), 8 deletions(-)
---
base-commit: db7efce4ae23ad5e42f5f55428f529ff62b86fab
change-id: 20260409-rproc-attach-issue-2e3290d6d013
Best regards,
--
Jingyi Wang <jingyi.wang@oss.qualcomm.com>
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang @ 2026-04-09 8:46 ` Jingyi Wang 2026-04-10 14:28 ` Stephan Gerhold 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang 1 sibling, 1 reply; 9+ messages in thread From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw) To: Bjorn Andersson, Mathieu Poirier Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang For rproc with state RPROC_DETACHED and auto_boot enabled, the attach callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> rproc_boot() path, the failure in this path will cause the rproc_add() fail and the resource release, which will cause issue like rproc recovery or falling back to firmware load fail. Add attach_work for rproc and call it asynchronously in rproc_add() path like what rproc_start() do. Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> --- drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- include/linux/remoteproc.h | 1 + 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index b087ed21858a..f02db1113fae 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) release_firmware(fw); } +static void rproc_attach_work(struct work_struct *work) +{ + struct rproc *rproc = container_of(work, struct rproc, attach_work); + + rproc_boot(rproc); +} + static int rproc_trigger_auto_boot(struct rproc *rproc) { int ret; - /* - * Since the remote processor is in a detached state, it has already - * been booted by another entity. As such there is no point in waiting - * for a firmware image to be loaded, we can simply initiate the process - * of attaching to it immediately. - */ - if (rproc->state == RPROC_DETACHED) - return rproc_boot(rproc); + if (rproc->state == RPROC_DETACHED) { + schedule_work(&rproc->attach_work); + return 0; + } /* * We're initiating an asynchronous firmware loading, so we can @@ -2512,6 +2515,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name, INIT_LIST_HEAD(&rproc->dump_segments); INIT_WORK(&rproc->crash_handler, rproc_crash_handler_work); + INIT_WORK(&rproc->attach_work, rproc_attach_work); rproc->state = RPROC_OFFLINE; diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index b4795698d8c2..9b3f748e9eca 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -568,6 +568,7 @@ struct rproc { struct list_head subdevs; struct idr notifyids; int index; + struct work_struct attach_work; struct work_struct crash_handler; unsigned int crash_cnt; bool recovery_disabled; -- 2.34.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang @ 2026-04-10 14:28 ` Stephan Gerhold 2026-04-14 3:41 ` Jingyi Wang 0 siblings, 1 reply; 9+ messages in thread From: Stephan Gerhold @ 2026-04-10 14:28 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov +Cc Bartosz, Dmitry On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote: > For rproc with state RPROC_DETACHED and auto_boot enabled, the attach > callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> > rproc_boot() path, the failure in this path will cause the rproc_add() > fail and the resource release, which will cause issue like rproc recovery > or falling back to firmware load fail. Add attach_work for rproc and call > it asynchronously in rproc_add() path like what rproc_start() do. > > Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> > --- > drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- > include/linux/remoteproc.h | 1 + > 2 files changed, 13 insertions(+), 8 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index b087ed21858a..f02db1113fae 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) > release_firmware(fw); > } > > +static void rproc_attach_work(struct work_struct *work) > +{ > + struct rproc *rproc = container_of(work, struct rproc, attach_work); > + > + rproc_boot(rproc); > +} > + > static int rproc_trigger_auto_boot(struct rproc *rproc) > { > int ret; > > - /* > - * Since the remote processor is in a detached state, it has already > - * been booted by another entity. As such there is no point in waiting > - * for a firmware image to be loaded, we can simply initiate the process > - * of attaching to it immediately. > - */ > - if (rproc->state == RPROC_DETACHED) > - return rproc_boot(rproc); > + if (rproc->state == RPROC_DETACHED) { > + schedule_work(&rproc->attach_work); > + return 0; > + } I think the change itself is reasonable to make "auto-attach" behavior consistent with "auto-boot". The commit message is a bit misleading though: - You're really doing two separate functional changes here: (1) Ignore the return value of rproc_boot() during auto-boot attach, to keep the remoteproc registered and available in sysfs even if attaching fails. (2) Run the rproc_boot() in the background using schedule_work(). [To improve boot performance? To work around some locking issues?] - The actual issue you are seeing sounds like a use-after-free in the remoteproc core error cleanup path. I think this one is still present, we should really have a call to cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the previous discussion [1]. Thanks, Stephan [1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-10 14:28 ` Stephan Gerhold @ 2026-04-14 3:41 ` Jingyi Wang 2026-04-14 8:13 ` Stephan Gerhold 0 siblings, 1 reply; 9+ messages in thread From: Jingyi Wang @ 2026-04-14 3:41 UTC (permalink / raw) To: Stephan Gerhold Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov On 4/10/2026 10:28 PM, Stephan Gerhold wrote: > +Cc Bartosz, Dmitry > > On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote: >> For rproc with state RPROC_DETACHED and auto_boot enabled, the attach >> callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> >> rproc_boot() path, the failure in this path will cause the rproc_add() >> fail and the resource release, which will cause issue like rproc recovery >> or falling back to firmware load fail. Add attach_work for rproc and call >> it asynchronously in rproc_add() path like what rproc_start() do. >> >> Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> >> --- >> drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- >> include/linux/remoteproc.h | 1 + >> 2 files changed, 13 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >> index b087ed21858a..f02db1113fae 100644 >> --- a/drivers/remoteproc/remoteproc_core.c >> +++ b/drivers/remoteproc/remoteproc_core.c >> @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) >> release_firmware(fw); >> } >> >> +static void rproc_attach_work(struct work_struct *work) >> +{ >> + struct rproc *rproc = container_of(work, struct rproc, attach_work); >> + >> + rproc_boot(rproc); >> +} >> + >> static int rproc_trigger_auto_boot(struct rproc *rproc) >> { >> int ret; >> >> - /* >> - * Since the remote processor is in a detached state, it has already >> - * been booted by another entity. As such there is no point in waiting >> - * for a firmware image to be loaded, we can simply initiate the process >> - * of attaching to it immediately. >> - */ >> - if (rproc->state == RPROC_DETACHED) >> - return rproc_boot(rproc); >> + if (rproc->state == RPROC_DETACHED) { >> + schedule_work(&rproc->attach_work); >> + return 0; >> + } > > I think the change itself is reasonable to make "auto-attach" behavior > consistent with "auto-boot". The commit message is a bit misleading > though: > > - You're really doing two separate functional changes here: > > (1) Ignore the return value of rproc_boot() during auto-boot attach, > to keep the remoteproc registered and available in sysfs even if > attaching fails. > (2) Run the rproc_boot() in the background using schedule_work(). > [To improve boot performance? To work around some locking issues?] > > - The actual issue you are seeing sounds like a use-after-free in the > remoteproc core error cleanup path. I think this one is still > present, we should really have a call to > cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the > previous discussion [1]. > > Thanks, > Stephan > > [1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/ Hi Stephan, Exactly as you say, what this change do is allowing rproc_attach return false. It should be okay to keep this change and describe it more clear in commit msg in next version? And the use-after-free issue is what we want to resolve in the patch2 in this series, I think cancel_work_sync() is a reasonable change but it cannot resolve this issue as the worker could be executing when we call this(and this is what it behaves when I did local test) and the use-after-free issue still exists. Shall we send a separate patch for this cancel_work_sync? Thanks, Jingyi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path 2026-04-14 3:41 ` Jingyi Wang @ 2026-04-14 8:13 ` Stephan Gerhold 0 siblings, 0 replies; 9+ messages in thread From: Stephan Gerhold @ 2026-04-14 8:13 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov On Tue, Apr 14, 2026 at 11:41:39AM +0800, Jingyi Wang wrote: > > > On 4/10/2026 10:28 PM, Stephan Gerhold wrote: > > +Cc Bartosz, Dmitry > > > > On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote: > > > For rproc with state RPROC_DETACHED and auto_boot enabled, the attach > > > callback will be called in the rproc_add()->rproc_trigger_auto_boot()-> > > > rproc_boot() path, the failure in this path will cause the rproc_add() > > > fail and the resource release, which will cause issue like rproc recovery > > > or falling back to firmware load fail. Add attach_work for rproc and call > > > it asynchronously in rproc_add() path like what rproc_start() do. > > > > > > Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> > > > --- > > > drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++-------- > > > include/linux/remoteproc.h | 1 + > > > 2 files changed, 13 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > > index b087ed21858a..f02db1113fae 100644 > > > --- a/drivers/remoteproc/remoteproc_core.c > > > +++ b/drivers/remoteproc/remoteproc_core.c > > > @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context) > > > release_firmware(fw); > > > } > > > +static void rproc_attach_work(struct work_struct *work) > > > +{ > > > + struct rproc *rproc = container_of(work, struct rproc, attach_work); > > > + > > > + rproc_boot(rproc); > > > +} > > > + > > > static int rproc_trigger_auto_boot(struct rproc *rproc) > > > { > > > int ret; > > > - /* > > > - * Since the remote processor is in a detached state, it has already > > > - * been booted by another entity. As such there is no point in waiting > > > - * for a firmware image to be loaded, we can simply initiate the process > > > - * of attaching to it immediately. > > > - */ > > > - if (rproc->state == RPROC_DETACHED) > > > - return rproc_boot(rproc); > > > + if (rproc->state == RPROC_DETACHED) { > > > + schedule_work(&rproc->attach_work); > > > + return 0; > > > + } > > > > I think the change itself is reasonable to make "auto-attach" behavior > > consistent with "auto-boot". The commit message is a bit misleading > > though: > > > > - You're really doing two separate functional changes here: > > > > (1) Ignore the return value of rproc_boot() during auto-boot attach, > > to keep the remoteproc registered and available in sysfs even if > > attaching fails. > > (2) Run the rproc_boot() in the background using schedule_work(). > > [To improve boot performance? To work around some locking issues?] > > > > - The actual issue you are seeing sounds like a use-after-free in the > > remoteproc core error cleanup path. I think this one is still > > present, we should really have a call to > > cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the > > previous discussion [1]. > > > > Thanks, > > Stephan > > > > [1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/ > > Hi Stephan, > > Exactly as you say, what this change do is allowing rproc_attach return false. > It should be okay to keep this change and describe it more clear in commit msg > in next version? > That's fine for me. > And the use-after-free issue is what we want to resolve in the patch2 > in this series, I think cancel_work_sync() is a reasonable change > but it cannot resolve this issue as the worker could be executing when > we call this(and this is what it behaves when I did local test) and > the use-after-free issue still exists. Shall we send a separate patch > for this cancel_work_sync? > cancel_work_sync() should wait until the worker execution has finished. If you call it before freeing the resources (= deleting the remoteproc), I would expect it should work as expected. It makes sense to have separate patches for this, one is about fixing the use-after-free issue, the other is more about the behavior when the initial auto-boot fails. Thanks, Stephan ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang @ 2026-04-09 8:46 ` Jingyi Wang 2026-04-10 14:15 ` Stephan Gerhold 1 sibling, 1 reply; 9+ messages in thread From: Jingyi Wang @ 2026-04-09 8:46 UTC (permalink / raw) To: Bjorn Andersson, Mathieu Poirier Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang For rproc that doing attach, glink_subdev_start() is called only when attach successfully. If rproc_report_crash() is called in the attach function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could be called and cause NULL pointer dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 Mem abort info: ... pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] lr : glink_subdev_stop+0x1c/0x30 [qcom_common] ... Call trace: qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) glink_subdev_stop+0x1c/0x30 [qcom_common] rproc_stop+0x58/0x17c rproc_trigger_recovery+0xb0/0x150 rproc_crash_handler_work+0xa4/0xc4 process_scheduled_works+0x18c/0x2d8 worker_thread+0x144/0x280 kthread+0x124/0x138 ret_from_fork+0x10/0x20 Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) ---[ end trace 0000000000000000 ]--- Add NULL pointer check in the glink_subdev_stop() to make sure qcom_glink_smem_unregister() will not be called if glink_subdev_start() is not called. Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> --- drivers/remoteproc/qcom_common.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c index fd2b6824ad26..79d9d45e0b81 100644 --- a/drivers/remoteproc/qcom_common.c +++ b/drivers/remoteproc/qcom_common.c @@ -220,6 +220,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed) { struct qcom_rproc_glink *glink = to_glink_subdev(subdev); + if (!glink->edge) + return; + qcom_glink_smem_unregister(glink->edge); glink->edge = NULL; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang @ 2026-04-10 14:15 ` Stephan Gerhold 2026-04-14 3:23 ` Jingyi Wang 0 siblings, 1 reply; 9+ messages in thread From: Stephan Gerhold @ 2026-04-10 14:15 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote: > For rproc that doing attach, glink_subdev_start() is called only when > attach successfully. If rproc_report_crash() is called in the attach > function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could > be called and cause NULL pointer dereference: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 > Mem abort info: > ... > pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] > lr : glink_subdev_stop+0x1c/0x30 [qcom_common] > ... > Call trace: > qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) > glink_subdev_stop+0x1c/0x30 [qcom_common] > rproc_stop+0x58/0x17c > rproc_trigger_recovery+0xb0/0x150 > rproc_crash_handler_work+0xa4/0xc4 > process_scheduled_works+0x18c/0x2d8 > worker_thread+0x144/0x280 > kthread+0x124/0x138 > ret_from_fork+0x10/0x20 > Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) > ---[ end trace 0000000000000000 ]--- > > Add NULL pointer check in the glink_subdev_stop() to make sure > qcom_glink_smem_unregister() will not be called if glink_subdev_start() > is not called. > You mention the actual root problem here: Why is glink_subdev_stop() called if glink_subdev_start() wasn't called? The call to rproc_start_subdevices() in __rproc_attach() makes sure that all subdevices are in consistent state when exiting the function (either prepared+started or stopped+unprepared). Only if all subdevices were started successfully, the rproc->state is changed to RPROC_ATTACHED. In your case, attaching the rproc failed so the rproc->state should be still RPROC_DETACHED. All subdevices should be stopped+unprepared. We shouldn't stop/unprepare any subdevices again in this state, they all might crash like glink does here. We know that subdevices are already stopped+unprepared in RPROC_DETACHED state, so I think you just need to skip rproc_stop_subdevices() and rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff below. Thanks, Stephan @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed) if (!rproc->ops->stop) return -EINVAL; - /* Stop any subdevices for the remote processor */ - rproc_stop_subdevices(rproc, crashed); + /* Stop any subdevices for the remote processor if it was attached */ + if (rproc->state != RPROC_DETACHED) + rproc_stop_subdevices(rproc, crashed); /* the installed resource table is no longer accessible */ ret = rproc_reset_rsc_table_on_stop(rproc); @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed) return ret; } - rproc_unprepare_subdevices(rproc); + if (rproc->state != RPROC_DETACHED) + rproc_unprepare_subdevices(rproc); rproc->state = RPROC_OFFLINE; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-10 14:15 ` Stephan Gerhold @ 2026-04-14 3:23 ` Jingyi Wang 2026-04-14 8:27 ` Stephan Gerhold 0 siblings, 1 reply; 9+ messages in thread From: Jingyi Wang @ 2026-04-14 3:23 UTC (permalink / raw) To: Stephan Gerhold Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm On 4/10/2026 10:15 PM, Stephan Gerhold wrote: > On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote: >> For rproc that doing attach, glink_subdev_start() is called only when >> attach successfully. If rproc_report_crash() is called in the attach >> function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could >> be called and cause NULL pointer dereference: >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 >> Mem abort info: >> ... >> pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] >> lr : glink_subdev_stop+0x1c/0x30 [qcom_common] >> ... >> Call trace: >> qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) >> glink_subdev_stop+0x1c/0x30 [qcom_common] >> rproc_stop+0x58/0x17c >> rproc_trigger_recovery+0xb0/0x150 >> rproc_crash_handler_work+0xa4/0xc4 >> process_scheduled_works+0x18c/0x2d8 >> worker_thread+0x144/0x280 >> kthread+0x124/0x138 >> ret_from_fork+0x10/0x20 >> Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) >> ---[ end trace 0000000000000000 ]--- >> >> Add NULL pointer check in the glink_subdev_stop() to make sure >> qcom_glink_smem_unregister() will not be called if glink_subdev_start() >> is not called. >> > > You mention the actual root problem here: Why is glink_subdev_stop() > called if glink_subdev_start() wasn't called? > > The call to rproc_start_subdevices() in __rproc_attach() makes sure that > all subdevices are in consistent state when exiting the function (either > prepared+started or stopped+unprepared). Only if all subdevices were > started successfully, the rproc->state is changed to RPROC_ATTACHED. > > In your case, attaching the rproc failed so the rproc->state should be > still RPROC_DETACHED. All subdevices should be stopped+unprepared. We > shouldn't stop/unprepare any subdevices again in this state, they all > might crash like glink does here. > > We know that subdevices are already stopped+unprepared in RPROC_DETACHED > state, so I think you just need to skip rproc_stop_subdevices() and > rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff > below. > > Thanks, > Stephan > > @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed) > if (!rproc->ops->stop) > return -EINVAL; > > - /* Stop any subdevices for the remote processor */ > - rproc_stop_subdevices(rproc, crashed); > + /* Stop any subdevices for the remote processor if it was attached */ > + if (rproc->state != RPROC_DETACHED) > + rproc_stop_subdevices(rproc, crashed); > > /* the installed resource table is no longer accessible */ > ret = rproc_reset_rsc_table_on_stop(rproc); > @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed) > return ret; > } > > - rproc_unprepare_subdevices(rproc); > + if (rproc->state != RPROC_DETACHED) > + rproc_unprepare_subdevices(rproc); > > rproc->state = RPROC_OFFLINE; > Hi Stephen, In this case, rproc_crash_handler_work()->rproc_trigger_recovery()-> rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() is called, "rproc->state = RPROC_CRASHED" is set in the rproc_crash_handler_work before rproc_boot_recovery is called, so checking RPROC_DETACHED can not work for this case. Thanks, Jingyi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() 2026-04-14 3:23 ` Jingyi Wang @ 2026-04-14 8:27 ` Stephan Gerhold 0 siblings, 0 replies; 9+ messages in thread From: Stephan Gerhold @ 2026-04-14 8:27 UTC (permalink / raw) To: Jingyi Wang Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang, linux-remoteproc, linux-kernel, linux-arm-msm On Tue, Apr 14, 2026 at 11:23:50AM +0800, Jingyi Wang wrote: > On 4/10/2026 10:15 PM, Stephan Gerhold wrote: > > On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote: > > > For rproc that doing attach, glink_subdev_start() is called only when > > > attach successfully. If rproc_report_crash() is called in the attach > > > function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could > > > be called and cause NULL pointer dereference: > > > > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300 > > > Mem abort info: > > > ... > > > pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] > > > lr : glink_subdev_stop+0x1c/0x30 [qcom_common] > > > ... > > > Call trace: > > > qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) > > > glink_subdev_stop+0x1c/0x30 [qcom_common] > > > rproc_stop+0x58/0x17c > > > rproc_trigger_recovery+0xb0/0x150 > > > rproc_crash_handler_work+0xa4/0xc4 > > > process_scheduled_works+0x18c/0x2d8 > > > worker_thread+0x144/0x280 > > > kthread+0x124/0x138 > > > ret_from_fork+0x10/0x20 > > > Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) > > > ---[ end trace 0000000000000000 ]--- > > > > > > Add NULL pointer check in the glink_subdev_stop() to make sure > > > qcom_glink_smem_unregister() will not be called if glink_subdev_start() > > > is not called. > > > > > > > You mention the actual root problem here: Why is glink_subdev_stop() > > called if glink_subdev_start() wasn't called? > > > > The call to rproc_start_subdevices() in __rproc_attach() makes sure that > > all subdevices are in consistent state when exiting the function (either > > prepared+started or stopped+unprepared). Only if all subdevices were > > started successfully, the rproc->state is changed to RPROC_ATTACHED. > > > > In your case, attaching the rproc failed so the rproc->state should be > > still RPROC_DETACHED. All subdevices should be stopped+unprepared. We > > shouldn't stop/unprepare any subdevices again in this state, they all > > might crash like glink does here. > > > > We know that subdevices are already stopped+unprepared in RPROC_DETACHED > > state, so I think you just need to skip rproc_stop_subdevices() and > > rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff > > below. > > > > @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed) > > if (!rproc->ops->stop) > > return -EINVAL; > > - /* Stop any subdevices for the remote processor */ > > - rproc_stop_subdevices(rproc, crashed); > > + /* Stop any subdevices for the remote processor if it was attached */ > > + if (rproc->state != RPROC_DETACHED) > > + rproc_stop_subdevices(rproc, crashed); > > /* the installed resource table is no longer accessible */ > > ret = rproc_reset_rsc_table_on_stop(rproc); > > @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed) > > return ret; > > } > > - rproc_unprepare_subdevices(rproc); > > + if (rproc->state != RPROC_DETACHED) > > + rproc_unprepare_subdevices(rproc); > > rproc->state = RPROC_OFFLINE; > > In this case, rproc_crash_handler_work()->rproc_trigger_recovery()-> > rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() is called, > "rproc->state = RPROC_CRASHED" is set in the rproc_crash_handler_work > before rproc_boot_recovery is called, so checking RPROC_DETACHED can > not work for this case. > You're right, I forgot about that. I think we need a more generic solution for this though. rproc_stop_subdevices() should not be called without a prior call to rproc_start_subdevices(). I think there are a couple of options for this: - Add a bool "subdevs_started" to struct rproc and manage that separately from the rproc->state. - Track the rproc state before the crash separately (something like rproc->state_before_crash) and check that in the stop path. - Add a new state RPROC_CRASHED_DETACHED to make sure the states are unique. - ... Does the same issue also exist in qcom_pas_stop() of "[PATCH v5 4/5] remoteproc: qcom: pas: Add late attach support for subsystems" [1]? There you check for pas->rproc->state != RPROC_ATTACHED, wouldn't this also fail for the RPROC_CRASHED case? Thanks, Stephan [1]: https://lore.kernel.org/linux-arm-msm/20260409-knp-soccp-v5-4-805a492124da@oss.qualcomm.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-14 8:27 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-09 8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang 2026-04-09 8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang 2026-04-10 14:28 ` Stephan Gerhold 2026-04-14 3:41 ` Jingyi Wang 2026-04-14 8:13 ` Stephan Gerhold 2026-04-09 8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang 2026-04-10 14:15 ` Stephan Gerhold 2026-04-14 3:23 ` Jingyi Wang 2026-04-14 8:27 ` Stephan Gerhold
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox