[PATCH 0/2] remoteproc: improve robustness for rproc

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases
@ 2026-04-09  8:46 Jingyi Wang
  2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
  2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09  8:46 UTC (permalink / raw)
  To: Bjorn Andersson, Mathieu Poirier
  Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
	linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang

Failure in rproc_attach() path can cause issues like NULL pointer
dereference when doing recovery and rproc_add() fail, improve the
robustness for rproc_attach fail cases.

Previous discussion:
https://lore.kernel.org/all/9bdc6b6d-ddf0-47af-b1ed-8d1e75bf30c2@oss.qualcomm.com/

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
Jingyi Wang (2):
      remoteproc: core: Attach rproc asynchronously in rproc_add() path
      remoteproc: qcom: Check glink->edge in glink_subdev_stop()

 drivers/remoteproc/qcom_common.c     |  3 +++
 drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
 include/linux/remoteproc.h           |  1 +
 3 files changed, 16 insertions(+), 8 deletions(-)
---
base-commit: db7efce4ae23ad5e42f5f55428f529ff62b86fab
change-id: 20260409-rproc-attach-issue-2e3290d6d013

Best regards,
-- 
Jingyi Wang <jingyi.wang@oss.qualcomm.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path
  2026-04-09  8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
@ 2026-04-09  8:46 ` Jingyi Wang
  2026-04-10 14:28   ` Stephan Gerhold
  2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09  8:46 UTC (permalink / raw)
  To: Bjorn Andersson, Mathieu Poirier
  Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
	linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang

For rproc with state RPROC_DETACHED and auto_boot enabled, the attach
callback will be called in the rproc_add()->rproc_trigger_auto_boot()->
rproc_boot() path, the failure in this path will cause the rproc_add()
fail and the resource release, which will cause issue like rproc recovery
or falling back to firmware load fail. Add attach_work for rproc and call
it asynchronously in rproc_add() path like what rproc_start() do.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
 drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
 include/linux/remoteproc.h           |  1 +
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index b087ed21858a..f02db1113fae 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
 	release_firmware(fw);
 }
 
+static void rproc_attach_work(struct work_struct *work)
+{
+	struct rproc *rproc = container_of(work, struct rproc, attach_work);
+
+	rproc_boot(rproc);
+}
+
 static int rproc_trigger_auto_boot(struct rproc *rproc)
 {
 	int ret;
 
-	/*
-	 * Since the remote processor is in a detached state, it has already
-	 * been booted by another entity.  As such there is no point in waiting
-	 * for a firmware image to be loaded, we can simply initiate the process
-	 * of attaching to it immediately.
-	 */
-	if (rproc->state == RPROC_DETACHED)
-		return rproc_boot(rproc);
+	if (rproc->state == RPROC_DETACHED) {
+		schedule_work(&rproc->attach_work);
+		return 0;
+	}
 
 	/*
 	 * We're initiating an asynchronous firmware loading, so we can
@@ -2512,6 +2515,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
 	INIT_LIST_HEAD(&rproc->dump_segments);
 
 	INIT_WORK(&rproc->crash_handler, rproc_crash_handler_work);
+	INIT_WORK(&rproc->attach_work, rproc_attach_work);
 
 	rproc->state = RPROC_OFFLINE;
 
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index b4795698d8c2..9b3f748e9eca 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -568,6 +568,7 @@ struct rproc {
 	struct list_head subdevs;
 	struct idr notifyids;
 	int index;
+	struct work_struct attach_work;
 	struct work_struct crash_handler;
 	unsigned int crash_cnt;
 	bool recovery_disabled;

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path
  2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
@ 2026-04-10 14:28   ` Stephan Gerhold
  0 siblings, 0 replies; 5+ messages in thread
From: Stephan Gerhold @ 2026-04-10 14:28 UTC (permalink / raw)
  To: Jingyi Wang
  Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang,
	trilok.soni, yijie.yang, linux-remoteproc, linux-kernel,
	linux-arm-msm, Bartosz Golaszewski, Dmitry Baryshkov

+Cc Bartosz, Dmitry

On Thu, Apr 09, 2026 at 01:46:21AM -0700, Jingyi Wang wrote:
> For rproc with state RPROC_DETACHED and auto_boot enabled, the attach
> callback will be called in the rproc_add()->rproc_trigger_auto_boot()->
> rproc_boot() path, the failure in this path will cause the rproc_add()
> fail and the resource release, which will cause issue like rproc recovery
> or falling back to firmware load fail. Add attach_work for rproc and call
> it asynchronously in rproc_add() path like what rproc_start() do.
> 
> Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
> ---
>  drivers/remoteproc/remoteproc_core.c | 20 ++++++++++++--------
>  include/linux/remoteproc.h           |  1 +
>  2 files changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index b087ed21858a..f02db1113fae 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1673,18 +1673,21 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
>  	release_firmware(fw);
>  }
>  
> +static void rproc_attach_work(struct work_struct *work)
> +{
> +	struct rproc *rproc = container_of(work, struct rproc, attach_work);
> +
> +	rproc_boot(rproc);
> +}
> +
>  static int rproc_trigger_auto_boot(struct rproc *rproc)
>  {
>  	int ret;
>  
> -	/*
> -	 * Since the remote processor is in a detached state, it has already
> -	 * been booted by another entity.  As such there is no point in waiting
> -	 * for a firmware image to be loaded, we can simply initiate the process
> -	 * of attaching to it immediately.
> -	 */
> -	if (rproc->state == RPROC_DETACHED)
> -		return rproc_boot(rproc);
> +	if (rproc->state == RPROC_DETACHED) {
> +		schedule_work(&rproc->attach_work);
> +		return 0;
> +	}

I think the change itself is reasonable to make "auto-attach" behavior
consistent with "auto-boot". The commit message is a bit misleading
though:

 - You're really doing two separate functional changes here:

   (1) Ignore the return value of rproc_boot() during auto-boot attach,
       to keep the remoteproc registered and available in sysfs even if
       attaching fails.
   (2) Run the rproc_boot() in the background using schedule_work().
       [To improve boot performance? To work around some locking issues?]

 - The actual issue you are seeing sounds like a use-after-free in the
   remoteproc core error cleanup path. I think this one is still
   present, we should really have a call to
   cancel_work_sync(&rproc->crash_handler) as Dmitry wrote in the
   previous discussion [1].

Thanks,
Stephan

[1]: https://lore.kernel.org/all/ce24a2sgg4b6wymoxwgl2ve6np2nxn2wuxfqxfpmvqqrpvgouf@xihd6ziqwu4m/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
  2026-04-09  8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
  2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
@ 2026-04-09  8:46 ` Jingyi Wang
  2026-04-10 14:15   ` Stephan Gerhold
  1 sibling, 1 reply; 5+ messages in thread
From: Jingyi Wang @ 2026-04-09  8:46 UTC (permalink / raw)
  To: Bjorn Andersson, Mathieu Poirier
  Cc: aiqun.yu, tingwei.zhang, trilok.soni, yijie.yang,
	linux-remoteproc, linux-kernel, linux-arm-msm, Jingyi Wang

For rproc that doing attach, glink_subdev_start() is called only when
attach successfully. If rproc_report_crash() is called in the attach
function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
be called and cause NULL pointer dereference:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
 Mem abort info:
 ...
 pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
 lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
 ...
 Call trace:
  qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
  glink_subdev_stop+0x1c/0x30 [qcom_common]
  rproc_stop+0x58/0x17c
  rproc_trigger_recovery+0xb0/0x150
  rproc_crash_handler_work+0xa4/0xc4
  process_scheduled_works+0x18c/0x2d8
  worker_thread+0x144/0x280
  kthread+0x124/0x138
  ret_from_fork+0x10/0x20
 Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
 ---[ end trace 0000000000000000 ]---

Add NULL pointer check in the glink_subdev_stop() to make sure
qcom_glink_smem_unregister() will not be called if glink_subdev_start()
is not called.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
---
 drivers/remoteproc/qcom_common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index fd2b6824ad26..79d9d45e0b81 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -220,6 +220,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed)
 {
 	struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
 
+	if (!glink->edge)
+		return;
+
 	qcom_glink_smem_unregister(glink->edge);
 	glink->edge = NULL;
 }

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
  2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
@ 2026-04-10 14:15   ` Stephan Gerhold
  0 siblings, 0 replies; 5+ messages in thread
From: Stephan Gerhold @ 2026-04-10 14:15 UTC (permalink / raw)
  To: Jingyi Wang
  Cc: Bjorn Andersson, Mathieu Poirier, aiqun.yu, tingwei.zhang,
	trilok.soni, yijie.yang, linux-remoteproc, linux-kernel,
	linux-arm-msm

On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote:
> For rproc that doing attach, glink_subdev_start() is called only when
> attach successfully. If rproc_report_crash() is called in the attach
> function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
> be called and cause NULL pointer dereference:
> 
>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
>  Mem abort info:
>  ...
>  pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
>  lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
>  ...
>  Call trace:
>   qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
>   glink_subdev_stop+0x1c/0x30 [qcom_common]
>   rproc_stop+0x58/0x17c
>   rproc_trigger_recovery+0xb0/0x150
>   rproc_crash_handler_work+0xa4/0xc4
>   process_scheduled_works+0x18c/0x2d8
>   worker_thread+0x144/0x280
>   kthread+0x124/0x138
>   ret_from_fork+0x10/0x20
>  Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
>  ---[ end trace 0000000000000000 ]---
> 
> Add NULL pointer check in the glink_subdev_stop() to make sure
> qcom_glink_smem_unregister() will not be called if glink_subdev_start()
> is not called.
> 

You mention the actual root problem here: Why is glink_subdev_stop()
called if glink_subdev_start() wasn't called?

The call to rproc_start_subdevices() in __rproc_attach() makes sure that
all subdevices are in consistent state when exiting the function (either
prepared+started or stopped+unprepared). Only if all subdevices were
started successfully, the rproc->state is changed to RPROC_ATTACHED.

In your case, attaching the rproc failed so the rproc->state should be
still RPROC_DETACHED. All subdevices should be stopped+unprepared. We
shouldn't stop/unprepare any subdevices again in this state, they all
might crash like glink does here.

We know that subdevices are already stopped+unprepared in RPROC_DETACHED
state, so I think you just need to skip rproc_stop_subdevices() and
rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff
below.

Thanks,
Stephan

@@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
 	if (!rproc->ops->stop)
 		return -EINVAL;
 
-	/* Stop any subdevices for the remote processor */
-	rproc_stop_subdevices(rproc, crashed);
+	/* Stop any subdevices for the remote processor if it was attached */
+	if (rproc->state != RPROC_DETACHED)
+		rproc_stop_subdevices(rproc, crashed);
 
 	/* the installed resource table is no longer accessible */
 	ret = rproc_reset_rsc_table_on_stop(rproc);
@@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
 		return ret;
 	}
 
-	rproc_unprepare_subdevices(rproc);
+	if (rproc->state != RPROC_DETACHED)
+		rproc_unprepare_subdevices(rproc);
 
 	rproc->state = RPROC_OFFLINE;
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-10 14:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-09  8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-10 14:28   ` Stephan Gerhold
2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
2026-04-10 14:15   ` Stephan Gerhold

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox