All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephan Gerhold <stephan.gerhold@linaro.org>
To: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Cc: Bjorn Andersson <andersson@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	aiqun.yu@oss.qualcomm.com, tingwei.zhang@oss.qualcomm.com,
	trilok.soni@oss.qualcomm.com, yijie.yang@oss.qualcomm.com,
	linux-remoteproc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-msm@vger.kernel.org
Subject: Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
Date: Tue, 21 Apr 2026 15:45:51 +0200	[thread overview]
Message-ID: <aed_jyMEuWLlQRDv@linaro.org> (raw)
In-Reply-To: <e9720deb-d46c-4556-b69d-1dec00cc7892@oss.qualcomm.com>

On Thu, Apr 16, 2026 at 10:52:17AM +0800, Jingyi Wang wrote:
> On 4/14/2026 4:27 PM, Stephan Gerhold wrote:
> > On Tue, Apr 14, 2026 at 11:23:50AM +0800, Jingyi Wang wrote:
> > > On 4/10/2026 10:15 PM, Stephan Gerhold wrote:
> > > > On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote:
> > > > > For rproc that doing attach, glink_subdev_start() is called only when
> > > > > attach successfully. If rproc_report_crash() is called in the attach
> > > > > function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
> > > > > be called and cause NULL pointer dereference:
> > > > > 
> > > > >    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
> > > > >    Mem abort info:
> > > > >    ...
> > > > >    pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
> > > > >    lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
> > > > >    ...
> > > > >    Call trace:
> > > > >     qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
> > > > >     glink_subdev_stop+0x1c/0x30 [qcom_common]
> > > > >     rproc_stop+0x58/0x17c
> > > > >     rproc_trigger_recovery+0xb0/0x150
> > > > >     rproc_crash_handler_work+0xa4/0xc4
> > > > >     process_scheduled_works+0x18c/0x2d8
> > > > >     worker_thread+0x144/0x280
> > > > >     kthread+0x124/0x138
> > > > >     ret_from_fork+0x10/0x20
> > > > >    Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
> > > > >    ---[ end trace 0000000000000000 ]---
> > > > > 
> > > > > Add NULL pointer check in the glink_subdev_stop() to make sure
> > > > > qcom_glink_smem_unregister() will not be called if glink_subdev_start()
> > > > > is not called.
> > > > > 
> > > > 
> > > > You mention the actual root problem here: Why is glink_subdev_stop()
> > > > called if glink_subdev_start() wasn't called?
> > > > 
> > > > The call to rproc_start_subdevices() in __rproc_attach() makes sure that
> > > > all subdevices are in consistent state when exiting the function (either
> > > > prepared+started or stopped+unprepared). Only if all subdevices were
> > > > started successfully, the rproc->state is changed to RPROC_ATTACHED.
> > > > 
> > > > In your case, attaching the rproc failed so the rproc->state should be
> > > > still RPROC_DETACHED. All subdevices should be stopped+unprepared. We
> > > > shouldn't stop/unprepare any subdevices again in this state, they all
> > > > might crash like glink does here.
> > > > 
> > > > We know that subdevices are already stopped+unprepared in RPROC_DETACHED
> > > > state, so I think you just need to skip rproc_stop_subdevices() and
> > > > rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff
> > > > below.
> > > > 
> > > > @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
> > > >    	if (!rproc->ops->stop)
> > > >    		return -EINVAL;
> > > > -	/* Stop any subdevices for the remote processor */
> > > > -	rproc_stop_subdevices(rproc, crashed);
> > > > +	/* Stop any subdevices for the remote processor if it was attached */
> > > > +	if (rproc->state != RPROC_DETACHED)
> > > > +		rproc_stop_subdevices(rproc, crashed);
> > > >    	/* the installed resource table is no longer accessible */
> > > >    	ret = rproc_reset_rsc_table_on_stop(rproc);
> > > > @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
> > > >    		return ret;
> > > >    	}
> > > > -	rproc_unprepare_subdevices(rproc);
> > > > +	if (rproc->state != RPROC_DETACHED)
> > > > +		rproc_unprepare_subdevices(rproc);
> > > >    	rproc->state = RPROC_OFFLINE;
> > > 
> > > In this case, rproc_crash_handler_work()->rproc_trigger_recovery()->
> > > rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() is called,
> > > "rproc->state = RPROC_CRASHED" is set in the rproc_crash_handler_work
> > > before rproc_boot_recovery is called, so checking RPROC_DETACHED can
> > > not work for this case.
> > > 
> > 
> > You're right, I forgot about that. I think we need a more generic
> > solution for this though. rproc_stop_subdevices() should not be called
> > without a prior call to rproc_start_subdevices().
> > 
> > I think there are a couple of options for this:
> > 
> >   - Add a bool "subdevs_started" to struct rproc and manage that
> >     separately from the rproc->state.
> > 
> >   - Track the rproc state before the crash separately (something like
> >     rproc->state_before_crash) and check that in the stop path.
> > 
> >   - Add a new state RPROC_CRASHED_DETACHED to make sure the states are
> >     unique.
> > 
> >   - ...
> > 
> 
> Sure, I think a bool like subdevs_started will be better for maintain?
> 
> > Does the same issue also exist in qcom_pas_stop() of "[PATCH v5 4/5]
> > remoteproc: qcom: pas: Add late attach support for subsystems" [1]?
> > There you check for pas->rproc->state != RPROC_ATTACHED, wouldn't this
> > also fail for the RPROC_CRASHED case?
> > 
> 
> I tested calling rproc_report_crash directly during qcom_pas_attach but
> did not see issue, handover_issued is set only if attach is success
> so "handover = qcom_q6v5_unprepare(&pas->q6v5);" will return false and
> "qcom_pas_handover(&pas->q6v5);" will not be called.
> 

Hm, as you mention, if you call rproc_report_crash() during
qcom_pas_attach() then handover_issued does not get set (so it's still
set to false). But qcom_q6v5_unprepare() returns !q6v5->handover_issued
(handover_issued negated), so !false -> true. So I think exactly the
opposite will happen and qcom_pas_handover(&pas->q6v5); will get called?
It should not be called in that case, because this will break the
reference counting for the regulator/clock resources.

In addition, even the disable_irq(q6v5->handover_irq); inside
qcom_q6v5_unprepare() is problematic. enable_irq()/disable_irq() are
also reference-counted, so disable_irq() should not be called without a
prior enable_irq() or you end up having the IRQ permanently disabled.
See e.g. commit 110be46f5afe2 ("remoteproc: qcom: q6v5: Avoid disabling
handover IRQ twice").

Thanks,
Stephan

  reply	other threads:[~2026-04-21 13:46 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09  8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-10 14:28   ` Stephan Gerhold
2026-04-14  3:41     ` Jingyi Wang
2026-04-14  8:13       ` Stephan Gerhold
2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
2026-04-10 14:15   ` Stephan Gerhold
2026-04-14  3:23     ` Jingyi Wang
2026-04-14  8:27       ` Stephan Gerhold
2026-04-16  2:52         ` Jingyi Wang
2026-04-21 13:45           ` Stephan Gerhold [this message]
2026-04-28  4:19             ` Jingyi Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aed_jyMEuWLlQRDv@linaro.org \
    --to=stephan.gerhold@linaro.org \
    --cc=aiqun.yu@oss.qualcomm.com \
    --cc=andersson@kernel.org \
    --cc=jingyi.wang@oss.qualcomm.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=tingwei.zhang@oss.qualcomm.com \
    --cc=trilok.soni@oss.qualcomm.com \
    --cc=yijie.yang@oss.qualcomm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.