public inbox for linux-arm-msm@vger.kernel.org
 help / color / mirror / Atom feed
From: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
To: Stephan Gerhold <stephan.gerhold@linaro.org>
Cc: Bjorn Andersson <andersson@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	aiqun.yu@oss.qualcomm.com, tingwei.zhang@oss.qualcomm.com,
	trilok.soni@oss.qualcomm.com, yijie.yang@oss.qualcomm.com,
	linux-remoteproc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-msm@vger.kernel.org
Subject: Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()
Date: Tue, 28 Apr 2026 12:19:32 +0800	[thread overview]
Message-ID: <f0d5d665-e960-4d32-899f-f80b186da649@oss.qualcomm.com> (raw)
In-Reply-To: <aed_jyMEuWLlQRDv@linaro.org>



On 4/21/2026 9:45 PM, Stephan Gerhold wrote:
> On Thu, Apr 16, 2026 at 10:52:17AM +0800, Jingyi Wang wrote:
>> On 4/14/2026 4:27 PM, Stephan Gerhold wrote:
>>> On Tue, Apr 14, 2026 at 11:23:50AM +0800, Jingyi Wang wrote:
>>>> On 4/10/2026 10:15 PM, Stephan Gerhold wrote:
>>>>> On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote:
>>>>>> For rproc that doing attach, glink_subdev_start() is called only when
>>>>>> attach successfully. If rproc_report_crash() is called in the attach
>>>>>> function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() could
>>>>>> be called and cause NULL pointer dereference:
>>>>>>
>>>>>>     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000300
>>>>>>     Mem abort info:
>>>>>>     ...
>>>>>>     pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
>>>>>>     lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
>>>>>>     ...
>>>>>>     Call trace:
>>>>>>      qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
>>>>>>      glink_subdev_stop+0x1c/0x30 [qcom_common]
>>>>>>      rproc_stop+0x58/0x17c
>>>>>>      rproc_trigger_recovery+0xb0/0x150
>>>>>>      rproc_crash_handler_work+0xa4/0xc4
>>>>>>      process_scheduled_works+0x18c/0x2d8
>>>>>>      worker_thread+0x144/0x280
>>>>>>      kthread+0x124/0x138
>>>>>>      ret_from_fork+0x10/0x20
>>>>>>     Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
>>>>>>     ---[ end trace 0000000000000000 ]---
>>>>>>
>>>>>> Add NULL pointer check in the glink_subdev_stop() to make sure
>>>>>> qcom_glink_smem_unregister() will not be called if glink_subdev_start()
>>>>>> is not called.
>>>>>>
>>>>>
>>>>> You mention the actual root problem here: Why is glink_subdev_stop()
>>>>> called if glink_subdev_start() wasn't called?
>>>>>
>>>>> The call to rproc_start_subdevices() in __rproc_attach() makes sure that
>>>>> all subdevices are in consistent state when exiting the function (either
>>>>> prepared+started or stopped+unprepared). Only if all subdevices were
>>>>> started successfully, the rproc->state is changed to RPROC_ATTACHED.
>>>>>
>>>>> In your case, attaching the rproc failed so the rproc->state should be
>>>>> still RPROC_DETACHED. All subdevices should be stopped+unprepared. We
>>>>> shouldn't stop/unprepare any subdevices again in this state, they all
>>>>> might crash like glink does here.
>>>>>
>>>>> We know that subdevices are already stopped+unprepared in RPROC_DETACHED
>>>>> state, so I think you just need to skip rproc_stop_subdevices() and
>>>>> rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff
>>>>> below.
>>>>>
>>>>> @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
>>>>>     	if (!rproc->ops->stop)
>>>>>     		return -EINVAL;
>>>>> -	/* Stop any subdevices for the remote processor */
>>>>> -	rproc_stop_subdevices(rproc, crashed);
>>>>> +	/* Stop any subdevices for the remote processor if it was attached */
>>>>> +	if (rproc->state != RPROC_DETACHED)
>>>>> +		rproc_stop_subdevices(rproc, crashed);
>>>>>     	/* the installed resource table is no longer accessible */
>>>>>     	ret = rproc_reset_rsc_table_on_stop(rproc);
>>>>> @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
>>>>>     		return ret;
>>>>>     	}
>>>>> -	rproc_unprepare_subdevices(rproc);
>>>>> +	if (rproc->state != RPROC_DETACHED)
>>>>> +		rproc_unprepare_subdevices(rproc);
>>>>>     	rproc->state = RPROC_OFFLINE;
>>>>
>>>> In this case, rproc_crash_handler_work()->rproc_trigger_recovery()->
>>>> rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() is called,
>>>> "rproc->state = RPROC_CRASHED" is set in the rproc_crash_handler_work
>>>> before rproc_boot_recovery is called, so checking RPROC_DETACHED can
>>>> not work for this case.
>>>>
>>>
>>> You're right, I forgot about that. I think we need a more generic
>>> solution for this though. rproc_stop_subdevices() should not be called
>>> without a prior call to rproc_start_subdevices().
>>>
>>> I think there are a couple of options for this:
>>>
>>>    - Add a bool "subdevs_started" to struct rproc and manage that
>>>      separately from the rproc->state.
>>>
>>>    - Track the rproc state before the crash separately (something like
>>>      rproc->state_before_crash) and check that in the stop path.
>>>
>>>    - Add a new state RPROC_CRASHED_DETACHED to make sure the states are
>>>      unique.
>>>
>>>    - ...
>>>
>>
>> Sure, I think a bool like subdevs_started will be better for maintain?
>>
>>> Does the same issue also exist in qcom_pas_stop() of "[PATCH v5 4/5]
>>> remoteproc: qcom: pas: Add late attach support for subsystems" [1]?
>>> There you check for pas->rproc->state != RPROC_ATTACHED, wouldn't this
>>> also fail for the RPROC_CRASHED case?
>>>
>>
>> I tested calling rproc_report_crash directly during qcom_pas_attach but
>> did not see issue, handover_issued is set only if attach is success
>> so "handover = qcom_q6v5_unprepare(&pas->q6v5);" will return false and
>> "qcom_pas_handover(&pas->q6v5);" will not be called.
>>
> 
> Hm, as you mention, if you call rproc_report_crash() during
> qcom_pas_attach() then handover_issued does not get set (so it's still
> set to false). But qcom_q6v5_unprepare() returns !q6v5->handover_issued
> (handover_issued negated), so !false -> true. So I think exactly the
> opposite will happen and qcom_pas_handover(&pas->q6v5); will get called?
> It should not be called in that case, because this will break the
> reference counting for the regulator/clock resources.
> 
> In addition, even the disable_irq(q6v5->handover_irq); inside
> qcom_q6v5_unprepare() is problematic. enable_irq()/disable_irq() are
> also reference-counted, so disable_irq() should not be called without a
> prior enable_irq() or you end up having the IRQ permanently disabled.
> See e.g. commit 110be46f5afe2 ("remoteproc: qcom: q6v5: Avoid disabling
> handover IRQ twice").
> 
> Thanks,
> Stephan

ACK, will handle this more carefully.

Thanks,
Jingyi


      reply	other threads:[~2026-04-28  4:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09  8:46 [PATCH 0/2] remoteproc: improve robustness for rproc_attach fail cases Jingyi Wang
2026-04-09  8:46 ` [PATCH 1/2] remoteproc: core: Attach rproc asynchronously in rproc_add() path Jingyi Wang
2026-04-10 14:28   ` Stephan Gerhold
2026-04-14  3:41     ` Jingyi Wang
2026-04-14  8:13       ` Stephan Gerhold
2026-04-09  8:46 ` [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop() Jingyi Wang
2026-04-10 14:15   ` Stephan Gerhold
2026-04-14  3:23     ` Jingyi Wang
2026-04-14  8:27       ` Stephan Gerhold
2026-04-16  2:52         ` Jingyi Wang
2026-04-21 13:45           ` Stephan Gerhold
2026-04-28  4:19             ` Jingyi Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f0d5d665-e960-4d32-899f-f80b186da649@oss.qualcomm.com \
    --to=jingyi.wang@oss.qualcomm.com \
    --cc=aiqun.yu@oss.qualcomm.com \
    --cc=andersson@kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=stephan.gerhold@linaro.org \
    --cc=tingwei.zhang@oss.qualcomm.com \
    --cc=trilok.soni@oss.qualcomm.com \
    --cc=yijie.yang@oss.qualcomm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox