From: "Aiqun(Maria) Yu" <aiqun.yu@oss.qualcomm.com>
To: Stephan Gerhold <stephan.gerhold@linaro.org>,
Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Cc: Bjorn Andersson <andersson@kernel.org>,
Mathieu Poirier <mathieu.poirier@linaro.org>,
Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>,
Manivannan Sadhasivam <mani@kernel.org>,
tingwei.zhang@oss.qualcomm.com, trilok.soni@oss.qualcomm.com,
yijie.yang@oss.qualcomm.com, linux-arm-msm@vger.kernel.org,
linux-remoteproc@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org,
Gokul krishna Krishnakumar <Gokul.krishnakumar@oss.qualcomm>
Subject: Re: [PATCH v2 4/7] remoteproc: qcom: pas: Add late attach support for subsystems
Date: Fri, 31 Oct 2025 17:03:56 +0800
Message-ID: <c15f083d-a2c1-462a-aad4-a72b36fbe1ac@oss.qualcomm.com>
In-Reply-To: <aQHmanEiWmEac7aV@linaro.org>
On 10/29/2025 6:03 PM, Stephan Gerhold wrote:
> On Wed, Oct 29, 2025 at 01:05:42AM -0700, Jingyi Wang wrote:
>> From: Gokul krishna Krishnakumar <Gokul.krishnakumar@oss.qualcomm>
>>
>> Subsystems can be brought out of reset by entities such as
>> bootloaders. Before attaching such subsystems, it is important to
>> check the state of the subsystem. This patch adds support to attach
>> to a subsystem by ensuring that the subsystem is in a sane state by
>> reading SMP2P bits and pinging the subsystem.
>>
>> Signed-off-by: Gokul krishna Krishnakumar <Gokul.krishnakumar@oss.qualcomm>
>> Co-developed-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
>> Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
>> ---
>> drivers/remoteproc/qcom_q6v5.c | 89 ++++++++++++++++++++++++++++++++++++-
>> drivers/remoteproc/qcom_q6v5.h | 14 +++++-
>> drivers/remoteproc/qcom_q6v5_adsp.c | 2 +-
>> drivers/remoteproc/qcom_q6v5_mss.c | 2 +-
>> drivers/remoteproc/qcom_q6v5_pas.c | 63 +++++++++++++++++++++++++-
>> 5 files changed, 165 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
>> index 58d5b85e58cd..4ce9e43fc5c7 100644
>> --- a/drivers/remoteproc/qcom_q6v5.c
>> +++ b/drivers/remoteproc/qcom_q6v5.c
>> [...]
>> @@ -234,6 +246,77 @@ unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5)
>> }
>> EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>>
>> +static irqreturn_t q6v5_pong_interrupt(int irq, void *data)
>> +{
>> + struct qcom_q6v5 *q6v5 = data;
>> +
>> + complete(&q6v5->ping_done);
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5)
>> +{
>> + int ret;
>> + int ping_failed = 0;
>> +
>> + reinit_completion(&q6v5->ping_done);
>> +
>> + /* Set master kernel Ping bit */
>> + ret = qcom_smem_state_update_bits(q6v5->ping_state,
>> + BIT(q6v5->ping_bit), BIT(q6v5->ping_bit));
>> + if (ret) {
>> + dev_err(q6v5->dev, "Failed to update ping bits\n");
>> + return ret;
>> + }
>> +
>> + ret = wait_for_completion_timeout(&q6v5->ping_done, msecs_to_jiffies(PING_TIMEOUT));
>> + if (!ret) {
>> + ping_failed = -ETIMEDOUT;
>> + dev_err(q6v5->dev, "Failed to get back pong\n");
>> + }
>> +
>> + /* Clear ping bit master kernel */
>> + ret = qcom_smem_state_update_bits(q6v5->ping_state, BIT(q6v5->ping_bit), 0);
>> + if (ret) {
>> + pr_err("Failed to clear master kernel bits\n");
>
> dev_err()?
>
>> + return ret;
>> + }
>> +
>> + if (ping_failed)
>> + return ping_failed;
>
> Could just "return ping_failed;" directly.
>
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem);
>> +
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev)
>> +{
>> + int ret = -ENODEV;
>> +
>> + q6v5->ping_state = devm_qcom_smem_state_get(&pdev->dev, "ping", &q6v5->ping_bit);
>> + if (IS_ERR(q6v5->ping_state)) {
>> + dev_err(&pdev->dev, "failed to acquire smem state %ld\n",
>> + PTR_ERR(q6v5->ping_state));
>> + return ret;
>
> return PTR_ERR(q6v5->ping_state)?
>
>> + }
>> +
>> + q6v5->pong_irq = platform_get_irq_byname(pdev, "pong");
>> + if (q6v5->pong_irq < 0)
>> + return q6v5->pong_irq;
>> +
>> + ret = devm_request_threaded_irq(&pdev->dev, q6v5->pong_irq, NULL,
>> + q6v5_pong_interrupt, IRQF_TRIGGER_RISING | IRQF_ONESHOT,
>> + "q6v5 pong", q6v5);
>> + if (ret)
>> + dev_err(&pdev->dev, "failed to acquire pong IRQ\n");
>> +
>> + init_completion(&q6v5->ping_done);
>
> It would be better to have init_completion() before requesting the
> interrupt, to guarantee that complete(&q6v5->ping_done); cannot happen
> before the completion struct is initialized.
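The ordering suggested above can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: the stub_* types
and helpers are hypothetical stand-ins for struct completion,
init_completion(), complete() and the IRQ request call, and the stub
fires the handler immediately to model the worst case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct completion. */
struct stub_completion {
	bool initialized;
	unsigned int done;
};

struct stub_dev {
	struct stub_completion ping_done;
	bool handler_ok;	/* false if the handler saw an uninitialized completion */
};

static void stub_init_completion(struct stub_completion *c)
{
	c->initialized = true;
	c->done = 0;
}

/* Models complete(): reports misuse instead of corrupting state. */
static bool stub_complete(struct stub_completion *c)
{
	if (!c->initialized)
		return false;
	c->done++;
	return true;
}

/* Models an IRQ request where the handler may run right away. */
static int stub_request_irq(struct stub_dev *dev, bool fires_immediately)
{
	if (fires_immediately)
		dev->handler_ok = stub_complete(&dev->ping_done);
	return 0;
}

/* Initialize the completion first, then request the interrupt. */
static int stub_ping_init(struct stub_dev *dev, bool fires_immediately)
{
	stub_init_completion(&dev->ping_done);
	return stub_request_irq(dev, fires_immediately);
}
```

With this order the handler can never observe an uninitialized
completion, even if the interrupt fires as soon as it is requested.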
>
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_q6v5_ping_subsystem_init);
>> +
>> /**
>> * qcom_q6v5_init() - initializer of the q6v5 common struct
>> * @q6v5: handle to be initialized
>> @@ -247,7 +330,7 @@ EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>> */
>> int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> struct rproc *rproc, int crash_reason, const char *load_state,
>> - void (*handover)(struct qcom_q6v5 *q6v5))
>> + bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5))
>> {
>> int ret;
>>
>> @@ -255,10 +338,14 @@ int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> q6v5->dev = &pdev->dev;
>> q6v5->crash_reason = crash_reason;
>> q6v5->handover = handover;
>> + q6v5->early_boot = early_boot;
>>
>> init_completion(&q6v5->start_done);
>> init_completion(&q6v5->stop_done);
>>
>> + if (early_boot)
>> + init_completion(&q6v5->subsys_booted);
>> +
>> q6v5->wdog_irq = platform_get_irq_byname(pdev, "wdog");
>> if (q6v5->wdog_irq < 0)
>> return q6v5->wdog_irq;
>> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
>> index 5a859c41896e..8a227bf70d7e 100644
>> --- a/drivers/remoteproc/qcom_q6v5.h
>> +++ b/drivers/remoteproc/qcom_q6v5.h
>> @@ -12,27 +12,35 @@ struct rproc;
>> struct qcom_smem_state;
>> struct qcom_sysmon;
>>
>> +#define PING_TIMEOUT 500 /* in milliseconds */
>> +#define PING_TEST_WAIT 500 /* in milliseconds */
>
> Why is this defined in the shared header rather than the C file that
> uses this?
>
> PING_TEST_WAIT looks unused.
>
>> +
>> struct qcom_q6v5 {
>> struct device *dev;
>> struct rproc *rproc;
>>
>> struct qcom_smem_state *state;
>> + struct qcom_smem_state *ping_state;
>> struct qmp *qmp;
>>
>> struct icc_path *path;
>>
>> unsigned stop_bit;
>> + unsigned int ping_bit;
>>
>> int wdog_irq;
>> int fatal_irq;
>> int ready_irq;
>> int handover_irq;
>> int stop_irq;
>> + int pong_irq;
>>
>> bool handover_issued;
>>
>> struct completion start_done;
>> struct completion stop_done;
>> + struct completion subsys_booted;
>> + struct completion ping_done;
>>
>> int crash_reason;
>>
>> @@ -40,11 +48,13 @@ struct qcom_q6v5 {
>>
>> const char *load_state;
>> void (*handover)(struct qcom_q6v5 *q6v5);
>> +
>> + bool early_boot;
>> };
>>
>> int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> struct rproc *rproc, int crash_reason, const char *load_state,
>> - void (*handover)(struct qcom_q6v5 *q6v5));
>> + bool early_boot, void (*handover)(struct qcom_q6v5 *q6v5));
>> void qcom_q6v5_deinit(struct qcom_q6v5 *q6v5);
>>
>> int qcom_q6v5_prepare(struct qcom_q6v5 *q6v5);
>> @@ -52,5 +62,7 @@ int qcom_q6v5_unprepare(struct qcom_q6v5 *q6v5);
>> int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5, struct qcom_sysmon *sysmon);
>> int qcom_q6v5_wait_for_start(struct qcom_q6v5 *q6v5, int timeout);
>> unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem(struct qcom_q6v5 *q6v5);
>> +int qcom_q6v5_ping_subsystem_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev);
>>
>> #endif
>> [...]
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 158bcd6cc85c..b667c11aadb5 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> @@ -35,6 +35,8 @@
>>
>> #define MAX_ASSIGN_COUNT 3
>>
>> +#define EARLY_BOOT_RETRY_INTERVAL_MS 5000
>> +
>> struct qcom_pas_data {
>> int crash_reason_smem;
>> const char *firmware_name;
>> @@ -59,6 +61,7 @@ struct qcom_pas_data {
>> int region_assign_count;
>> bool region_assign_shared;
>> int region_assign_vmid;
>> + bool early_boot;
>> };
>>
>> struct qcom_pas {
>> @@ -409,6 +412,8 @@ static int qcom_pas_stop(struct rproc *rproc)
>> if (pas->smem_host_id)
>> ret = qcom_smem_bust_hwspin_lock_by_host(pas->smem_host_id);
>>
>> + pas->q6v5.early_boot = false;
>> +
>> return ret;
>> }
>>
>> @@ -434,6 +439,51 @@ static unsigned long qcom_pas_panic(struct rproc *rproc)
>> return qcom_q6v5_panic(&pas->q6v5);
>> }
>>
>> +static int qcom_pas_attach(struct rproc *rproc)
>> +{
>> + int ret;
>> + struct qcom_pas *adsp = rproc->priv;
>> + bool ready_state;
>> + bool crash_state;
>> +
>> + if (!adsp->q6v5.early_boot)
>> + return -EINVAL;
>> +
>> + ret = irq_get_irqchip_state(adsp->q6v5.fatal_irq,
>> + IRQCHIP_STATE_LINE_LEVEL, &crash_state);
>> +
>> + if (crash_state) {
>
> crash_state will be uninitialized if irq_get_irqchip_state() returns an
> error.
Good catch.
I suggest checking the return value. If fatal_irq is not available, just
fail the attach; there is no need to attempt crash reporting in that case.
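A rough sketch of that check, written as userspace C so it can be
compiled on its own. stub_irq_get_state() is a hypothetical stand-in for
irq_get_irqchip_state(), and check_fatal_line() is an illustrative name,
not a function from the patch:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stub: fails for a negative IRQ number and leaves *state
 * untouched in that case, which is the situation the review points out.
 */
static int stub_irq_get_state(int irq, bool *state)
{
	if (irq < 0)
		return -EINVAL;
	*state = (irq == 1);	/* pretend only IRQ 1 is asserted */
	return 0;
}

/*
 * Returns 0 and stores the line level in *crashed on success.
 * On a query failure the error is returned and *crashed is not written,
 * so the caller fails the attach without looking at the crash state.
 */
static int check_fatal_line(int fatal_irq, bool *crashed)
{
	bool state = false;	/* defined value even if the query fails */
	int ret;

	ret = stub_irq_get_state(fatal_irq, &state);
	if (ret)
		return ret;

	*crashed = state;
	return 0;
}
```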
>
>> + dev_err(adsp->dev, "Sub system has crashed before driver probe\n");
>> + adsp->rproc->state = RPROC_CRASHED;
>> + return -EINVAL;
>
> Ok, so the subsystem has crashed. Now what? We probably want to restart
> it, but I don't think anyone will handle the RPROC_CRASHED state you are
> setting here.
Agreed. Setting RPROC_CRASHED should be left to the crash handler.
>
> I think it would make more sense to call rproc_report_crash() here. This
> will set RPROC_CRASHED for you and trigger recovery. I'm not sure if
> this works properly in RPROC_DETACHED state, please test to make sure
> that works as intended.
Agreed.
I suggest something like:
    q6v5->running = false;
    rproc_report_crash(q6v5->rproc, RPROC_FATAL_ERROR);
This can be tested by temporarily hacking the code so that it always
takes the crash_state path here, and then checking that crash recovery
works as expected.
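To make the proposed test concrete, here is a small userspace sketch.
The stub_* names and the force_crash parameter are hypothetical and exist
only for illustration; stub_report_crash() merely counts calls in place
of rproc_report_crash():

```c
#include <errno.h>
#include <stdbool.h>

enum stub_crash_type { STUB_FATAL_ERROR };

struct stub_rproc {
	int crash_reports;	/* how often a crash was reported */
};

struct stub_q6v5 {
	struct stub_rproc *rproc;
	bool running;
};

/* Stand-in for rproc_report_crash(): only records that it was called. */
static void stub_report_crash(struct stub_rproc *rproc,
			      enum stub_crash_type type)
{
	(void)type;
	rproc->crash_reports++;
}

/*
 * crash_state is the level read from the fatal line. force_crash is the
 * temporary test hack: when set, the crash path is taken regardless.
 */
static int attach_check_crash(struct stub_q6v5 *q6v5, bool crash_state,
			      bool force_crash)
{
	if (crash_state || force_crash) {
		q6v5->running = false;
		stub_report_crash(q6v5->rproc, STUB_FATAL_ERROR);
		return -EINVAL;
	}

	return 0;
}
```

Whether recovery really works from the RPROC_DETACHED state still has to
be verified on hardware; the sketch only shows the control flow.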
>
>> + }
>> +
>> + ret = irq_get_irqchip_state(adsp->q6v5.ready_irq,
>> + IRQCHIP_STATE_LINE_LEVEL, &ready_state);
>> +
>> + if (ready_state) {
>> + dev_info(adsp->dev, "Sub system has boot-up before driver probe\n");
>
> This message feels redundant, dmesg already shows a different message
> for "attaching" vs "booting" a remoteproc.
>
>> + adsp->rproc->state = RPROC_DETACHED;
>
> What is the point of this assignment? You have already set this state
> inside qcom_pas_probe().
Makes sense.
>
>> + } else {
>> + ret = wait_for_completion_timeout(&adsp->q6v5.subsys_booted,
>> + msecs_to_jiffies(EARLY_BOOT_RETRY_INTERVAL_MS));
>> + if (!ret) {
>> + dev_err(adsp->dev, "Timeout on waiting for subsystem interrupt\n");
>> + return -ETIMEDOUT;
>> + }
>
> This looks like you want to handle the case where the remoteproc is
> still booting while this code is running (i.e. it has not finished
> booting yet by signaling the ready state). Is this situation actually
> possible with the current firmware design?
As far as I understand, this shouldn't happen during the initial boot stage.
This remoteproc is needed by the bootloader/firmware before the kernel
even starts, so it shouldn't still be booting at that point. If it were,
the early_boot feature wouldn't be necessary at all.
However, on a second attempt to attach, especially when
RPROC_FEAT_ATTACH_ON_RECOVERY is enabled, this could occur as a corner
case during boot-up: the remoteproc is auto-booting while the kernel is
trying to attach and checks the ready state.
>
> I don't see how this would reliably work in practice. If firmware boots
> a remoteproc early it should wait until:
>
> - Handover is signaled, to ensure the proxy votes are kept
> - Ready is signaled, to ensure the metadata region remains reserved
>
> None of this is guaranteed if the firmware gives up control to Linux
> before waiting for the signals.
>
> I would suggest dropping all the code related to handling the late
> "subsys_booted" completion. If this is needed, can you explain when
> exactly this situation happens and how you guarantee reliable startup of
> the remoteproc?
The Kaanapali-specific remoteproc (SoCCP) that uses the early-boot
feature here relies on rproc_shutdown()/rproc_boot() for recovery. A
not-ready state should only occur in a rare corner case, such as a
bootloader/firmware bug. Maybe we can simply remove the "subsys_booted"
mechanism and only call rproc_report_crash() in this corner case.
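Roughly, the ready check without the "subsys_booted" wait could look
like the userspace sketch below. The names are hypothetical,
stub_report_crash() stands in for rproc_report_crash(), and the error
code for the not-ready case is an assumption:

```c
#include <errno.h>
#include <stdbool.h>

struct stub_rproc {
	int crash_reports;	/* how often a crash was reported */
};

/* Stand-in for rproc_report_crash(): only records that it was called. */
static void stub_report_crash(struct stub_rproc *rproc)
{
	rproc->crash_reports++;
}

/*
 * query_ret is the result of reading the ready line; ready_state is only
 * meaningful when query_ret is 0. There is no waiting: a subsystem that
 * is not ready at attach time is treated as crashed.
 */
static int attach_check_ready(struct stub_rproc *rproc, int query_ret,
			      bool ready_state)
{
	if (query_ret)
		return query_ret;

	if (!ready_state) {
		stub_report_crash(rproc);
		return -EINVAL;	/* assumed error code */
	}

	return 0;
}
```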
>
>> + }
>> +
>> + ret = qcom_q6v5_ping_subsystem(&adsp->q6v5);
>> + if (ret) {
>> + dev_err(adsp->dev, "Failed to ping subsystem, assuming device crashed\n");
>> + rproc->state = RPROC_CRASHED;
>> + return ret;
>> + }
>> +
>> + adsp->q6v5.running = true;
>
> You should probably also set q6v5->handover_issued = true;, otherwise
> qcom_pas_stop() will later drop all the handover votes that you have
> never made. This will break all the reference counting.
Acked; all of the comments above are well understood.
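For reference, the reference-counting problem can be modelled in
userspace as below. This is a simplified sketch, not the driver code:
proxy_votes is a hypothetical counter standing in for the proxy
resources, and stub_stop() models the behaviour Stephan describes, where
the stop path drops the votes if handover was never issued.

```c
#include <stdbool.h>

struct stub_q6v5 {
	bool running;
	bool handover_issued;
	int proxy_votes;	/* hypothetical counter for proxy votes */
};

/*
 * Attach path: Linux never took proxy votes because the firmware booted
 * the subsystem. mark_handover selects whether attach records that the
 * handover has already happened.
 */
static void stub_attach(struct stub_q6v5 *q6v5, bool mark_handover)
{
	q6v5->running = true;
	if (mark_handover)
		q6v5->handover_issued = true;
}

/* Stop path: drops the proxy votes if handover was never issued. */
static void stub_stop(struct stub_q6v5 *q6v5)
{
	if (!q6v5->handover_issued)
		q6v5->proxy_votes--;
	q6v5->running = false;
}
```

Without marking the handover, attach followed by stop leaves the counter
at -1; with it, the counter stays at 0.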
>
> Overall, this patch feels quite fragile in the current state. Please
> make sure you carefully consider all side effects and new edge cases
> introduced by your changes.
As for other edge cases and side effects, maybe Stephan can help by
providing more details.
>
> Thanks,
> Stephan
--
Thx and BRs,
Aiqun(Maria) Yu