public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms
@ 2024-07-08 16:22 Stephan Gerhold
  2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stephan Gerhold @ 2024-07-08 16:22 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio
  Cc: linux-arm-msm, linux-kernel, Dmitry Baryshkov, Abel Vesa,
	Johan Hovold

Add X1E80100 to the in-kernel pd-mapper to avoid having to run the
userspace daemon for charging and audio functionality. Also add entries for
some missing older platforms without protection domains.

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
---
Stephan Gerhold (2):
      soc: qcom: pd_mapper: Add X1E80100
      soc: qcom: pd_mapper: Add more older platforms without domains

 drivers/soc/qcom/qcom_pd_mapper.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
---
base-commit: e01314c6936b0ab08d4c661e2f68852bfe1003f8
change-id: 20240708-x1e80100-pd-mapper-2a31288eb8ca

Best regards,
-- 
Stephan Gerhold <stephan.gerhold@linaro.org>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-08 16:22 [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Stephan Gerhold
@ 2024-07-08 16:22 ` Stephan Gerhold
  2024-07-08 20:22   ` Dmitry Baryshkov
  2024-07-29 14:57   ` Johan Hovold
  2024-07-08 16:22 ` [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains Stephan Gerhold
  2024-08-15 20:40 ` [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Bjorn Andersson
  2 siblings, 2 replies; 15+ messages in thread
From: Stephan Gerhold @ 2024-07-08 16:22 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio
  Cc: linux-arm-msm, linux-kernel, Dmitry Baryshkov, Abel Vesa,
	Johan Hovold

X1E80100 has the same protection domains as SM8550, except that MPSS is
missing. Add it to the in-kernel pd-mapper to avoid having to run the
daemon in userspace for charging and audio functionality.

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
---
 drivers/soc/qcom/qcom_pd_mapper.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/soc/qcom/qcom_pd_mapper.c b/drivers/soc/qcom/qcom_pd_mapper.c
index a4c007080665..ff6aa3e521bd 100644
--- a/drivers/soc/qcom/qcom_pd_mapper.c
+++ b/drivers/soc/qcom/qcom_pd_mapper.c
@@ -517,6 +517,15 @@ static const struct qcom_pdm_domain_data *sm8550_domains[] = {
 	NULL,
 };
 
+static const struct qcom_pdm_domain_data *x1e80100_domains[] = {
+	&adsp_audio_pd,
+	&adsp_root_pd,
+	&adsp_charger_pd,
+	&adsp_sensor_pd,
+	&cdsp_root_pd,
+	NULL,
+};
+
 static const struct of_device_id qcom_pdm_domains[] = {
 	{ .compatible = "qcom,apq8064", .data = NULL, },
 	{ .compatible = "qcom,apq8074", .data = NULL, },
@@ -545,6 +554,7 @@ static const struct of_device_id qcom_pdm_domains[] = {
 	{ .compatible = "qcom,sm8450", .data = sm8350_domains, },
 	{ .compatible = "qcom,sm8550", .data = sm8550_domains, },
 	{ .compatible = "qcom,sm8650", .data = sm8550_domains, },
+	{ .compatible = "qcom,x1e80100", .data = x1e80100_domains, },
 	{},
 };
 

-- 
2.44.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains
  2024-07-08 16:22 [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Stephan Gerhold
  2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
@ 2024-07-08 16:22 ` Stephan Gerhold
  2024-07-08 20:22   ` Dmitry Baryshkov
  2024-08-15 20:40 ` [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Bjorn Andersson
  2 siblings, 1 reply; 15+ messages in thread
From: Stephan Gerhold @ 2024-07-08 16:22 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio
  Cc: linux-arm-msm, linux-kernel, Dmitry Baryshkov, Abel Vesa,
	Johan Hovold

MSM8909, MSM8916 and MSM8939 all do not make use of pd-mapper, add them to
the list similar to the other older platforms to avoid the following
message in dmesg when booting:

 "PDM: no support for the platform, userspace daemon might be required."

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
---
 drivers/soc/qcom/qcom_pd_mapper.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/soc/qcom/qcom_pd_mapper.c b/drivers/soc/qcom/qcom_pd_mapper.c
index ff6aa3e521bd..0b9a149b2953 100644
--- a/drivers/soc/qcom/qcom_pd_mapper.c
+++ b/drivers/soc/qcom/qcom_pd_mapper.c
@@ -527,11 +527,15 @@ static const struct qcom_pdm_domain_data *x1e80100_domains[] = {
 };
 
 static const struct of_device_id qcom_pdm_domains[] = {
+	{ .compatible = "qcom,apq8016", .data = NULL, },
 	{ .compatible = "qcom,apq8064", .data = NULL, },
 	{ .compatible = "qcom,apq8074", .data = NULL, },
 	{ .compatible = "qcom,apq8084", .data = NULL, },
 	{ .compatible = "qcom,apq8096", .data = msm8996_domains, },
 	{ .compatible = "qcom,msm8226", .data = NULL, },
+	{ .compatible = "qcom,msm8909", .data = NULL, },
+	{ .compatible = "qcom,msm8916", .data = NULL, },
+	{ .compatible = "qcom,msm8939", .data = NULL, },
 	{ .compatible = "qcom,msm8974", .data = NULL, },
 	{ .compatible = "qcom,msm8996", .data = msm8996_domains, },
 	{ .compatible = "qcom,msm8998", .data = msm8998_domains, },

-- 
2.44.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains
  2024-07-08 16:22 ` [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains Stephan Gerhold
@ 2024-07-08 20:22   ` Dmitry Baryshkov
  0 siblings, 0 replies; 15+ messages in thread
From: Dmitry Baryshkov @ 2024-07-08 20:22 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Bjorn Andersson, Konrad Dybcio, linux-arm-msm, linux-kernel,
	Abel Vesa, Johan Hovold

On Mon, Jul 08, 2024 at 06:22:10PM GMT, Stephan Gerhold wrote:
> MSM8909, MSM8916 and MSM8939 all do not make use of pd-mapper, add them to
> the list similar to the other older platforms to avoid the following
> message in dmesg when booting:
> 
>  "PDM: no support for the platform, userspace daemon might be required."
> 
> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> ---
>  drivers/soc/qcom/qcom_pd_mapper.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
@ 2024-07-08 20:22   ` Dmitry Baryshkov
  2024-07-29 14:57   ` Johan Hovold
  1 sibling, 0 replies; 15+ messages in thread
From: Dmitry Baryshkov @ 2024-07-08 20:22 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Bjorn Andersson, Konrad Dybcio, linux-arm-msm, linux-kernel,
	Abel Vesa, Johan Hovold

On Mon, Jul 08, 2024 at 06:22:09PM GMT, Stephan Gerhold wrote:
> X1E80100 has the same protection domains as SM8550, except that MPSS is
> missing. Add it to the in-kernel pd-mapper to avoid having to run the
> daemon in userspace for charging and audio functionality.
> 
> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> ---
>  drivers/soc/qcom/qcom_pd_mapper.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>


-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
  2024-07-08 20:22   ` Dmitry Baryshkov
@ 2024-07-29 14:57   ` Johan Hovold
  2024-07-30  9:28     ` Johan Hovold
  1 sibling, 1 reply; 15+ messages in thread
From: Johan Hovold @ 2024-07-29 14:57 UTC (permalink / raw)
  To: Stephan Gerhold, Bjorn Andersson, Dmitry Baryshkov
  Cc: Konrad Dybcio, linux-arm-msm, linux-kernel, Abel Vesa

On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
> X1E80100 has the same protection domains as SM8550, except that MPSS is
> missing. Add it to the in-kernel pd-mapper to avoid having to run the
> daemon in userspace for charging and audio functionality.

I'm seeing a bunch of new errors when running with this patch applied on
top of 6.11-rc1. I'm assuming it is due to changes in timing that are
either exposing existing bugs or there is a general problem with the
in-kernel pd-mapper implementation.

In any case, this does does not seem to be specific to x1e80100 even if
I'm not seeing as many issues on sc8280xp (there is one new error there
too however).

It doesn't happen on every boot, but with the in-kernel pd-mapper I
often (every 3-4 boots) see the following errors on the x1e80100 CRD
during boot:

	[    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
        [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
        [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125

	[    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications

I've also seen the following, which may also be related:

	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
        [   14.571943] PDR: service lookup for avs/audio failed: -110

I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
reverted again.

I once saw

	power_supply qcom-battmgr-bat: driver failed to report `status' property: -110

on the Lenovo ThinkPad X13s which may or may not be related.

Johan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-29 14:57   ` Johan Hovold
@ 2024-07-30  9:28     ` Johan Hovold
  2024-07-30 11:06       ` Dmitry Baryshkov
  0 siblings, 1 reply; 15+ messages in thread
From: Johan Hovold @ 2024-07-30  9:28 UTC (permalink / raw)
  To: Stephan Gerhold, Bjorn Andersson, Dmitry Baryshkov
  Cc: Konrad Dybcio, linux-arm-msm, linux-kernel, Abel Vesa

On Mon, Jul 29, 2024 at 04:57:54PM +0200, Johan Hovold wrote:
> On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
> > X1E80100 has the same protection domains as SM8550, except that MPSS is
> > missing. Add it to the in-kernel pd-mapper to avoid having to run the
> > daemon in userspace for charging and audio functionality.
> 
> I'm seeing a bunch of new errors when running with this patch applied on
> top of 6.11-rc1. I'm assuming it is due to changes in timing that are
> either exposing existing bugs or there is a general problem with the
> in-kernel pd-mapper implementation.
> 
> In any case, this does does not seem to be specific to x1e80100 even if
> I'm not seeing as many issues on sc8280xp (there is one new error there
> too however).
> 
> It doesn't happen on every boot, but with the in-kernel pd-mapper I
> often (every 3-4 boots) see the following errors on the x1e80100 CRD
> during boot:
> 
> 	[    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>         [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>         [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> 
> 	[    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> 
> I've also seen the following, which may also be related:
> 
> 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
>         [   14.571943] PDR: service lookup for avs/audio failed: -110
> 
> I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
> reverted again.

Here's another bug, a NULL deref in the battery driver, that is
apparently exposed by the in-kernel pd-mapper. This is also on the
x1e80100 CRD with a couple of added printks to indicate when the
pd-mapper probes and when the pmic glink services are up:

[    8.933775] remoteproc remoteproc1: powering up 32300000.remoteproc
[    8.934623] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fd5000.phy
[    8.945244] remoteproc remoteproc1: Booting fw image qcom/x1e80100/cdsp.mbn, size 3027368
[    8.965537] remoteproc remoteproc0: powering up 30000000.remoteproc
[    8.971075] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fda000.phy
[    8.974299] remoteproc remoteproc0: Booting fw image qcom/x1e80100/adsp.mbn, size 21424472
[    8.999726] msm-mdss ae00000.display-subsystem: Adding to iommu group 4
[    9.007697] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fdf000.phy
[    9.101196] remoteproc remoteproc1: remote processor 32300000.remoteproc is now up
[    9.103860] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.1: qcom_pdm_probe
[    9.105989] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.0: qcom_pdm_probe

 - pd-mapper probing

[    9.112983] qcom-snps-eusb2-hsphy fd3000.phy: Registered Qcom-eUSB2 phy
[    9.296879] remoteproc remoteproc0: remote processor 30000000.remoteproc is now up

 - adsp is up

[    9.300086] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 7fffffff

 - SERVREG_SERVICE_STATE_UNINIT

[    9.301878] qcom-snps-eusb2-hsphy fd9000.phy: Registered Qcom-eUSB2 phy
[    9.306985] qcom,fastrpc 30000000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FAST
RPC
[    9.309924] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@3: Adding to iommu group 5
[    9.311367] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@4: Adding to iommu group 6
[    9.318330] PDR: Indication received from msm/adsp/charger_pd, state: 0x1fffffff, trans-id: 1

 - This looks suspicious

[    9.323924] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
[    9.325275] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@5: Adding to iommu group 7
[    9.326008] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@6: Adding to iommu group 8
[    9.326733] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@7: Adding to iommu group 9
[    9.336582] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 1fffffff

 - SERVREG_SERVICE_STATE_UP

[    9.345544] dwc3 a000000.usb: Adding to iommu group 10
[    9.361410] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
[    9.362803] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
[    9.362882] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125

 - -ECANCELED errors I reported earlier

[    9.364298] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[    9.364339] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    9.364395] CPU: 6 UID: 0 PID: 111 Comm: kworker/6:4 Not tainted 6.11.0-rc1 #70
[    9.364397] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
[    9.364398] Workqueue: events qcom_battmgr_enable_worker [qcom_battmgr]
[    9.364401] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    9.364403] pc : pmic_glink_send+0xc/0x24 [pmic_glink]
[    9.364405] lr : qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
...
[    9.364427] Call trace:
[    9.364428]  pmic_glink_send+0xc/0x24 [pmic_glink]
[    9.364429]  qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
[    9.364430]  process_one_work+0x210/0x614
[    9.364435]  worker_thread+0x244/0x388
[    9.364436]  kthread+0x124/0x128
[    9.364437]  ret_from_fork+0x10/0x20
[    9.364439] Code: 17fffff7 d503233f a9bf7bfd 910003fd (f9400800)
[    9.364441] ---[ end trace 0000000000000000 ]---

[    9.365205] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125

Johan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-30  9:28     ` Johan Hovold
@ 2024-07-30 11:06       ` Dmitry Baryshkov
  2024-07-30 23:50         ` Chris Lew
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Baryshkov @ 2024-07-30 11:06 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Stephan Gerhold, Bjorn Andersson, Konrad Dybcio, linux-arm-msm,
	linux-kernel, Abel Vesa

On Tue, Jul 30, 2024 at 11:28:14AM GMT, Johan Hovold wrote:
> On Mon, Jul 29, 2024 at 04:57:54PM +0200, Johan Hovold wrote:
> > On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
> > > X1E80100 has the same protection domains as SM8550, except that MPSS is
> > > missing. Add it to the in-kernel pd-mapper to avoid having to run the
> > > daemon in userspace for charging and audio functionality.
> > 
> > I'm seeing a bunch of new errors when running with this patch applied on
> > top of 6.11-rc1. I'm assuming it is due to changes in timing that are
> > either exposing existing bugs or there is a general problem with the
> > in-kernel pd-mapper implementation.
> > 
> > In any case, this does does not seem to be specific to x1e80100 even if
> > I'm not seeing as many issues on sc8280xp (there is one new error there
> > too however).
> > 
> > It doesn't happen on every boot, but with the in-kernel pd-mapper I
> > often (every 3-4 boots) see the following errors on the x1e80100 CRD
> > during boot:
> > 
> > 	[    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> >         [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> >         [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> > 
> > 	[    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> > 
> > I've also seen the following, which may also be related:
> > 
> > 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> >         [   14.571943] PDR: service lookup for avs/audio failed: -110
> > 
> > I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
> > reverted again.
> 
> Here's another bug, a NULL deref in the battery driver, that is
> apparently exposed by the in-kernel pd-mapper. This is also on the
> x1e80100 CRD with a couple of added printks to indicate when the
> pd-mapper probes and when the pmic glink services are up:

The backtrace looks like an issue in the battmgr / pmic_glink core. Yes,
maybe pd-mapper exposes that. But most likely nobody have seen those
because userspace pd-mapper usually starts much later (thanks udevadm
trigger for triggering all the drivers).

The pd-mapper server is fine to be started early. Even the userspace
one.  I think we went over these discussions during reviews of earlier
series. The net result was that it is fine, provided that the response
don't change later on (e.g.  some of the firmware might save the state
and won't re-query it later on if servreg restarts).

> [    8.933775] remoteproc remoteproc1: powering up 32300000.remoteproc
> [    8.934623] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fd5000.phy
> [    8.945244] remoteproc remoteproc1: Booting fw image qcom/x1e80100/cdsp.mbn, size 3027368
> [    8.965537] remoteproc remoteproc0: powering up 30000000.remoteproc
> [    8.971075] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fda000.phy
> [    8.974299] remoteproc remoteproc0: Booting fw image qcom/x1e80100/adsp.mbn, size 21424472
> [    8.999726] msm-mdss ae00000.display-subsystem: Adding to iommu group 4
> [    9.007697] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fdf000.phy
> [    9.101196] remoteproc remoteproc1: remote processor 32300000.remoteproc is now up
> [    9.103860] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.1: qcom_pdm_probe
> [    9.105989] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.0: qcom_pdm_probe
> 
>  - pd-mapper probing
> 
> [    9.112983] qcom-snps-eusb2-hsphy fd3000.phy: Registered Qcom-eUSB2 phy
> [    9.296879] remoteproc remoteproc0: remote processor 30000000.remoteproc is now up
> 
>  - adsp is up
> 
> [    9.300086] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 7fffffff
> 
>  - SERVREG_SERVICE_STATE_UNINIT
> 
> [    9.301878] qcom-snps-eusb2-hsphy fd9000.phy: Registered Qcom-eUSB2 phy
> [    9.306985] qcom,fastrpc 30000000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FAST
> RPC
> [    9.309924] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@3: Adding to iommu group 5
> [    9.311367] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@4: Adding to iommu group 6
> [    9.318330] PDR: Indication received from msm/adsp/charger_pd, state: 0x1fffffff, trans-id: 1
> 
>  - This looks suspicious
> 
> [    9.323924] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
> [    9.325275] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@5: Adding to iommu group 7
> [    9.326008] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@6: Adding to iommu group 8
> [    9.326733] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@7: Adding to iommu group 9
> [    9.336582] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 1fffffff
> 
>  - SERVREG_SERVICE_STATE_UP
> 
> [    9.345544] dwc3 a000000.usb: Adding to iommu group 10
> [    9.361410] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
> [    9.362803] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> [    9.362882] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> 
>  - -ECANCELED errors I reported earlier


The qcom_glink_request_intent() looks like the only place which can
return ECANCELED here. Not sure why GLINK_CMD_RX_INTENT_REQ_ACK() would
return failure here.

It might be that the ADSP has been running the preliminary firmware,
then it is shut down and then restarted with the proper firmware (and
Linux fails to track that). But in this case the same error can happen
if the pd-mapper has been running before starting the ADSP.

> 
> [    9.364298] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> ...
> [    9.364339] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [    9.364395] CPU: 6 UID: 0 PID: 111 Comm: kworker/6:4 Not tainted 6.11.0-rc1 #70
> [    9.364397] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
> [    9.364398] Workqueue: events qcom_battmgr_enable_worker [qcom_battmgr]
> [    9.364401] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    9.364403] pc : pmic_glink_send+0xc/0x24 [pmic_glink]
> [    9.364405] lr : qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
> ...
> [    9.364427] Call trace:
> [    9.364428]  pmic_glink_send+0xc/0x24 [pmic_glink]

It looks like pmic_glink_send might need to hold pg->state_lock.

> [    9.364429]  qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
> [    9.364430]  process_one_work+0x210/0x614
> [    9.364435]  worker_thread+0x244/0x388
> [    9.364436]  kthread+0x124/0x128
> [    9.364437]  ret_from_fork+0x10/0x20
> [    9.364439] Code: 17fffff7 d503233f a9bf7bfd 910003fd (f9400800)
> [    9.364441] ---[ end trace 0000000000000000 ]---
> 
> [    9.365205] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> 
> Johan

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-30 11:06       ` Dmitry Baryshkov
@ 2024-07-30 23:50         ` Chris Lew
  2024-07-31  1:41           ` Dmitry Baryshkov
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Lew @ 2024-07-30 23:50 UTC (permalink / raw)
  To: Dmitry Baryshkov, Johan Hovold
  Cc: Stephan Gerhold, Bjorn Andersson, Konrad Dybcio, linux-arm-msm,
	linux-kernel, Abel Vesa



On 7/30/2024 4:06 AM, Dmitry Baryshkov wrote:
> On Tue, Jul 30, 2024 at 11:28:14AM GMT, Johan Hovold wrote:
>> On Mon, Jul 29, 2024 at 04:57:54PM +0200, Johan Hovold wrote:
>>> On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
>>>> X1E80100 has the same protection domains as SM8550, except that MPSS is
>>>> missing. Add it to the in-kernel pd-mapper to avoid having to run the
>>>> daemon in userspace for charging and audio functionality.
>>>
>>> I'm seeing a bunch of new errors when running with this patch applied on
>>> top of 6.11-rc1. I'm assuming it is due to changes in timing that are
>>> either exposing existing bugs or there is a general problem with the
>>> in-kernel pd-mapper implementation.
>>>
>>> In any case, this does does not seem to be specific to x1e80100 even if
>>> I'm not seeing as many issues on sc8280xp (there is one new error there
>>> too however).
>>>
>>> It doesn't happen on every boot, but with the in-kernel pd-mapper I
>>> often (every 3-4 boots) see the following errors on the x1e80100 CRD
>>> during boot:
>>>
>>> 	[    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>>>          [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>>          [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>>>
>>> 	[    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>>>
>>> I've also seen the following, which may also be related:
>>>
>>> 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
>>>          [   14.571943] PDR: service lookup for avs/audio failed: -110
>>>
>>> I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
>>> reverted again.
>>
>> Here's another bug, a NULL deref in the battery driver, that is
>> apparently exposed by the in-kernel pd-mapper. This is also on the
>> x1e80100 CRD with a couple of added printks to indicate when the
>> pd-mapper probes and when the pmic glink services are up:
> 
> The backtrace looks like an issue in the battmgr / pmic_glink core. Yes,
> maybe pd-mapper exposes that. But most likely nobody have seen those
> because userspace pd-mapper usually starts much later (thanks udevadm
> trigger for triggering all the drivers).
> 
> The pd-mapper server is fine to be started early. Even the userspace
> one.  I think we went over these discussions during reviews of earlier
> series. The net result was that it is fine, provided that the response
> don't change later on (e.g.  some of the firmware might save the state
> and won't re-query it later on if servreg restarts).
> 
>> [    8.933775] remoteproc remoteproc1: powering up 32300000.remoteproc
>> [    8.934623] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fd5000.phy
>> [    8.945244] remoteproc remoteproc1: Booting fw image qcom/x1e80100/cdsp.mbn, size 3027368
>> [    8.965537] remoteproc remoteproc0: powering up 30000000.remoteproc
>> [    8.971075] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fda000.phy
>> [    8.974299] remoteproc remoteproc0: Booting fw image qcom/x1e80100/adsp.mbn, size 21424472
>> [    8.999726] msm-mdss ae00000.display-subsystem: Adding to iommu group 4
>> [    9.007697] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fdf000.phy
>> [    9.101196] remoteproc remoteproc1: remote processor 32300000.remoteproc is now up
>> [    9.103860] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.1: qcom_pdm_probe
>> [    9.105989] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.0: qcom_pdm_probe
>>
>>   - pd-mapper probing
>>
>> [    9.112983] qcom-snps-eusb2-hsphy fd3000.phy: Registered Qcom-eUSB2 phy
>> [    9.296879] remoteproc remoteproc0: remote processor 30000000.remoteproc is now up
>>
>>   - adsp is up
>>
>> [    9.300086] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 7fffffff
>>
>>   - SERVREG_SERVICE_STATE_UNINIT
>>
>> [    9.301878] qcom-snps-eusb2-hsphy fd9000.phy: Registered Qcom-eUSB2 phy
>> [    9.306985] qcom,fastrpc 30000000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FAST
>> RPC
>> [    9.309924] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@3: Adding to iommu group 5
>> [    9.311367] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@4: Adding to iommu group 6
>> [    9.318330] PDR: Indication received from msm/adsp/charger_pd, state: 0x1fffffff, trans-id: 1
>>
>>   - This looks suspicious
>>
>> [    9.323924] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
>> [    9.325275] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@5: Adding to iommu group 7
>> [    9.326008] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@6: Adding to iommu group 8
>> [    9.326733] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@7: Adding to iommu group 9
>> [    9.336582] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 1fffffff
>>
>>   - SERVREG_SERVICE_STATE_UP
>>
>> [    9.345544] dwc3 a000000.usb: Adding to iommu group 10
>> [    9.361410] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
>> [    9.362803] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>> [    9.362882] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>
>>   - -ECANCELED errors I reported earlier
> 
> 
> The qcom_glink_request_intent() looks like the only place which can
> return ECANCELED here. Not sure why GLINK_CMD_RX_INTENT_REQ_ACK() would
> return failure here.
> 
> It might be that the ADSP has been running the preliminary firmware,
> then it is shut down and then restarted with the proper firmware (and
> Linux fails to track that). But in this case the same error can happen
> if the pd-mapper has been running before starting the ADSP.
>

The default behavior for a GLINK channel on the firmware is to reject 
intent requests. The PMIC glink channel on the firmware may not have 
implemented an intent request handler so any intent requests would be 
rejected and result in an ECANCELED. When the firmware implements the 
glink channel this way, the linux driver has to wait until the firmware 
queues the intents for linux to use.

I guess with userspace pd-mapper, by the time the pd state is 
broadcasted to the pmic glink driver, the firmware glink channel has 
already queued intents for pmic_glink driver to use.

>>
>> [    9.364298] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
>> ...
>> [    9.364339] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> [    9.364395] CPU: 6 UID: 0 PID: 111 Comm: kworker/6:4 Not tainted 6.11.0-rc1 #70
>> [    9.364397] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
>> [    9.364398] Workqueue: events qcom_battmgr_enable_worker [qcom_battmgr]
>> [    9.364401] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>> [    9.364403] pc : pmic_glink_send+0xc/0x24 [pmic_glink]
>> [    9.364405] lr : qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
>> ...
>> [    9.364427] Call trace:
>> [    9.364428]  pmic_glink_send+0xc/0x24 [pmic_glink]
> 
> It looks like pmic_glink_send might need to hold pg->state_lock.
> 
>> [    9.364429]  qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
>> [    9.364430]  process_one_work+0x210/0x614
>> [    9.364435]  worker_thread+0x244/0x388
>> [    9.364436]  kthread+0x124/0x128
>> [    9.364437]  ret_from_fork+0x10/0x20
>> [    9.364439] Code: 17fffff7 d503233f a9bf7bfd 910003fd (f9400800)
>> [    9.364441] ---[ end trace 0000000000000000 ]---
>>
>> [    9.365205] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>>
>> Johan
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-30 23:50         ` Chris Lew
@ 2024-07-31  1:41           ` Dmitry Baryshkov
  2024-07-31  6:04             ` Chris Lew
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Baryshkov @ 2024-07-31  1:41 UTC (permalink / raw)
  To: Chris Lew
  Cc: Johan Hovold, Stephan Gerhold, Bjorn Andersson, Konrad Dybcio,
	linux-arm-msm, linux-kernel, Abel Vesa

On Wed, 31 Jul 2024 at 02:50, Chris Lew <quic_clew@quicinc.com> wrote:
>
>
>
> On 7/30/2024 4:06 AM, Dmitry Baryshkov wrote:
> > On Tue, Jul 30, 2024 at 11:28:14AM GMT, Johan Hovold wrote:
> >> On Mon, Jul 29, 2024 at 04:57:54PM +0200, Johan Hovold wrote:
> >>> On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
> >>>> X1E80100 has the same protection domains as SM8550, except that MPSS is
> >>>> missing. Add it to the in-kernel pd-mapper to avoid having to run the
> >>>> daemon in userspace for charging and audio functionality.
> >>>
> >>> I'm seeing a bunch of new errors when running with this patch applied on
> >>> top of 6.11-rc1. I'm assuming it is due to changes in timing that are
> >>> either exposing existing bugs or there is a general problem with the
> >>> in-kernel pd-mapper implementation.
> >>>
> >>> In any case, this does does not seem to be specific to x1e80100 even if
> >>> I'm not seeing as many issues on sc8280xp (there is one new error there
> >>> too however).
> >>>
> >>> It doesn't happen on every boot, but with the in-kernel pd-mapper I
> >>> often (every 3-4 boots) see the following errors on the x1e80100 CRD
> >>> during boot:
> >>>
> >>>     [    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> >>>          [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> >>>          [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> >>>
> >>>     [    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> >>>
> >>> I've also seen the following, which may also be related:
> >>>
> >>>     [   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> >>>          [   14.571943] PDR: service lookup for avs/audio failed: -110
> >>>
> >>> I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
> >>> reverted again.
> >>
> >> Here's another bug, a NULL deref in the battery driver, that is
> >> apparently exposed by the in-kernel pd-mapper. This is also on the
> >> x1e80100 CRD with a couple of added printks to indicate when the
> >> pd-mapper probes and when the pmic glink services are up:
> >
> > The backtrace looks like an issue in the battmgr / pmic_glink core. Yes,
> > maybe pd-mapper exposes that. But most likely nobody have seen those
> > because userspace pd-mapper usually starts much later (thanks udevadm
> > trigger for triggering all the drivers).
> >
> > The pd-mapper server is fine to be started early. Even the userspace
> > one.  I think we went over these discussions during reviews of earlier
> > series. The net result was that it is fine, provided that the response
> > don't change later on (e.g.  some of the firmware might save the state
> > and won't re-query it later on if servreg restarts).
> >
> >> [    8.933775] remoteproc remoteproc1: powering up 32300000.remoteproc
> >> [    8.934623] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fd5000.phy
> >> [    8.945244] remoteproc remoteproc1: Booting fw image qcom/x1e80100/cdsp.mbn, size 3027368
> >> [    8.965537] remoteproc remoteproc0: powering up 30000000.remoteproc
> >> [    8.971075] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fda000.phy
> >> [    8.974299] remoteproc remoteproc0: Booting fw image qcom/x1e80100/adsp.mbn, size 21424472
> >> [    8.999726] msm-mdss ae00000.display-subsystem: Adding to iommu group 4
> >> [    9.007697] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fdf000.phy
> >> [    9.101196] remoteproc remoteproc1: remote processor 32300000.remoteproc is now up
> >> [    9.103860] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.1: qcom_pdm_probe
> >> [    9.105989] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.0: qcom_pdm_probe
> >>
> >>   - pd-mapper probing
> >>
> >> [    9.112983] qcom-snps-eusb2-hsphy fd3000.phy: Registered Qcom-eUSB2 phy
> >> [    9.296879] remoteproc remoteproc0: remote processor 30000000.remoteproc is now up
> >>
> >>   - adsp is up
> >>
> >> [    9.300086] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 7fffffff
> >>
> >>   - SERVREG_SERVICE_STATE_UNINIT
> >>
> >> [    9.301878] qcom-snps-eusb2-hsphy fd9000.phy: Registered Qcom-eUSB2 phy
> >> [    9.306985] qcom,fastrpc 30000000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FAST
> >> RPC
> >> [    9.309924] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@3: Adding to iommu group 5
> >> [    9.311367] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@4: Adding to iommu group 6
> >> [    9.318330] PDR: Indication received from msm/adsp/charger_pd, state: 0x1fffffff, trans-id: 1
> >>
> >>   - This looks suspicious
> >>
> >> [    9.323924] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
> >> [    9.325275] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@5: Adding to iommu group 7
> >> [    9.326008] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@6: Adding to iommu group 8
> >> [    9.326733] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@7: Adding to iommu group 9
> >> [    9.336582] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 1fffffff
> >>
> >>   - SERVREG_SERVICE_STATE_UP
> >>
> >> [    9.345544] dwc3 a000000.usb: Adding to iommu group 10
> >> [    9.361410] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
> >> [    9.362803] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> >> [    9.362882] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> >>
> >>   - -ECANCELED errors I reported earlier
> >
> >
> > The qcom_glink_request_intent() looks like the only place which can
> > return ECANCELED here. Not sure why GLINK_CMD_RX_INTENT_REQ_ACK() would
> > return failure here.
> >
> > It might be that the ADSP has been running the preliminary firmware,
> > then it is shut down and then restarted with the proper firmware (and
> > Linux fails to track that). But in this case the same error can happen
> > if the pd-mapper has been running before starting the ADSP.
> >
>
> The default behavior for a GLINK channel on the firmware is to reject
> intent requests. The PMIC glink channel on the firmware may not have
> implemented an intent request handler so any intent requests would be
> rejected and result in an ECANCELED. When the firmware implements the
> glink channel this way, the linux driver has to wait until the firmware
> queues the intents for linux to use.

Could you please describe the actual code path that would result in
-ECANCELED? Is that qcom_glink_handle_intent_req_ack(granted = 0) ?
My theory about unregistration is based upon a call path that leads to
qcom_glink_intent_req_abort(). But if I'm not mistaken, it only
happens on the subdev stop event.

Johan, Stephan, could you please trace, what leads to the ECANCELED?

Also could you please be more specific whether the described behaviour
is tied to X1E80100 platform or it is common to some / all other
platforms?

> I guess with userspace pd-mapper, by the time the pd state is
> broadcasted to the pmic glink driver, the firmware glink channel has
> already queued intents for pmic_glink driver to use.
>
> >>
> >> [    9.364298] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> >> ...
> >> [    9.364339] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> [    9.364395] CPU: 6 UID: 0 PID: 111 Comm: kworker/6:4 Not tainted 6.11.0-rc1 #70
> >> [    9.364397] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
> >> [    9.364398] Workqueue: events qcom_battmgr_enable_worker [qcom_battmgr]
> >> [    9.364401] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> >> [    9.364403] pc : pmic_glink_send+0xc/0x24 [pmic_glink]
> >> [    9.364405] lr : qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
> >> ...
> >> [    9.364427] Call trace:
> >> [    9.364428]  pmic_glink_send+0xc/0x24 [pmic_glink]
> >
> > It looks like pmic_glink_send might need to hold pg->state_lock.
> >
> >> [    9.364429]  qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
> >> [    9.364430]  process_one_work+0x210/0x614
> >> [    9.364435]  worker_thread+0x244/0x388
> >> [    9.364436]  kthread+0x124/0x128
> >> [    9.364437]  ret_from_fork+0x10/0x20
> >> [    9.364439] Code: 17fffff7 d503233f a9bf7bfd 910003fd (f9400800)
> >> [    9.364441] ---[ end trace 0000000000000000 ]---
> >>
> >> [    9.365205] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-31  1:41           ` Dmitry Baryshkov
@ 2024-07-31  6:04             ` Chris Lew
  2024-08-22  7:28               ` Johan Hovold
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Lew @ 2024-07-31  6:04 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Johan Hovold, Stephan Gerhold, Bjorn Andersson, Konrad Dybcio,
	linux-arm-msm, linux-kernel, Abel Vesa



On 7/30/2024 6:41 PM, Dmitry Baryshkov wrote:
> On Wed, 31 Jul 2024 at 02:50, Chris Lew <quic_clew@quicinc.com> wrote:
>>
>>
>>
>> On 7/30/2024 4:06 AM, Dmitry Baryshkov wrote:
>>> On Tue, Jul 30, 2024 at 11:28:14AM GMT, Johan Hovold wrote:
>>>> On Mon, Jul 29, 2024 at 04:57:54PM +0200, Johan Hovold wrote:
>>>>> On Mon, Jul 08, 2024 at 06:22:09PM +0200, Stephan Gerhold wrote:
>>>>>> X1E80100 has the same protection domains as SM8550, except that MPSS is
>>>>>> missing. Add it to the in-kernel pd-mapper to avoid having to run the
>>>>>> daemon in userspace for charging and audio functionality.
>>>>>
>>>>> I'm seeing a bunch of new errors when running with this patch applied on
>>>>> top of 6.11-rc1. I'm assuming it is due to changes in timing that are
>>>>> either exposing existing bugs or there is a general problem with the
>>>>> in-kernel pd-mapper implementation.
>>>>>
>>>>> In any case, this does does not seem to be specific to x1e80100 even if
>>>>> I'm not seeing as many issues on sc8280xp (there is one new error there
>>>>> too however).
>>>>>
>>>>> It doesn't happen on every boot, but with the in-kernel pd-mapper I
>>>>> often (every 3-4 boots) see the following errors on the x1e80100 CRD
>>>>> during boot:
>>>>>
>>>>>      [    9.799719] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>>>>>           [    9.812446] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>>>>           [    9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>>>>>
>>>>>      [    9.269230] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>>>>>
>>>>> I've also seen the following, which may also be related:
>>>>>
>>>>>      [   14.565059] PDR: avs/audio get domain list txn wait failed: -110
>>>>>           [   14.571943] PDR: service lookup for avs/audio failed: -110
>>>>>
>>>>> I haven't seen the -ECANCELED (-125) errors in 30 reboots with the patch
>>>>> reverted again.
>>>>
>>>> Here's another bug, a NULL deref in the battery driver, that is
>>>> apparently exposed by the in-kernel pd-mapper. This is also on the
>>>> x1e80100 CRD with a couple of added printks to indicate when the
>>>> pd-mapper probes and when the pmic glink services are up:
>>>
>>> The backtrace looks like an issue in the battmgr / pmic_glink core. Yes,
>>> maybe pd-mapper exposes that. But most likely nobody have seen those
>>> because userspace pd-mapper usually starts much later (thanks udevadm
>>> trigger for triggering all the drivers).
>>>
>>> The pd-mapper server is fine to be started early. Even the userspace
>>> one.  I think we went over these discussions during reviews of earlier
>>> series. The net result was that it is fine, provided that the response
>>> don't change later on (e.g.  some of the firmware might save the state
>>> and won't re-query it later on if servreg restarts).
>>>
>>>> [    8.933775] remoteproc remoteproc1: powering up 32300000.remoteproc
>>>> [    8.934623] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fd5000.phy
>>>> [    8.945244] remoteproc remoteproc1: Booting fw image qcom/x1e80100/cdsp.mbn, size 3027368
>>>> [    8.965537] remoteproc remoteproc0: powering up 30000000.remoteproc
>>>> [    8.971075] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fda000.phy
>>>> [    8.974299] remoteproc remoteproc0: Booting fw image qcom/x1e80100/adsp.mbn, size 21424472
>>>> [    8.999726] msm-mdss ae00000.display-subsystem: Adding to iommu group 4
>>>> [    9.007697] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with fdf000.phy
>>>> [    9.101196] remoteproc remoteproc1: remote processor 32300000.remoteproc is now up
>>>> [    9.103860] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.1: qcom_pdm_probe
>>>> [    9.105989] qcom_pd_mapper.qcom-pdm-mapper qcom_common.pd-mapper.0: qcom_pdm_probe
>>>>
>>>>    - pd-mapper probing
>>>>
>>>> [    9.112983] qcom-snps-eusb2-hsphy fd3000.phy: Registered Qcom-eUSB2 phy
>>>> [    9.296879] remoteproc remoteproc0: remote processor 30000000.remoteproc is now up
>>>>
>>>>    - adsp is up
>>>>
>>>> [    9.300086] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 7fffffff
>>>>
>>>>    - SERVREG_SERVICE_STATE_UNINIT
>>>>
>>>> [    9.301878] qcom-snps-eusb2-hsphy fd9000.phy: Registered Qcom-eUSB2 phy
>>>> [    9.306985] qcom,fastrpc 30000000.remoteproc:glink-edge.fastrpcglink-apps-dsp.-1.-1: no reserved DMA memory for FAST
>>>> RPC
>>>> [    9.309924] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@3: Adding to iommu group 5
>>>> [    9.311367] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@4: Adding to iommu group 6
>>>> [    9.318330] PDR: Indication received from msm/adsp/charger_pd, state: 0x1fffffff, trans-id: 1
>>>>
>>>>    - This looks suspicious
>>>>
>>>> [    9.323924] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
>>>> [    9.325275] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@5: Adding to iommu group 7
>>>> [    9.326008] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@6: Adding to iommu group 8
>>>> [    9.326733] qcom,fastrpc-cb 30000000.remoteproc:glink-edge:fastrpc:compute-cb@7: Adding to iommu group 9
>>>> [    9.336582] qcom_pmic_glink pmic-glink: pmic_glink_pdr_callback - state = 1fffffff
>>>>
>>>>    - SERVREG_SERVICE_STATE_UP
>>>>
>>>> [    9.345544] dwc3 a000000.usb: Adding to iommu group 10
>>>> [    9.361410] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
>>>> [    9.362803] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>>>> [    9.362882] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>>>
>>>>    - -ECANCELED errors I reported earlier
>>>
>>>
>>> The qcom_glink_request_intent() looks like the only place which can
>>> return ECANCELED here. Not sure why GLINK_CMD_RX_INTENT_REQ_ACK() would
>>> return failure here.
>>>
>>> It might be that the ADSP has been running the preliminary firmware,
>>> then it is shut down and then restarted with the proper firmware (and
>>> Linux fails to track that). But in this case the same error can happen
>>> if the pd-mapper has been running before starting the ADSP.
>>>
>>
>> The default behavior for a GLINK channel on the firmware is to reject
>> intent requests. The PMIC glink channel on the firmware may not have
>> implemented an intent request handler so any intent requests would be
>> rejected and result in an ECANCELED. When the firmware implements the
>> glink channel this way, the linux driver has to wait until the firmware
>> queues the intents for linux to use.
> 
> Could you please describe the actual code path that would result in
> -ECANCELED? Is that qcom_glink_handle_intent_req_ack(granted = 0) ?
> My theory about unregistration is based upon a call path that leads to
> qcom_glink_intent_req_abort(). But if I'm not mistaken, it only
> happens on the subdev stop event.
> 
> Johan, Stephan, could you please trace, what leads to the ECANCELED?
> 
> Also could you please be more specific whether the described behaviour
> is tied to X1E80100 platform or it is common to some / all other
> platforms?
> 

Ah sorry, wasn't being clear.

GLINK has a concept that is called "intents". An intent is an object 
that signifies that a remote channel is ready to receive a packet 
through GLINK. Intents can be pre-emptively queued, or they can be 
requested by the sending entity. GLINK will not try to send or it will 
block until there is an intent available.

Intents are exchanged with GLINK_CMD_INTENT packets. When Linux receives 
one of these packets we add it to an idr "riids".

Example sending call:
     pmic_glink_send() --> rpmsg_send() --> qcom_glink_send() --> 
__qcom_glink_send() --> qcom_glink_request_intent()

In __qcom_glink_send(), we check if there are any available intents in 
"riids", if there aren't any intents we request an intent through 
qcom_glink_request_intent(). This sends a GLINK_CMD_RX_INTENT_REQ packet 
to the remote and waits for a GLINK_CMD_RX_INTENT_REQ_ACK packet in 
return. This ack packet will have a field that says whether the intent 
has been granted or not. When linux gets this ack packet, we will wake 
up the thread waiting in qcom_glink_request_intent().

The ECANCELED comes from qcom_glink_request_intent() when we receive a 
GLINK_CMD_RX_INTENT_REQ_ACK that has granted == false.

On the firmware, when a glink channel is registered they can optionally 
fill in a handler for GLINK_CMD_RX_INTENT_REQ packets. If this handler 
is not configured, then a default one will be used where all 
GLINK_CMD_RX_INTENT_REQ packets will be responded with 
GLINK_CMD_RX_INTENT_REQ_ACK and granted == false. If a channel is 
implemented this way, then the only thing Linux can do is wait and retry 
until the remote queues the intents on its own accord.

This would be my current guess as to what's happening based on this not 
being consistent and only seen every couple of reboots. A stop path 
problem sounds like it should happen every time, and we should also see 
the remoteproc prints related to powering down the adsp. The above race 
should be applicable to all platforms but depends on the speed of the 
ADSP vs the CPU.

>> I guess with userspace pd-mapper, by the time the pd state is
>> broadcasted to the pmic glink driver, the firmware glink channel has
>> already queued intents for pmic_glink driver to use.
>>
>>>>
>>>> [    9.364298] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
>>>> ...
>>>> [    9.364339] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>> [    9.364395] CPU: 6 UID: 0 PID: 111 Comm: kworker/6:4 Not tainted 6.11.0-rc1 #70
>>>> [    9.364397] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
>>>> [    9.364398] Workqueue: events qcom_battmgr_enable_worker [qcom_battmgr]
>>>> [    9.364401] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>>>> [    9.364403] pc : pmic_glink_send+0xc/0x24 [pmic_glink]
>>>> [    9.364405] lr : qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
>>>> ...
>>>> [    9.364427] Call trace:
>>>> [    9.364428]  pmic_glink_send+0xc/0x24 [pmic_glink]
>>>
>>> It looks like pmic_glink_send might need to hold pg->state_lock.
>>>
>>>> [    9.364429]  qcom_battmgr_enable_worker+0x60/0xbc [qcom_battmgr]
>>>> [    9.364430]  process_one_work+0x210/0x614
>>>> [    9.364435]  worker_thread+0x244/0x388
>>>> [    9.364436]  kthread+0x124/0x128
>>>> [    9.364437]  ret_from_fork+0x10/0x20
>>>> [    9.364439] Code: 17fffff7 d503233f a9bf7bfd 910003fd (f9400800)
>>>> [    9.364441] ---[ end trace 0000000000000000 ]---
>>>>
>>>> [    9.365205] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms
  2024-07-08 16:22 [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Stephan Gerhold
  2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
  2024-07-08 16:22 ` [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains Stephan Gerhold
@ 2024-08-15 20:40 ` Bjorn Andersson
  2 siblings, 0 replies; 15+ messages in thread
From: Bjorn Andersson @ 2024-08-15 20:40 UTC (permalink / raw)
  To: Konrad Dybcio, Stephan Gerhold
  Cc: linux-arm-msm, linux-kernel, Dmitry Baryshkov, Abel Vesa,
	Johan Hovold


On Mon, 08 Jul 2024 18:22:08 +0200, Stephan Gerhold wrote:
> Add X1E80100 to the in-kernel pd-mapper to avoid having to run the
> userspace daemon for charging and audio functionality. Also add entries for
> some missing older platforms without protection domains.
> 
> 

Applied, thanks!

[1/2] soc: qcom: pd_mapper: Add X1E80100
      commit: bd6db1f1486eedb8460647b3fbe9b5ae8fd09207
[2/2] soc: qcom: pd_mapper: Add more older platforms without domains
      commit: ed2c37520893427dfb8b615fcc6263af796a0ab1

Best regards,
-- 
Bjorn Andersson <andersson@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-07-31  6:04             ` Chris Lew
@ 2024-08-22  7:28               ` Johan Hovold
  2024-09-04 12:40                 ` Johan Hovold
  0 siblings, 1 reply; 15+ messages in thread
From: Johan Hovold @ 2024-08-22  7:28 UTC (permalink / raw)
  To: Chris Lew
  Cc: Dmitry Baryshkov, Stephan Gerhold, Bjorn Andersson, Konrad Dybcio,
	linux-arm-msm, linux-kernel, Abel Vesa

On Tue, Jul 30, 2024 at 11:04:58PM -0700, Chris Lew wrote:

> GLINK has a concept that is called "intents". An intent is an object 
> that signifies that a remote channel is ready to receive a packet 
> through GLINK. Intents can be pre-emptively queued, or they can be 
> requested by the sending entity. GLINK will not try to send or it will 
> block until there is an intent available.
> 
> Intents are exchanged with GLINK_CMD_INTENT packets. When Linux receives 
> one of these packets we add it to an idr "riids".
> 
> Example sending call:
>      pmic_glink_send() --> rpmsg_send() --> qcom_glink_send() --> 
> __qcom_glink_send() --> qcom_glink_request_intent()
> 
> In __qcom_glink_send(), we check if there are any available intents in 
> "riids", if there aren't any intents we request an intent through 
> qcom_glink_request_intent(). This sends a GLINK_CMD_RX_INTENT_REQ packet 
> to the remote and waits for a GLINK_CMD_RX_INTENT_REQ_ACK packet in 
> return. This ack packet will have a field that says whether the intent 
> has been granted or not. When linux gets this ack packet, we will wake 
> up the thread waiting in qcom_glink_request_intent().
> 
> The ECANCELED comes from qcom_glink_request_intent() when we receive a 
> GLINK_CMD_RX_INTENT_REQ_ACK that has granted == false.
> 
> On the firmware, when a glink channel is registered they can optionally 
> fill in a handler for GLINK_CMD_RX_INTENT_REQ packets. If this handler 
> is not configured, then a default one will be used where all 
> GLINK_CMD_RX_INTENT_REQ packets will be responded with 
> GLINK_CMD_RX_INTENT_REQ_ACK and granted == false. If a channel is 
> implemented this way, then the only thing Linux can do is wait and retry 
> until the remote queues the intents on its own accord.
> 
> This would be my current guess as to what's happening based on this not 
> being consistent and only seen every couple of reboots. A stop path 
> problem sounds like it should happen every time, and we should also see 
> the remoteproc prints related to powering down the adsp. The above race 
> should be applicable to all platforms but depends on the speed of the 
> ADSP vs the CPU.

Thanks for the above. This indeed seems to match what I'm seeing as I
also reported here [1]:

[    9.539415]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
[    9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications

[    9.448945]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
[    9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
[    9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
[    9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125

I assume we do not want to have every client driver implement a retry
loop for the first communication with the remote end, so can this be
handled by the pmic_glink driver somehow? For example, by not forwarding
state changes until some generic request has gone through?

And what about the audio service errors:

	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
        [   14.571943] PDR: service lookup for avs/audio failed: -110

Does this seem to be a separate (but related) issue or just a different
symptom?

Johan


[1] https://lore.kernel.org/lkml/ZsRGV4hplvidpYji@hovoldconsulting.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-08-22  7:28               ` Johan Hovold
@ 2024-09-04 12:40                 ` Johan Hovold
  2024-09-04 13:27                   ` Dmitry Baryshkov
  0 siblings, 1 reply; 15+ messages in thread
From: Johan Hovold @ 2024-09-04 12:40 UTC (permalink / raw)
  To: Chris Lew, Dmitry Baryshkov, Bjorn Andersson
  Cc: Stephan Gerhold, Konrad Dybcio, linux-arm-msm, linux-kernel,
	Abel Vesa

On Thu, Aug 22, 2024 at 09:28:26AM +0200, Johan Hovold wrote:
> On Tue, Jul 30, 2024 at 11:04:58PM -0700, Chris Lew wrote:
> 
> > GLINK has a concept that is called "intents". An intent is an object 
> > that signifies that a remote channel is ready to receive a packet 
> > through GLINK. Intents can be pre-emptively queued, or they can be 
> > requested by the sending entity. GLINK will not try to send or it will 
> > block until there is an intent available.
> > 
> > Intents are exchanged with GLINK_CMD_INTENT packets. When Linux receives 
> > one of these packets we add it to an idr "riids".
> > 
> > Example sending call:
> >      pmic_glink_send() --> rpmsg_send() --> qcom_glink_send() --> 
> > __qcom_glink_send() --> qcom_glink_request_intent()
> > 
> > In __qcom_glink_send(), we check if there are any available intents in 
> > "riids", if there aren't any intents we request an intent through 
> > qcom_glink_request_intent(). This sends a GLINK_CMD_RX_INTENT_REQ packet 
> > to the remote and waits for a GLINK_CMD_RX_INTENT_REQ_ACK packet in 
> > return. This ack packet will have a field that says whether the intent 
> > has been granted or not. When linux gets this ack packet, we will wake 
> > up the thread waiting in qcom_glink_request_intent().
> > 
> > The ECANCELED comes from qcom_glink_request_intent() when we receive a 
> > GLINK_CMD_RX_INTENT_REQ_ACK that has granted == false.
> > 
> > On the firmware, when a glink channel is registered they can optionally 
> > fill in a handler for GLINK_CMD_RX_INTENT_REQ packets. If this handler 
> > is not configured, then a default one will be used where all 
> > GLINK_CMD_RX_INTENT_REQ packets will be responded with 
> > GLINK_CMD_RX_INTENT_REQ_ACK and granted == false. If a channel is 
> > implemented this way, then the only thing Linux can do is wait and retry 
> > until the remote queues the intents on its own accord.
> > 
> > This would be my current guess as to what's happening based on this not 
> > being consistent and only seen every couple of reboots. A stop path 
> > problem sounds like it should happen every time, and we should also see 
> > the remoteproc prints related to powering down the adsp. The above race 
> > should be applicable to all platforms but depends on the speed of the 
> > ADSP vs the CPU.
> 
> Thanks for the above. This indeed seems to match what I'm seeing as I
> also reported here [1]:
> 
> [    9.539415]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
> [    9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> 
> [    9.448945]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
> [    9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> [    9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
> [    9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> 
> I assume we do not want to have every client driver implement a retry
> loop for the first communication with the remote end, so can this be
> handled by the pmic_glink driver somehow? For example, by not forwarding
> state changes until some generic request has gone through?

Has there been any progress on this issue? It's already been five weeks
since my initial report and we're closing in on the merge window for
6.12. If this isn't resolved soon, I'll send patch to disable the
in-kernel pd-mapper by marking it as broken.

> And what about the audio service errors:
> 
> 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
>	[   14.571943] PDR: service lookup for avs/audio failed: -110
> 
> Does this seem to be a separate (but related) issue or just a different
> symptom?

I can confirm that the audio breakage is also related to the in-kernel
pd-mapper. I hit it after 30 reboots with the in-kernel pd-mapper, but
have not seen it with the user space service (e.g. after 100+ reboots).

> [1] https://lore.kernel.org/lkml/ZsRGV4hplvidpYji@hovoldconsulting.com/

Johan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100
  2024-09-04 12:40                 ` Johan Hovold
@ 2024-09-04 13:27                   ` Dmitry Baryshkov
  0 siblings, 0 replies; 15+ messages in thread
From: Dmitry Baryshkov @ 2024-09-04 13:27 UTC (permalink / raw)
  To: Johan Hovold, Chris Lew, Bjorn Andersson
  Cc: Stephan Gerhold, Konrad Dybcio, linux-arm-msm, linux-kernel,
	Abel Vesa

On 4 September 2024 15:40:44 EEST, Johan Hovold <johan@kernel.org> wrote:
>On Thu, Aug 22, 2024 at 09:28:26AM +0200, Johan Hovold wrote:
>> On Tue, Jul 30, 2024 at 11:04:58PM -0700, Chris Lew wrote:
>> 
>> > GLINK has a concept that is called "intents". An intent is an object 
>> > that signifies that a remote channel is ready to receive a packet 
>> > through GLINK. Intents can be pre-emptively queued, or they can be 
>> > requested by the sending entity. GLINK will not try to send or it will 
>> > block until there is an intent available.
>> > 
>> > Intents are exchanged with GLINK_CMD_INTENT packets. When Linux receives 
>> > one of these packets we add it to an idr "riids".
>> > 
>> > Example sending call:
>> >      pmic_glink_send() --> rpmsg_send() --> qcom_glink_send() --> 
>> > __qcom_glink_send() --> qcom_glink_request_intent()
>> > 
>> > In __qcom_glink_send(), we check if there are any available intents in 
>> > "riids", if there aren't any intents we request an intent through 
>> > qcom_glink_request_intent(). This sends a GLINK_CMD_RX_INTENT_REQ packet 
>> > to the remote and waits for a GLINK_CMD_RX_INTENT_REQ_ACK packet in 
>> > return. This ack packet will have a field that says whether the intent 
>> > has been granted or not. When linux gets this ack packet, we will wake 
>> > up the thread waiting in qcom_glink_request_intent().
>> > 
>> > The ECANCELED comes from qcom_glink_request_intent() when we receive a 
>> > GLINK_CMD_RX_INTENT_REQ_ACK that has granted == false.
>> > 
>> > On the firmware, when a glink channel is registered they can optionally 
>> > fill in a handler for GLINK_CMD_RX_INTENT_REQ packets. If this handler 
>> > is not configured, then a default one will be used where all 
>> > GLINK_CMD_RX_INTENT_REQ packets will be responded with 
>> > GLINK_CMD_RX_INTENT_REQ_ACK and granted == false. If a channel is 
>> > implemented this way, then the only thing Linux can do is wait and retry 
>> > until the remote queues the intents on its own accord.
>> > 
>> > This would be my current guess as to what's happening based on this not 
>> > being consistent and only seen every couple of reboots. A stop path 
>> > problem sounds like it should happen every time, and we should also see 
>> > the remoteproc prints related to powering down the adsp. The above race 
>> > should be applicable to all platforms but depends on the speed of the 
>> > ADSP vs the CPU.
>> 
>> Thanks for the above. This indeed seems to match what I'm seeing as I
>> also reported here [1]:
>> 
>> [    9.539415]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
>> [    9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>> 
>> [    9.448945]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
>> [    9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>> [    9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
>> [    9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>> 
>> I assume we do not want to have every client driver implement a retry
>> loop for the first communication with the remote end, so can this be
>> handled by the pmic_glink driver somehow? For example, by not forwarding
>> state changes until some generic request has gone through?
>
>Has there been any progress on this issue? It's already been five weeks
>since my initial report and we're closing in on the merge window for
>6.12.

I have been on vacation for 3 weeks. If I can reproduce it on the hardware I have access to, I will work on it during the next few days.

> If this isn't resolved soon, I'll send patch to disable the
>in-kernel pd-mapper by marking it as broken.
>
>> And what about the audio service errors:
>> 
>> 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
>>	[   14.571943] PDR: service lookup for avs/audio failed: -110
>> 
>> Does this seem to be a separate (but related) issue or just a different
>> symptom?
>
>I can confirm that the audio breakage is also related to the in-kernel
>pd-mapper. I hit it after 30 reboots with the in-kernel pd-mapper, but
>have not seen it with the user space service (e.g. after 100+ reboots).
>
>> [1] https://lore.kernel.org/lkml/ZsRGV4hplvidpYji@hovoldconsulting.com/
>
>Johan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-09-04 13:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-08 16:22 [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Stephan Gerhold
2024-07-08 16:22 ` [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100 Stephan Gerhold
2024-07-08 20:22   ` Dmitry Baryshkov
2024-07-29 14:57   ` Johan Hovold
2024-07-30  9:28     ` Johan Hovold
2024-07-30 11:06       ` Dmitry Baryshkov
2024-07-30 23:50         ` Chris Lew
2024-07-31  1:41           ` Dmitry Baryshkov
2024-07-31  6:04             ` Chris Lew
2024-08-22  7:28               ` Johan Hovold
2024-09-04 12:40                 ` Johan Hovold
2024-09-04 13:27                   ` Dmitry Baryshkov
2024-07-08 16:22 ` [PATCH 2/2] soc: qcom: pd_mapper: Add more older platforms without domains Stephan Gerhold
2024-07-08 20:22   ` Dmitry Baryshkov
2024-08-15 20:40 ` [PATCH 0/2] soc: qcom: pd_mapper: Add X1E80100 and older platforms Bjorn Andersson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox