stable.vger.kernel.org archive mirror
* [PATCH] soc: qcom: mark pd-mapper as broken
@ 2024-10-10  7:42 Johan Hovold
  2024-10-10  9:55 ` Dmitry Baryshkov
  2024-10-11 10:01 ` Stephan Gerhold
  0 siblings, 2 replies; 19+ messages in thread
From: Johan Hovold @ 2024-10-10  7:42 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio
  Cc: Dmitry Baryshkov, Chris Lew, Stephan Gerhold, Abel Vesa,
	linux-arm-msm, linux-kernel, regressions, Johan Hovold, stable

When using the in-kernel pd-mapper on x1e80100, client drivers often
fail to communicate with the firmware during boot, which specifically
breaks battery and USB-C altmode notifications. This has been observed
to happen on almost every second boot (41%) but likely depends on probe
order:

    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125

    ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125

    qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications

In the same setup audio also fails to probe albeit much more rarely:

    PDR: avs/audio get domain list txn wait failed: -110
    PDR: service lookup for avs/audio failed: -110

Chris Lew has provided an analysis and is working on a fix for the
ECANCELED (125) errors, but it is not yet clear whether this will also
address the audio regression.

Even if this was first observed on x1e80100 there is currently no reason
to believe that these issues are specific to that platform.

Disable the in-kernel pd-mapper for now, and make sure to backport this
to stable to prevent users and distros from migrating away from the
user-space service.

Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
Cc: stable@vger.kernel.org	# 6.11
Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---

It's now been over two months since I reported this regression, and even
if we seem to be making some progress on at least some of these issues I
think we need to disable the pd-mapper temporarily until the fixes are in
place (e.g. to prevent distros from dropping the user-space service).

Johan


#regzbot introduced: 1ebcde047c54


 drivers/soc/qcom/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 74b9121240f8..35ddab9338d4 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -78,6 +78,7 @@ config QCOM_PD_MAPPER
 	select QCOM_PDR_MSG
 	select AUXILIARY_BUS
 	depends on NET && QRTR && (ARCH_QCOM || COMPILE_TEST)
+	depends on BROKEN
 	default QCOM_RPROC_COMMON
 	help
 	  The Protection Domain Mapper maps registered services to the domains
-- 
2.45.2



* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10  7:42 [PATCH] soc: qcom: mark pd-mapper as broken Johan Hovold
@ 2024-10-10  9:55 ` Dmitry Baryshkov
  2024-10-10 10:11   ` Johan Hovold
  2024-10-11 10:01 ` Stephan Gerhold
  1 sibling, 1 reply; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10  9:55 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Bjorn Andersson, Konrad Dybcio, Chris Lew, Stephan Gerhold,
	Abel Vesa, linux-arm-msm, linux-kernel, regressions, stable

On Thu, 10 Oct 2024 at 10:44, Johan Hovold <johan+linaro@kernel.org> wrote:
>
> When using the in-kernel pd-mapper on x1e80100, client drivers often
> fail to communicate with the firmware during boot, which specifically
> breaks battery and USB-C altmode notifications. This has been observed
> to happen on almost every second boot (41%) but likely depends on probe
> order:
>
>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>
>     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>
>     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>
> In the same setup audio also fails to probe albeit much more rarely:
>
>     PDR: avs/audio get domain list txn wait failed: -110
>     PDR: service lookup for avs/audio failed: -110
>
> Chris Lew has provided an analysis and is working on a fix for the
> ECANCELED (125) errors, but it is not yet clear whether this will also
> address the audio regression.
>
> Even if this was first observed on x1e80100 there is currently no reason
> to believe that these issues are specific to that platform.
>
> Disable the in-kernel pd-mapper for now, and make sure to backport this
> to stable to prevent users and distros from migrating away from the
> user-space service.
>
> Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
> Cc: stable@vger.kernel.org      # 6.11
> Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>

Please don't break what is working. pd_mapper is working on all
previous platforms. I suggest reverting commit bd6db1f1486e ("soc:
qcom: pd_mapper: Add X1E80100") instead.

> ---
>
> It's now been over two months since I reported this regression, and even
> if we seem to be making some progress on at least some of these issues I
> think we need to disable the pd-mapper temporarily until the fixes are in
> place (e.g. to prevent distros from dropping the user-space service).
>
> Johan
>
>
> #regzbot introduced: 1ebcde047c54
>
>
>  drivers/soc/qcom/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> index 74b9121240f8..35ddab9338d4 100644
> --- a/drivers/soc/qcom/Kconfig
> +++ b/drivers/soc/qcom/Kconfig
> @@ -78,6 +78,7 @@ config QCOM_PD_MAPPER
>         select QCOM_PDR_MSG
>         select AUXILIARY_BUS
>         depends on NET && QRTR && (ARCH_QCOM || COMPILE_TEST)
> +       depends on BROKEN
>         default QCOM_RPROC_COMMON
>         help
>           The Protection Domain Mapper maps registered services to the domains
> --
> 2.45.2
>


-- 
With best wishes
Dmitry


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10  9:55 ` Dmitry Baryshkov
@ 2024-10-10 10:11   ` Johan Hovold
  2024-10-10 10:55     ` Dmitry Baryshkov
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2024-10-10 10:11 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Johan Hovold, Bjorn Andersson, Konrad Dybcio, Chris Lew,
	Stephan Gerhold, Abel Vesa, linux-arm-msm, linux-kernel,
	regressions, stable

On Thu, Oct 10, 2024 at 12:55:48PM +0300, Dmitry Baryshkov wrote:
> On Thu, 10 Oct 2024 at 10:44, Johan Hovold <johan+linaro@kernel.org> wrote:
> >
> > When using the in-kernel pd-mapper on x1e80100, client drivers often
> > fail to communicate with the firmware during boot, which specifically
> > breaks battery and USB-C altmode notifications. This has been observed
> > to happen on almost every second boot (41%) but likely depends on probe
> > order:
> >
> >     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> >     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> >
> >     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> >
> >     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> >
> > In the same setup audio also fails to probe albeit much more rarely:
> >
> >     PDR: avs/audio get domain list txn wait failed: -110
> >     PDR: service lookup for avs/audio failed: -110
> >
> > Chris Lew has provided an analysis and is working on a fix for the
> > ECANCELED (125) errors, but it is not yet clear whether this will also
> > address the audio regression.
> >
> > Even if this was first observed on x1e80100 there is currently no reason
> > to believe that these issues are specific to that platform.
> >
> > Disable the in-kernel pd-mapper for now, and make sure to backport this
> > to stable to prevent users and distros from migrating away from the
> > user-space service.
> >
> > Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
> > Cc: stable@vger.kernel.org      # 6.11
> > Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> 
> Please don't break what is working. pd_mapper is working on all
> previous platforms. I suggest reverting commit bd6db1f1486e ("soc:
> qcom: pd_mapper: Add X1E80100") instead.

As I tried to explain in the commit message, there is currently nothing
indicating that these issues are specific to x1e80100 (even if you may
not hit them in your setup depending on things like probe order).

Let's disable it until the underlying bugs have been addressed.

Johan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 10:11   ` Johan Hovold
@ 2024-10-10 10:55     ` Dmitry Baryshkov
  2024-10-10 11:44       ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10 10:55 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Johan Hovold, Bjorn Andersson, Konrad Dybcio, Chris Lew,
	Stephan Gerhold, Abel Vesa, linux-arm-msm, linux-kernel,
	regressions, stable

On Thu, 10 Oct 2024 at 13:11, Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Oct 10, 2024 at 12:55:48PM +0300, Dmitry Baryshkov wrote:
> > On Thu, 10 Oct 2024 at 10:44, Johan Hovold <johan+linaro@kernel.org> wrote:
> > >
> > > When using the in-kernel pd-mapper on x1e80100, client drivers often
> > > fail to communicate with the firmware during boot, which specifically
> > > breaks battery and USB-C altmode notifications. This has been observed
> > > to happen on almost every second boot (41%) but likely depends on probe
> > > order:
> > >
> > >     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> > >     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> > >
> > >     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> > >
> > >     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> > >
> > > In the same setup audio also fails to probe albeit much more rarely:
> > >
> > >     PDR: avs/audio get domain list txn wait failed: -110
> > >     PDR: service lookup for avs/audio failed: -110
> > >
> > > Chris Lew has provided an analysis and is working on a fix for the
> > > ECANCELED (125) errors, but it is not yet clear whether this will also
> > > address the audio regression.
> > >
> > > Even if this was first observed on x1e80100 there is currently no reason
> > > to believe that these issues are specific to that platform.
> > >
> > > Disable the in-kernel pd-mapper for now, and make sure to backport this
> > > to stable to prevent users and distros from migrating away from the
> > > user-space service.
> > >
> > > Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
> > > Cc: stable@vger.kernel.org      # 6.11
> > > Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> >
> > Please don't break what is working. pd_mapper is working on all
> > previous platforms. I suggest reverting commit bd6db1f1486e ("soc:
> > qcom: pd_mapper: Add X1E80100") instead.
>
> As I tried to explain in the commit message, there is currently nothing
> indicating that these issues are specific to x1e80100 (even if you may
> not hit them in your setup depending on things like probe order).

I have the understanding that the issues are related to the ADSP
switching the firmware on the fly, which is only used on X1E8.

>
> Let's disable it until the underlying bugs have been addressed.
>
> Johan



-- 
With best wishes
Dmitry


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 10:55     ` Dmitry Baryshkov
@ 2024-10-10 11:44       ` Johan Hovold
  2024-10-10 11:46         ` neil.armstrong
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2024-10-10 11:44 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Johan Hovold, Bjorn Andersson, Konrad Dybcio, Chris Lew,
	Stephan Gerhold, Abel Vesa, linux-arm-msm, linux-kernel,
	regressions, stable

On Thu, Oct 10, 2024 at 01:55:11PM +0300, Dmitry Baryshkov wrote:
> On Thu, 10 Oct 2024 at 13:11, Johan Hovold <johan@kernel.org> wrote:
> > On Thu, Oct 10, 2024 at 12:55:48PM +0300, Dmitry Baryshkov wrote:

> > > Please don't break what is working. pd_mapper is working on all
> > > previous platforms. I suggest reverting commit bd6db1f1486e ("soc:
> > > qcom: pd_mapper: Add X1E80100") instead.
> >
> > As I tried to explain in the commit message, there is currently nothing
> > indicating that these issues are specific to x1e80100 (even if you may
> > not hit them in your setup depending on things like probe order).
> 
> I have the understanding that the issues are related to the ADSP
> switching the firmware on the fly, which is only used on X1E8.

Is this speculation on your part or something that has recently been
confirmed to be the case? AFAIK, there is nothing SoC specific about the
ECANCELED issue, and we also still do not know what is causing the audio
regression.

The thing is, we have a working and well-tested solution in the
user-space service so there is no rush to switch to the in-kernel one
(and risk distros removing the user-space service) before this has been
fixed.

Johan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 11:44       ` Johan Hovold
@ 2024-10-10 11:46         ` neil.armstrong
  2024-10-10 13:24           ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: neil.armstrong @ 2024-10-10 11:46 UTC (permalink / raw)
  To: Johan Hovold, Dmitry Baryshkov
  Cc: Johan Hovold, Bjorn Andersson, Konrad Dybcio, Chris Lew,
	Stephan Gerhold, Abel Vesa, linux-arm-msm, linux-kernel,
	regressions, stable

On 10/10/2024 13:44, Johan Hovold wrote:
> On Thu, Oct 10, 2024 at 01:55:11PM +0300, Dmitry Baryshkov wrote:
>> On Thu, 10 Oct 2024 at 13:11, Johan Hovold <johan@kernel.org> wrote:
>>> On Thu, Oct 10, 2024 at 12:55:48PM +0300, Dmitry Baryshkov wrote:
> 
>>>> Please don't break what is working. pd_mapper is working on all
>>>> previous platforms. I suggest reverting commit bd6db1f1486e ("soc:
>>>> qcom: pd_mapper: Add X1E80100") instead.
>>>
>>> As I tried to explain in the commit message, there is currently nothing
>>> indicating that these issues are specific to x1e80100 (even if you may
>>> not hit them in your setup depending on things like probe order).
>>
>> I have the understanding that the issues are related to the ADSP
>> switching the firmware on the fly, which is only used on X1E8.
> 
> Is this speculation on your part or something that has recently been
> confirmed to be the case? AFAIK, there is nothing SoC specific about the
> ECANCELED issue, and we also still do not know what is causing the audio
> regression.
> 
> The thing is, we have a working and well-tested solution in the
> user-space service so there is no rush to switch to the in-kernel one
> (and risk distros removing the user-space service) before this has been
> fixed.

The in-kernel pd-mapper works fine on SM8550 and SM8650, please just revert
the X1E8 patch as suggested by Dmitry.

Neil

> 
> Johan
> 



* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 11:46         ` neil.armstrong
@ 2024-10-10 13:24           ` Johan Hovold
  2024-10-10 13:45             ` Dmitry Baryshkov
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2024-10-10 13:24 UTC (permalink / raw)
  To: neil.armstrong
  Cc: Dmitry Baryshkov, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 01:46:48PM +0200, neil.armstrong@linaro.org wrote:
> >> On Thu, 10 Oct 2024 at 13:11, Johan Hovold <johan@kernel.org> wrote:

> >>> As I tried to explain in the commit message, there is currently nothing
> >>> indicating that these issues are specific to x1e80100 (even if you may
> >>> not hit them in your setup depending on things like probe order).

> The in-kernel pd-mapper works fine on SM8550 and SM8650, please just revert
> the X1E8 patch as suggested by Dmitry.

Again, you may just be lucky, we have x1e users that also don't hit
these issues due to how things are timed during boot in their setups.

If there's some actual evidence that suggests that this is limited to
x1e, then that would of course be a different matter, but I'm not aware
of anything like that currently.

Johan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 13:24           ` Johan Hovold
@ 2024-10-10 13:45             ` Dmitry Baryshkov
  2024-10-10 14:07               ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10 13:45 UTC (permalink / raw)
  To: Johan Hovold
  Cc: neil.armstrong, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 03:24:19PM GMT, Johan Hovold wrote:
> On Thu, Oct 10, 2024 at 01:46:48PM +0200, neil.armstrong@linaro.org wrote:
> > >> On Thu, 10 Oct 2024 at 13:11, Johan Hovold <johan@kernel.org> wrote:
> 
> > >>> As I tried to explain in the commit message, there is currently nothing
> > >>> indicating that these issues are specific to x1e80100 (even if you may
> > >>> not hit them in your setup depending on things like probe order).
> 
> > The in-kernel pd-mapper works fine on SM8550 and SM8650, please just revert
> > the X1E8 patch as suggested by Dmitry.
> 
> Again, you may just be lucky, we have x1e users that also don't hit
> these issues due to how things are timed during boot in their setups.
> 
> If there's some actual evidence that suggests that this is limited to
> x1e, then that would of course be a different matter, but I'm not aware
> of anything like that currently.

Is there any evidence that it is broken on other platforms? I have been
daily driving the pd-mapper in my testing kernels for a long period of
time.

-- 
With best wishes
Dmitry


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 13:45             ` Dmitry Baryshkov
@ 2024-10-10 14:07               ` Johan Hovold
  2024-10-10 14:13                 ` Dmitry Baryshkov
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2024-10-10 14:07 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: neil.armstrong, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 04:45:57PM +0300, Dmitry Baryshkov wrote:
> On Thu, Oct 10, 2024 at 03:24:19PM GMT, Johan Hovold wrote:

> > Again, you may just be lucky, we have x1e users that also don't hit
> > these issues due to how things are timed during boot in their setups.
> > 
> > If there's some actual evidence that suggests that this is limited to
> > x1e, then that would of course be a different matter, but I'm not aware
> > of anything like that currently.
> 
> Is there any evidence that it is broken on other platforms? I have been
> daily driving the pd-mapper in my testing kernels for a long period of
> time.

Yes, Chris's analysis of the ECANCELED issue suggests that this is not
SoC specific.

Johan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 14:07               ` Johan Hovold
@ 2024-10-10 14:13                 ` Dmitry Baryshkov
  2024-10-10 14:20                   ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10 14:13 UTC (permalink / raw)
  To: Johan Hovold
  Cc: neil.armstrong, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, 10 Oct 2024 at 17:07, Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Oct 10, 2024 at 04:45:57PM +0300, Dmitry Baryshkov wrote:
> > On Thu, Oct 10, 2024 at 03:24:19PM GMT, Johan Hovold wrote:
>
> > > Again, you may just be lucky, we have x1e users that also don't hit
> > > these issues due to how things are timed during boot in their setups.
> > >
> > > If there's some actual evidence that suggests that this is limited to
> > > x1e, then that would of course be a different matter, but I'm not aware
> > > of anything like that currently.
> >
> > Is there any evidence that it is broken on other platforms? I have been
> > daily driving the pd-mapper in my testing kernels for a long period of
> > time.
>
> Yes, Chris's analysis of the ECANCELED issue suggests that this is not
> SoC specific.

"When the firmware implements the glink channel this way...", etc.
Yes, it doesn't sound SoC-specific, but we don't know which
SoCs use this implementation.

>
> Johan



-- 
With best wishes
Dmitry


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 14:13                 ` Dmitry Baryshkov
@ 2024-10-10 14:20                   ` Johan Hovold
  2024-10-10 14:42                     ` Dmitry Baryshkov
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2024-10-10 14:20 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: neil.armstrong, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 05:13:44PM +0300, Dmitry Baryshkov wrote:
> On Thu, 10 Oct 2024 at 17:07, Johan Hovold <johan@kernel.org> wrote:

> > Yes, Chris's analysis of the ECANCELED issue suggests that this is not
> > SoC specific.
> 
> "When the firmware implements the glink channel this way...", etc.
> Yes, it doesn't sound SoC-specific, but we don't know which
> SoCs use this implementation.

So let's err on the safe side until we have more information and avoid
having distros drop the user-space daemon until these known bugs exposed
by the in-kernel pd-mapper have been fixed.

Johan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10 14:20                   ` Johan Hovold
@ 2024-10-10 14:42                     ` Dmitry Baryshkov
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10 14:42 UTC (permalink / raw)
  To: Johan Hovold
  Cc: neil.armstrong, Johan Hovold, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Stephan Gerhold, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 04:20:40PM GMT, Johan Hovold wrote:
> On Thu, Oct 10, 2024 at 05:13:44PM +0300, Dmitry Baryshkov wrote:
> > On Thu, 10 Oct 2024 at 17:07, Johan Hovold <johan@kernel.org> wrote:
> 
> > > Yes, Chris's analysis of the ECANCELED issue suggests that this is not
> > > SoC specific.
> > 
> > "When the firmware implements the glink channel this way...", etc.
> > Yes, it doesn't sound SoC-specific, but we don't know which
> > SoCs use this implementation.
> 
> So let's err on the safe side until we have more information and avoid
> having distros drop the user-space daemon until these known bugs exposed
> by the in-kernel pd-mapper have been fixed.

Then default n + revert X1E sounds like a better approach?

> 
> Johan

-- 
With best wishes
Dmitry


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-10  7:42 [PATCH] soc: qcom: mark pd-mapper as broken Johan Hovold
  2024-10-10  9:55 ` Dmitry Baryshkov
@ 2024-10-11 10:01 ` Stephan Gerhold
  2025-01-06 19:10   ` Frank Oltmanns
  1 sibling, 1 reply; 19+ messages in thread
From: Stephan Gerhold @ 2024-10-11 10:01 UTC (permalink / raw)
  To: Johan Hovold, Dmitry Baryshkov
  Cc: Bjorn Andersson, Konrad Dybcio, Chris Lew, Abel Vesa,
	linux-arm-msm, linux-kernel, regressions, stable

On Thu, Oct 10, 2024 at 09:42:46AM +0200, Johan Hovold wrote:
> When using the in-kernel pd-mapper on x1e80100, client drivers often
> fail to communicate with the firmware during boot, which specifically
> breaks battery and USB-C altmode notifications. This has been observed
> to happen on almost every second boot (41%) but likely depends on probe
> order:
> 
>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> 
>     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> 
>     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> 
> In the same setup audio also fails to probe albeit much more rarely:
> 
>     PDR: avs/audio get domain list txn wait failed: -110
>     PDR: service lookup for avs/audio failed: -110
> 
> Chris Lew has provided an analysis and is working on a fix for the
> ECANCELED (125) errors, but it is not yet clear whether this will also
> address the audio regression.
> 
> Even if this was first observed on x1e80100 there is currently no reason
> to believe that these issues are specific to that platform.
> 
> Disable the in-kernel pd-mapper for now, and make sure to backport this
> to stable to prevent users and distros from migrating away from the
> user-space service.
> 
> Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
> Cc: stable@vger.kernel.org	# 6.11
> Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
> 
> It's now been over two months since I reported this regression, and even
> if we seem to be making some progress on at least some of these issues I
> think we need to disable the pd-mapper temporarily until the fixes are in
> place (e.g. to prevent distros from dropping the user-space service).
> 

This is just a random thought, but I wonder if we could insert a delay
somewhere as temporary workaround to make the in-kernel pd-mapper more
reliable. I just tried replicating the userspace pd-mapper timing on
X1E80100 CRD by:

 1. Disabling auto-loading of qcom_pd_mapper
    (modprobe.blacklist=qcom_pd_mapper)
 2. Adding a systemd service that does nothing except running
    "modprobe qcom_pd_mapper" at the same point in time where the 
    userspace pd-mapper would usually be started.

This seems to work quite well for me, I haven't seen any of the
mentioned errors anymore in a couple of boot tests. Clearly, there is no
actual bug in the in-kernel pd-mapper, only worse timing.
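
For reference, step 2 could be sketched roughly as the systemd unit
below. This is only an illustration under stated assumptions, not the
exact unit used in the test: the unit name qcom-pd-mapper-late.service,
the /sbin/modprobe path and the multi-user.target ordering are all
hypothetical and would need to be matched to wherever the distro
normally starts the userspace pd-mapper.

```ini
# /etc/systemd/system/qcom-pd-mapper-late.service
# Hypothetical unit name and location; adjust to the local setup.
[Unit]
Description=Load the in-kernel Qualcomm pd-mapper late in boot (workaround sketch)

[Service]
Type=oneshot
RemainAfterExit=yes
# Assumption: modprobe is installed at /sbin/modprobe on this system.
ExecStart=/sbin/modprobe qcom_pd_mapper

[Install]
# Assumption: the userspace pd-mapper service would normally be pulled
# in by multi-user.target, so this mimics its timing only approximately.
WantedBy=multi-user.target
```

As far as I know, modprobe.blacklist= on the kernel command line only
suppresses alias-based auto-loading, so an explicit
"modprobe qcom_pd_mapper" by module name from a unit like this should
still load the module.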

Thanks,
Stephan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2024-10-11 10:01 ` Stephan Gerhold
@ 2025-01-06 19:10   ` Frank Oltmanns
  2025-01-07 10:02     ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Frank Oltmanns @ 2025-01-06 19:10 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Johan Hovold, Dmitry Baryshkov, Bjorn Andersson, Konrad Dybcio,
	Chris Lew, Abel Vesa, linux-arm-msm, linux-kernel, regressions,
	stable

On 2024-10-11 at 12:01:48 +0200, Stephan Gerhold <stephan.gerhold@linaro.org> wrote:
> On Thu, Oct 10, 2024 at 09:42:46AM +0200, Johan Hovold wrote:
>> When using the in-kernel pd-mapper on x1e80100, client drivers often
>> fail to communicate with the firmware during boot, which specifically
>> breaks battery and USB-C altmode notifications. This has been observed
>> to happen on almost every second boot (41%) but likely depends on probe
>> order:
>>
>>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>
>>     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>>
>>     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>>
>> In the same setup audio also fails to probe albeit much more rarely:
>>
>>     PDR: avs/audio get domain list txn wait failed: -110
>>     PDR: service lookup for avs/audio failed: -110
>>
>> Chris Lew has provided an analysis and is working on a fix for the
>> ECANCELED (125) errors, but it is not yet clear whether this will also
>> address the audio regression.
>>
>> Even if this was first observed on x1e80100 there is currently no reason
>> to believe that these issues are specific to that platform.
>>
>> Disable the in-kernel pd-mapper for now, and make sure to backport this
>> to stable to prevent users and distros from migrating away from the
>> user-space service.
>>
>> Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
>> Cc: stable@vger.kernel.org	# 6.11
>> Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
>> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
>> ---
>>
>> It's now been over two months since I reported this regression, and even
>> if we seem to be making some progress on at least some of these issues I
>> think we need to disable the pd-mapper temporarily until the fixes are in
>> place (e.g. to prevent distros from dropping the user-space service).
>>
>
> This is just a random thought, but I wonder if we could insert a delay
> somewhere as temporary workaround to make the in-kernel pd-mapper more
> reliable. I just tried replicating the userspace pd-mapper timing on
> X1E80100 CRD by:
>
>  1. Disabling auto-loading of qcom_pd_mapper
>     (modprobe.blacklist=qcom_pd_mapper)
>  2. Adding a systemd service that does nothing except running
>     "modprobe qcom_pd_mapper" at the same point in time where the
>     userspace pd-mapper would usually be started.

Thank you so much for this idea. I'm currently using this workaround on
my sdm845 device (where the in-kernel pd-mapper is breaking the
out-of-tree call audio functionality).

Is there any work under way to make the timing of the in-kernel
pd-mapper more reliable?

Cheers,
  Frank

> This seems to work quite well for me, I haven't seen any of the
> mentioned errors anymore in a couple of boot tests. Clearly, there is no
> actual bug in the in-kernel pd-mapper, only worse timing.
>
> Thanks,
> Stephan


* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2025-01-06 19:10   ` Frank Oltmanns
@ 2025-01-07 10:02     ` Johan Hovold
  2025-01-08 14:06       ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2025-01-07 10:02 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: Stephan Gerhold, Johan Hovold, Dmitry Baryshkov, Bjorn Andersson,
	Konrad Dybcio, Chris Lew, Abel Vesa, linux-arm-msm, linux-kernel,
	regressions, stable

On Mon, Jan 06, 2025 at 08:10:52PM +0100, Frank Oltmanns wrote:
> On 2024-10-11 at 12:01:48 +0200, Stephan Gerhold <stephan.gerhold@linaro.org> wrote:
> > On Thu, Oct 10, 2024 at 09:42:46AM +0200, Johan Hovold wrote:
> >> When using the in-kernel pd-mapper on x1e80100, client drivers often
> >> fail to communicate with the firmware during boot, which specifically
> >> breaks battery and USB-C altmode notifications. This has been observed
> >> to happen on almost every second boot (41%) but likely depends on probe
> >> order:
> >>
> >>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> >>     pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> >>
> >>     ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
> >>
> >>     qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> >>
> >> In the same setup audio also fails to probe albeit much more rarely:
> >>
> >>     PDR: avs/audio get domain list txn wait failed: -110
> >>     PDR: service lookup for avs/audio failed: -110
> >>
> >> Chris Lew has provided an analysis and is working on a fix for the
> >> ECANCELED (125) errors, but it is not yet clear whether this will also
> >> address the audio regression.
> >>
> >> Even if this was first observed on x1e80100 there is currently no reason
> >> to believe that these issues are specific to that platform.
> >>
> >> Disable the in-kernel pd-mapper for now, and make sure to backport this
> >> to stable to prevent users and distros from migrating away from the
> >> user-space service.
> >>
> >> Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
> >> Cc: stable@vger.kernel.org	# 6.11
> >> Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> >> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> >> ---
> >>
> >> It's now been over two months since I reported this regression, and even
> >> if we seem to be making some progress on at least some of these issues I
> >> think we need to disable the pd-mapper temporarily until the fixes are in
> >> place (e.g. to prevent distros from dropping the user-space service).
> >>
> >
> > This is just a random thought, but I wonder if we could insert a delay
> > somewhere as temporary workaround to make the in-kernel pd-mapper more
> > reliable. I just tried replicating the userspace pd-mapper timing on
> > X1E80100 CRD by:
> >
> >  1. Disabling auto-loading of qcom_pd_mapper
> >     (modprobe.blacklist=qcom_pd_mapper)
> >  2. Adding a systemd service that does nothing except run
> >     "modprobe qcom_pd_mapper" at the same point in time where the
> >     userspace pd-mapper would usually be started.
> 
> Thank you so much for this idea. I'm currently using this workaround on
> my sdm845 device (where the in-kernel pd-mapper is breaking the
> out-of-tree call audio functionality).

Thanks for letting us know that the audio issue affects sdm845 as well
(I don't seem to hit it on sc8280xp and the X13s).

> Is there any work going on to make the timing of the in-kernel
> pd-mapper more reliable?

The ECANCELED regression has now been fixed, but the audio issue
remains to be addressed (I think Bjorn has done some preliminary
investigation).
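
As an aside for readers decoding the logs quoted above: the trailing
numbers (-125, -110) are negated errno values. A minimal sketch follows,
assuming the generic Linux errno numbering used on arm64 and x86, where
ECANCELED is 125 and ETIMEDOUT is 110. The helper names are hypothetical
and do not come from the kernel or from any patch in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a kernel-style negative return value to its errno symbol.
 * Only the two codes that appear in the quoted logs are handled;
 * the helper name is illustrative, not a kernel API.
 */
static const char *log_errno_name(int ret)
{
	switch (-ret) {
	case ECANCELED:
		return "ECANCELED";
	case ETIMEDOUT:
		return "ETIMEDOUT";
	default:
		return "unknown";
	}
}

/* Format "<ret>: <symbol> (<libc description>)" into buf. */
static int describe_log_errno(int ret, char *buf, size_t len)
{
	return snprintf(buf, len, "%d: %s (%s)", ret,
			log_errno_name(ret), strerror(-ret));
}
```

With that numbering, describe_log_errno(-125, ...) names ECANCELED (the
pmic_glink failures) and describe_log_errno(-110, ...) names ETIMEDOUT
(the PDR avs/audio lookup failures).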

There is also a NULL-deref in an MHI path that is triggered by the
in-kernel pd-mapper for which Chris has posted a workaround here:

	https://lore.kernel.org/r/20241104-qrtr_mhi-v1-1-79adf7e3bba5@quicinc.com

Johan

* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2025-01-07 10:02     ` Johan Hovold
@ 2025-01-08 14:06       ` Johan Hovold
  2025-01-11 14:21         ` Frank Oltmanns
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2025-01-08 14:06 UTC (permalink / raw)
  To: Frank Oltmanns, Bjorn Andersson, Chris Lew
  Cc: Stephan Gerhold, Johan Hovold, Dmitry Baryshkov, Konrad Dybcio,
	Abel Vesa, linux-arm-msm, linux-kernel, regressions, stable

On Tue, Jan 07, 2025 at 11:02:24AM +0100, Johan Hovold wrote:
> On Mon, Jan 06, 2025 at 08:10:52PM +0100, Frank Oltmanns wrote:

> > Thank you so much for this idea. I'm currently using this workaround on
> > my sdm845 device (where the in-kernel pd-mapper is breaking the
> > out-of-tree call audio functionality).
> 
> Thanks for letting us know that the audio issue affects sdm845 as well
> (I don't seem to hit it on sc8280xp and the X13s).

And today I also hit this on the sc8280xp CRD reference design, so as
expected, there is nothing SoC specific about the audio service
regression either:

[   11.235564] PDR: avs/audio get domain list txn wait failed: -110
[   11.241976] PDR: service lookup for avs/audio failed: -110

even if it may be masked by random changes in timing.

This means it also affects machines like the X13s, which already have
audio enabled.

> > Is there any work going on to make the timing of the in-kernel
> > pd-mapper more reliable?
> 
> The ECANCELED regression has now been fixed, but the audio issue
> remains to be addressed (I think Bjorn has done some preliminary
> investigation).

Hopefully Bjorn or Chris have some plan on how to address the audio
regression.

Johan

* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2025-01-08 14:06       ` Johan Hovold
@ 2025-01-11 14:21         ` Frank Oltmanns
  2025-01-13  9:07           ` Johan Hovold
  0 siblings, 1 reply; 19+ messages in thread
From: Frank Oltmanns @ 2025-01-11 14:21 UTC (permalink / raw)
  To: Johan Hovold, Bjorn Andersson, Chris Lew
  Cc: Stephan Gerhold, Johan Hovold, Dmitry Baryshkov, Konrad Dybcio,
	Abel Vesa, linux-arm-msm, linux-kernel, regressions, stable

On 2025-01-08 at 15:06:34 +0100, Johan Hovold <johan@kernel.org> wrote:
> On Tue, Jan 07, 2025 at 11:02:24AM +0100, Johan Hovold wrote:
>> On Mon, Jan 06, 2025 at 08:10:52PM +0100, Frank Oltmanns wrote:
>
>> > Thank you so much for this idea. I'm currently using this workaround on
>> > my sdm845 device (where the in-kernel pd-mapper is breaking the
>> > out-of-tree call audio functionality).
>>
>> Thanks for letting us know that the audio issue affects sdm845 as well
>> (I don't seem to hit it on sc8280xp and the X13s).
>
> And today I also hit this on the sc8280xp CRD reference design, so as
> expected, there is nothing SoC specific about the audio service
> regression either:
>
> [   11.235564] PDR: avs/audio get domain list txn wait failed: -110
> [   11.241976] PDR: service lookup for avs/audio failed: -110
>
> even if it may be masked by random changes in timing.
>
> This means it also affects machines like the X13s, which already have
> audio enabled.

I've blocklisted the in-kernel pd-mapper module for now and have
switched back to the userspace pd-mapper.

I don't know if it's helpful or not, but I don't get these error logs
when using the in-kernel pd-mapper. It's just that the phone's mic
only works on approximately every third boot (unless I defer loading the
module).

>
>> > Is there any work going on to make the timing of the in-kernel
>> > pd-mapper more reliable?
>>
>> The ECANCELED regression has now been fixed, but the audio issue
>> remains to be addressed (I think Bjorn has done some preliminary
>> investigation).
>
> Hopefully Bjorn or Chris have some plan on how to address the audio
> regression.

If you come up with a patch, I'd be glad to test it on my
xiaomi-beryllium device.

Thanks again,
  Frank

>
> Johan

* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2025-01-11 14:21         ` Frank Oltmanns
@ 2025-01-13  9:07           ` Johan Hovold
  2025-02-05 22:23             ` Frank Oltmanns
  0 siblings, 1 reply; 19+ messages in thread
From: Johan Hovold @ 2025-01-13  9:07 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: Bjorn Andersson, Chris Lew, Stephan Gerhold, Johan Hovold,
	Dmitry Baryshkov, Konrad Dybcio, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable

On Sat, Jan 11, 2025 at 03:21:35PM +0100, Frank Oltmanns wrote:
> On 2025-01-08 at 15:06:34 +0100, Johan Hovold <johan@kernel.org> wrote:

> > And today I also hit this on the sc8280xp CRD reference design, so as
> > expected, there is nothing SoC specific about the audio service
> > regression either:
> >
> > [   11.235564] PDR: avs/audio get domain list txn wait failed: -110
> > [   11.241976] PDR: service lookup for avs/audio failed: -110
> >
> > even if it may be masked by random changes in timing.
> >
> > This means it also affects machines like the X13s, which already have
> > audio enabled.
> 
> I've blocklisted the in-kernel pd-mapper module for now and have
> switched back to the userspace pd-mapper.
> 
> I don't know if it's helpful or not, but I don't get these error logs
> when using the in-kernel pd-mapper. It's just that the phone's mic
> only works on approximately every third boot (unless I defer loading the
> module).

Ok, then it sounds like you're hitting a separate bug that is also
triggered by the changed timings with the in-kernel pd-mapper.

Are there any hints in the logs about what goes wrong in your setup? And
the speakers are still working, it's just affecting the mic?

Johan

* Re: [PATCH] soc: qcom: mark pd-mapper as broken
  2025-01-13  9:07           ` Johan Hovold
@ 2025-02-05 22:23             ` Frank Oltmanns
  0 siblings, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2025-02-05 22:23 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Bjorn Andersson, Chris Lew, Stephan Gerhold, Johan Hovold,
	Dmitry Baryshkov, Konrad Dybcio, Abel Vesa, linux-arm-msm,
	linux-kernel, regressions, stable, Caleb Connolly, Joel Selvaraj,
	Alexey Minnekhanov

On 2025-01-13 at 10:07:15 +0100, Johan Hovold <johan@kernel.org> wrote:
> On Sat, Jan 11, 2025 at 03:21:35PM +0100, Frank Oltmanns wrote:
>> On 2025-01-08 at 15:06:34 +0100, Johan Hovold <johan@kernel.org> wrote:
>
>> > And today I also hit this on the sc8280xp CRD reference design, so as
>> > expected, there is nothing SoC specific about the audio service
>> > regression either:
>> >
>> > [   11.235564] PDR: avs/audio get domain list txn wait failed: -110
>> > [   11.241976] PDR: service lookup for avs/audio failed: -110
>> >
>> > even if it may be masked by random changes in timing.
>> >
>> > This means it also affects machines like the X13s, which already have
>> > audio enabled.
>>
>> I've blocklisted the in-kernel pd-mapper module for now and have
>> switched back to the userspace pd-mapper.
>>
>> I don't know if it's helpful or not, but I don't get these error logs
>> when using the in-kernel pd-mapper. It's just that the phone's mic
>> only works on approximately every third boot (unless I defer loading the
>> module).
>
> Ok, then it sounds like you're hitting a separate bug that is also
> triggered by the changed timings with the in-kernel pd-mapper.

I worked on finding out what's causing the issue on sdm845 and I've
submitted a patch here: [1]

> Are there any hints in the logs about what goes wrong in your setup?

Unfortunately not, see [1].

> And
> the speakers are still working, it's just affecting the mic?

Yes, it's only affecting the mic in my setup on xiaomi-beryllium, but
there seem to be issues with the speaker on oneplus-enchilada that can
be fixed with the same approach.

Best regards,
  Frank

[1]: https://lore.kernel.org/all/20250205-qcom_pdm_defer-v1-1-a2e9a39ea9b9@oltmanns.dev/

>
> Johan

Thread overview: 19+ messages
2024-10-10  7:42 [PATCH] soc: qcom: mark pd-mapper as broken Johan Hovold
2024-10-10  9:55 ` Dmitry Baryshkov
2024-10-10 10:11   ` Johan Hovold
2024-10-10 10:55     ` Dmitry Baryshkov
2024-10-10 11:44       ` Johan Hovold
2024-10-10 11:46         ` neil.armstrong
2024-10-10 13:24           ` Johan Hovold
2024-10-10 13:45             ` Dmitry Baryshkov
2024-10-10 14:07               ` Johan Hovold
2024-10-10 14:13                 ` Dmitry Baryshkov
2024-10-10 14:20                   ` Johan Hovold
2024-10-10 14:42                     ` Dmitry Baryshkov
2024-10-11 10:01 ` Stephan Gerhold
2025-01-06 19:10   ` Frank Oltmanns
2025-01-07 10:02     ` Johan Hovold
2025-01-08 14:06       ` Johan Hovold
2025-01-11 14:21         ` Frank Oltmanns
2025-01-13  9:07           ` Johan Hovold
2025-02-05 22:23             ` Frank Oltmanns
