linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pmdomian: core: don't unset stay_on during sync_state
@ 2025-09-02 18:23 Nicolas Frattaroli
  2025-09-02 18:25 ` Nicolas Frattaroli
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Nicolas Frattaroli @ 2025-09-02 18:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner,
	Nicolas Frattaroli

This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.

On RK3576, the UFS controller's power domain has a quirk that requires
it to stay enabled, infrastricture for which was added in Commit
cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").

Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
genpds on until sync_state") appears to break this quirk wholesale. The
result is that RK3576 devices with the UFS controller enabled but unused
will freeze once pmdomain shuts off unused domains.

Revert it until a better fix can be found.

Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
---
 drivers/pmdomain/core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
 	return ret;
 }
 
-#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
 static bool pd_ignore_unused;
 static int __init pd_ignore_unused_setup(char *__unused)
 {
@@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
 	return 0;
 }
 late_initcall_sync(genpd_power_off_unused);
-#endif
 
 #ifdef CONFIG_PM_SLEEP
 
@@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
 	list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
 		if (genpd->provider == of_fwnode_handle(np)) {
 			genpd_lock(genpd);
-			genpd->stay_on = false;
 			genpd_power_off(genpd, false, 0);
 			genpd_unlock(genpd);
 		}
@@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
 
 	case GENPD_SYNC_STATE_SIMPLE:
 		genpd_lock(genpd);
-		genpd->stay_on = false;
 		genpd_power_off(genpd, false, 0);
 		genpd_unlock(genpd);
 		break;

---
base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21

Best regards,
-- 
Nicolas Frattaroli <nicolas.frattaroli@collabora.com>


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
@ 2025-09-02 18:25 ` Nicolas Frattaroli
  2025-09-04  9:17 ` Ulf Hansson
  2025-09-04 15:49 ` Heiko Stübner
  2 siblings, 0 replies; 10+ messages in thread
From: Nicolas Frattaroli @ 2025-09-02 18:25 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Tuesday, 2 September 2025 20:23:04 Central European Summer Time Nicolas Frattaroli wrote:
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> 
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> 
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
> 
> Revert it until a better fix can be found.
> 
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
>  drivers/pmdomain/core.c | 4 ----
>  1 file changed, 4 deletions(-)
> 

And of course I notice the typo in the subject (pmdomian) as soon as I hit
enter to confirm send. D'oh!

Ulf, can you fix that if this gets applied? Thank you!

Kind regards,
Nicolas Frattaroli



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
  2025-09-02 18:25 ` Nicolas Frattaroli
@ 2025-09-04  9:17 ` Ulf Hansson
  2025-09-04 12:50   ` Nicolas Frattaroli
  2025-09-04 15:49 ` Heiko Stübner
  2 siblings, 1 reply; 10+ messages in thread
From: Ulf Hansson @ 2025-09-04  9:17 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.

I can't find this commit hash. What tree are you using when testing this?

Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?

>
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
>
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
>
> Revert it until a better fix can be found.

This sounds a bit vague to me, can you please clarify and elaborate a
bit more so I can try to help.

What does "UFS controller enabled but unused" actually mean? Has the
UFS controller driver been probed successfully and thus its
corresponding device been attached to its PM domain?

Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
until sync_state") brings along with its corresponding sync_state
series for genpd [1]. Again, more information is needed to understand
what goes wrong.

Kind regards
Uffe

[1]
https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/

>
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
>  drivers/pmdomain/core.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
>         return ret;
>  }
>
> -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
>  static bool pd_ignore_unused;
>  static int __init pd_ignore_unused_setup(char *__unused)
>  {
> @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
>         return 0;
>  }
>  late_initcall_sync(genpd_power_off_unused);
> -#endif
>
>  #ifdef CONFIG_PM_SLEEP
>
> @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
>         list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
>                 if (genpd->provider == of_fwnode_handle(np)) {
>                         genpd_lock(genpd);
> -                       genpd->stay_on = false;
>                         genpd_power_off(genpd, false, 0);
>                         genpd_unlock(genpd);
>                 }
> @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
>
>         case GENPD_SYNC_STATE_SIMPLE:
>                 genpd_lock(genpd);
> -               genpd->stay_on = false;
>                 genpd_power_off(genpd, false, 0);
>                 genpd_unlock(genpd);
>                 break;
>
> ---
> base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
>
> Best regards,
> --
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04  9:17 ` Ulf Hansson
@ 2025-09-04 12:50   ` Nicolas Frattaroli
  2025-09-04 15:43     ` Nicolas Frattaroli
  2025-09-04 16:13     ` Ulf Hansson
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 12:50 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> <nicolas.frattaroli@collabora.com> wrote:
> >
> > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> 
> I can't find this commit hash. What tree are you using when testing this?
> 
> Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?

Probably, I did the revert on a rebased branch and then rebased the revert
onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
there is a v2 (it might actually become a different patch, see huge text
below, sorry!)

> 
> >
> > On RK3576, the UFS controller's power domain has a quirk that requires
> > it to stay enabled, infrastricture for which was added in Commit
> > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> >
> > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > genpds on until sync_state") appears to break this quirk wholesale. The
> > result is that RK3576 devices with the UFS controller enabled but unused
> > will freeze once pmdomain shuts off unused domains.
> >
> > Revert it until a better fix can be found.
> 
> This sounds a bit vague to me, can you please clarify and elaborate a
> bit more so I can try to help.
> 
> What does "UFS controller enabled but unused" actually mean? Has the
> UFS controller driver been probed successfully and thus its
> corresponding device been attached to its PM domain?

It means the UFS controller driver has probed, but does not find a
UFS storage chip connected to it, and therefore reports a probe
failure. This is a possibility on single-board computers like the
Radxa ROCK 4D, where the UFS storage is a separate module that plugs
into some headers.

> Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> until sync_state") brings along with its corresponding sync_state
> series for genpd [1]. Again, more information is needed to understand
> what goes wrong.

The reason why Rockchip's UFS driver needs this function is that once
the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
not be toggled off again until a whole SoC suspend/resume cycle because
the off/on operation is seemingly not idempotent. This does not preclude
turning off the power domain if the device isn't used at all, e.g. the DT
node is absent. This is why the affected PDs are not marked as always-on
in the PD init data for the SoC, as the device driver is the best place to
set this during runtime.

The way I got to this commit is through a bisect between the UFS node
being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
and v6.17-rc4. The bisect landed on the pmdomain merge commit
(commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
works fine with that, and then rebased the 44 pmdomain commits that this
merge commit merged onto this base. I verified that the tip of that then
exhibited the faulty behaviour, namely that my ROCK 4D was locking up
some time after boot, right after the kernel log message

    [   33.756516] vdd_npu_s0: disabling

so presumably when unused regulators and domains were being disabled.
Aside note: setting vdd_npu_s0 to always-on also works to work around
the issue, and I'm not quite sure why, because this regulator is not
used for anything right now so this may be some peculiar SoC silicon
design where VDD_NPU is leaking into the part that the UFS PD actually
should be gating, preventing the lockup on accident.

Anyway, so I did a bisect between the UFS introduction and the rebased
tip of the pmdomain branch, and landed on the commit I'm reverting.

The problem exhibits itself not when the affect PD is first turned off,
but when the NPU regulator is turned off. How could this be relevant to
power domains at all? I have no idea.

I admit that I don't understand the commit I'm reverting, as it talks of
keeping powered-on genpds on, but the code sets a member called "stay_on"
to false.

However, reverting it 100% reproducibly fixes the observed lockup. The
lockup does not occur if an UFS module is connected to the SBC.

It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
a probe failure, so I'm not entirely sure how this specific commit
changes the behaviour in a way that makes it unhappy. I agree the
solution is probably not a revert here. Also, adding an unconditional
`dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
work, likely because the driver is unbound. Maybe this is a complete
red herring and `dev_pm_genpd_rpm_always_on` is unrelated.

The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
branch, so even assuming the problem is that the usb driver is missing
the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
becuase nothing else is using that power domain. Unless, of course, our
description of that power domain is incomplete, which is possible.

The problem may be that this was always racey and the genpd changes
just made the race go wrong more often.

Another observation: my kernel log during afflicted bootups contains

    rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc

A further observation: pd_ignore_unused does not fix it. Looking at
the revert and experimenting, removing the `stay_on = false` is not
what actually fixes my problem, but removing the
`#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
pd_ignore_unused, fixes my issue.

This is probably a big enough clue to suss out what's going on; that
I *need* "[    2.868987] PM: genpd: Disabling unused power domains"
to run quite early before the regulator that feeds VDD_NPU is disabled
after the bootup completes unless an UFS module is present or the UFS
controller is disabled makes me think this is another peculiarity of
the RK3576 power domains hardware, because none of this I just wrote
sounds like the words of a sane human.

Kind regards,
Nicolas Frattaroli

> 
> Kind regards
> Uffe
> 
> [1]
> https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
> 
> >
> > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > ---
> >  drivers/pmdomain/core.c | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> > --- a/drivers/pmdomain/core.c
> > +++ b/drivers/pmdomain/core.c
> > @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> >         return ret;
> >  }
> >
> > -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> >  static bool pd_ignore_unused;
> >  static int __init pd_ignore_unused_setup(char *__unused)
> >  {
> > @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> >         return 0;
> >  }
> >  late_initcall_sync(genpd_power_off_unused);
> > -#endif
> >
> >  #ifdef CONFIG_PM_SLEEP
> >
> > @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> >         list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> >                 if (genpd->provider == of_fwnode_handle(np)) {
> >                         genpd_lock(genpd);
> > -                       genpd->stay_on = false;
> >                         genpd_power_off(genpd, false, 0);
> >                         genpd_unlock(genpd);
> >                 }
> > @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
> >
> >         case GENPD_SYNC_STATE_SIMPLE:
> >                 genpd_lock(genpd);
> > -               genpd->stay_on = false;
> >                 genpd_power_off(genpd, false, 0);
> >                 genpd_unlock(genpd);
> >                 break;
> >
> > ---
> > base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> > change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
> >
> > Best regards,
> > --
> > Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> >
> 





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04 12:50   ` Nicolas Frattaroli
@ 2025-09-04 15:43     ` Nicolas Frattaroli
  2025-09-04 16:13     ` Ulf Hansson
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 15:43 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Thursday, 4 September 2025 14:50:13 Central European Summer Time you wrote:
> On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > <nicolas.frattaroli@collabora.com> wrote:
> > >
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > 
> > I can't find this commit hash. What tree are you using when testing this?
> > 
> > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
> 
> Probably, I did the revert on a rebased branch and then rebased the revert
> onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> there is a v2 (it might actually become a different patch, see huge text
> below, sorry!)
> 
> > 
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> > 
> > This sounds a bit vague to me, can you please clarify and elaborate a
> > bit more so I can try to help.
> > 
> > What does "UFS controller enabled but unused" actually mean? Has the
> > UFS controller driver been probed successfully and thus its
> > corresponding device been attached to its PM domain?
> 
> It means the UFS controller driver has probed, but does not find a
> UFS storage chip connected to it, and therefore reports a probe
> failure. This is a possibility on single-board computers like the
> Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> into some headers.
> 
> > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > until sync_state") brings along with its corresponding sync_state
> > series for genpd [1]. Again, more information is needed to understand
> > what goes wrong.
> 
> [snip very long rubber duck debugging session]

Okay so I believe I have found the root cause of the regression. UFS is
innocent, disabling UFS just happens to avoid it due to how the timing of
things works out.

The real issue is that the NPU power domains on the RK3576, which are
currently unused, have an undeclared dependency on vdd_npu_s0.

Declaring this dependency with a `domain-supply` and adding the
necessary flag in the rockchip PD controller to use it does not solve
he problem. This is because the rockchip PD controller cannot acquire
those supplies during probe, as they're not available yet and their
availability depends on the PD controller finishing probe.

That's why it acquires them in the PD enable callback, but the NPU
PDs are never enabled because they're unused.

This worked fine when unused PDs were still turned off quite early, as
this meant they were turned off before regulators. Now the unused
regulators are turned off before turning off the unused PDs happens.

I don't really see an easy way to fix this with a patch that's fit for
an rc cycle. We can't request the regulator early or even just add a
device link, as the regulator is not around yet.

Marking vdd_npu_s0 as always-on would be abusing DT to work around a
Linux kernel shortcoming, which is a no-no.

What we need is either a way to register with pmdomain core that
certain PDs need a late init for additional supplies, which is then
called before any of the unused regulator power off functionality is
invoked by the regulator core.

Any ideas?

Kind regards,
Nicolas Frattaroli


> 
> Kind regards,
> Nicolas Frattaroli
> 
> > 
> > Kind regards
> > Uffe
> > 
> > [1]
> > https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
> > 
> > >
> > > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > > ---
> > >  drivers/pmdomain/core.c | 4 ----
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> > > --- a/drivers/pmdomain/core.c
> > > +++ b/drivers/pmdomain/core.c
> > > @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> > >         return ret;
> > >  }
> > >
> > > -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> > >  static bool pd_ignore_unused;
> > >  static int __init pd_ignore_unused_setup(char *__unused)
> > >  {
> > > @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> > >         return 0;
> > >  }
> > >  late_initcall_sync(genpd_power_off_unused);
> > > -#endif
> > >
> > >  #ifdef CONFIG_PM_SLEEP
> > >
> > > @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> > >         list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> > >                 if (genpd->provider == of_fwnode_handle(np)) {
> > >                         genpd_lock(genpd);
> > > -                       genpd->stay_on = false;
> > >                         genpd_power_off(genpd, false, 0);
> > >                         genpd_unlock(genpd);
> > >                 }
> > > @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
> > >
> > >         case GENPD_SYNC_STATE_SIMPLE:
> > >                 genpd_lock(genpd);
> > > -               genpd->stay_on = false;
> > >                 genpd_power_off(genpd, false, 0);
> > >                 genpd_unlock(genpd);
> > >                 break;
> > >
> > > ---
> > > base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> > > change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
> > >
> > > Best regards,
> > > --
> > > Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > >
> > 
> 
> 





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
  2025-09-02 18:25 ` Nicolas Frattaroli
  2025-09-04  9:17 ` Ulf Hansson
@ 2025-09-04 15:49 ` Heiko Stübner
  2025-09-04 15:55   ` Heiko Stübner
  2 siblings, 1 reply; 10+ messages in thread
From: Heiko Stübner @ 2025-09-04 15:49 UTC (permalink / raw)
  To: Ulf Hansson, Nicolas Frattaroli
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Nicolas Frattaroli

Hi,

Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> 
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> 
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
> 
> Revert it until a better fix can be found.
> 
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>

just an observation independent of the conversation in the other thread.
This patch/revert whatever fixes an actual issue for me.

On the rk3588 the NPU power-domains are a hirarchy of

PD_NPU
	PD_NPUTOP (core0)
		PD_NPU1 (core1)
		PD_NPU2 (core2)

and the PD_NPU does need a supply regulator.

(1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:

# cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass  opmode voltage current     min     max
---------------------------------------------------------------------------------------
 dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV 
[...]
    vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV 
       vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV 
[...]
          vdd_npu_s0              0    0      0  normal   800mV     0mA   550mV   950mV 
          vcc_1v2_s3              2    1      0 unknown  1200mV     0mA  1200mV  1200mV 
             fe1b0000.ethernet-phy   1                                 0mA     0mV     0mV
          vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV 
             fb000000.gpu-mali    1                                 0mA   675mV   850mV
             fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
[...]

#  cat /sys/kernel/debug/pm_genpd/pm_genpd_summary 
domain                          status          children        performance
    /device                         runtime status                  managed by
------------------------------------------------------------------------------
[...]
gpu                             off-0                           0
    fb000000.gpu                    suspended                   0           SW
npu2                            off-0                           0
    fdada000.iommu                  suspended                   0           SW
    fdad0000.npu                    suspended                   0           SW
npu1                            off-0                           0
    fdaca000.iommu                  suspended                   0           SW
    fdac0000.npu                    suspended                   0           SW
nputop                          off-0                           0
                                                npu1, npu2
    fdab9000.iommu                  suspended                   0           SW
    fdab0000.npu                    suspended                   0           SW
npu                             on                              0
                                                nputop

Observe that the PD_NPU never got its regulator and the domain also
never actually gets turned off. While the domains directly attached to
the cores get turned off.


(2) with Nicolas's patch applied on top, I get the correct behaviour,
that was also happening with v6.16 before

# cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass  opmode voltage current     min     max
---------------------------------------------------------------------------------------
 dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV 
[...]
    vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV 
       vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV 
[...]
          vdd_npu_s0              0    1      0  normal   800mV     0mA   550mV   950mV 
             fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
          vdd_cpu_big1_s0         2    1      0  normal  1000mV     0mA   550mV  1050mV 
             cpu6-cpu             1                                 0mA  1000mV  1000mV
          vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV 
             fb000000.gpu-mali    1                                 0mA   675mV   850mV
             fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
[...]

# cat /sys/kernel/debug/pm_genpd/pm_genpd_summary 
domain                          status          children        performance
    /device                         runtime status                  managed by
------------------------------------------------------------------------------
[...]
gpu                             off-0                           0
    fb000000.gpu                    suspended                   0           SW
npu2                            off-0                           0
    fdada000.iommu                  suspended                   0           SW
    fdad0000.npu                    suspended                   0           SW
npu1                            off-0                           0
    fdaca000.iommu                  suspended                   0           SW
    fdac0000.npu                    suspended                   0           SW
nputop                          off-0                           0
                                                npu1, npu2
    fdab9000.iommu                  suspended                   0           SW
    fdab0000.npu                    suspended                   0           SW
npu                             off-0                           0
                                                nputop

The regulator handling is working correctly and also the parent PD_NPU
domain gets turned off when its children are off.


Heiko



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04 15:49 ` Heiko Stübner
@ 2025-09-04 15:55   ` Heiko Stübner
  2025-09-04 16:07     ` Ulf Hansson
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Stübner @ 2025-09-04 15:55 UTC (permalink / raw)
  To: Ulf Hansson, Nicolas Frattaroli
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Nicolas Frattaroli

Am Donnerstag, 4. September 2025, 17:49:16 Mitteleuropäische Sommerzeit schrieb Heiko Stübner:
> Hi,
> 
> Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > 
> > On RK3576, the UFS controller's power domain has a quirk that requires
> > it to stay enabled, infrastricture for which was added in Commit
> > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > 
> > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > genpds on until sync_state") appears to break this quirk wholesale. The
> > result is that RK3576 devices with the UFS controller enabled but unused
> > will freeze once pmdomain shuts off unused domains.
> > 
> > Revert it until a better fix can be found.
> > 
> > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> 
> just an observation independent of the conversation in the other thread.
> This patch/revert whatever fixes an actual issue for me.

ah and just saw Nicolas' mail from 6 minutes earlier.

So I guess what saves me here is the rocket driver being built as a
module, power-domain getting turned off early, rocket probes with
pd off and then gets its supplies as expected.


Heiko


> 
> On the rk3588 the NPU power-domains are a hirarchy of
> 
> PD_NPU
> 	PD_NPUTOP (core0)
> 		PD_NPU1 (core1)
> 		PD_NPU2 (core2)
> 
> and the PD_NPU does need a supply regulator.
> 
> (1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:
> 
> # cat /sys/kernel/debug/regulator/regulator_summary
>  regulator                      use open bypass  opmode voltage current     min     max
> ---------------------------------------------------------------------------------------
>  dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV 
> [...]
>     vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV 
>        vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV 
> [...]
>           vdd_npu_s0              0    0      0  normal   800mV     0mA   550mV   950mV 
>           vcc_1v2_s3              2    1      0 unknown  1200mV     0mA  1200mV  1200mV 
>              fe1b0000.ethernet-phy   1                                 0mA     0mV     0mV
>           vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV 
>              fb000000.gpu-mali    1                                 0mA   675mV   850mV
>              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
> [...]
> 
> #  cat /sys/kernel/debug/pm_genpd/pm_genpd_summary 
> domain                          status          children        performance
>     /device                         runtime status                  managed by
> ------------------------------------------------------------------------------
> [...]
> gpu                             off-0                           0
>     fb000000.gpu                    suspended                   0           SW
> npu2                            off-0                           0
>     fdada000.iommu                  suspended                   0           SW
>     fdad0000.npu                    suspended                   0           SW
> npu1                            off-0                           0
>     fdaca000.iommu                  suspended                   0           SW
>     fdac0000.npu                    suspended                   0           SW
> nputop                          off-0                           0
>                                                 npu1, npu2
>     fdab9000.iommu                  suspended                   0           SW
>     fdab0000.npu                    suspended                   0           SW
> npu                             on                              0
>                                                 nputop
> 
> Observe that the PD_NPU never got its regulator and the domain also
> never actually gets turned off. While the domains directly attached to
> the cores get turned off.
> 
> 
> (2) with Nicolas's patch applied on top, I get the correct behaviour,
> that was also happening with v6.16 before
> 
> # cat /sys/kernel/debug/regulator/regulator_summary
>  regulator                      use open bypass  opmode voltage current     min     max
> ---------------------------------------------------------------------------------------
>  dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV 
> [...]
>     vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV 
>        vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV 
> [...]
>           vdd_npu_s0              0    1      0  normal   800mV     0mA   550mV   950mV 
>              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
>           vdd_cpu_big1_s0         2    1      0  normal  1000mV     0mA   550mV  1050mV 
>              cpu6-cpu             1                                 0mA  1000mV  1000mV
>           vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV 
>              fb000000.gpu-mali    1                                 0mA   675mV   850mV
>              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
> [...]
> 
> # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary 
> domain                          status          children        performance
>     /device                         runtime status                  managed by
> ------------------------------------------------------------------------------
> [...]
> gpu                             off-0                           0
>     fb000000.gpu                    suspended                   0           SW
> npu2                            off-0                           0
>     fdada000.iommu                  suspended                   0           SW
>     fdad0000.npu                    suspended                   0           SW
> npu1                            off-0                           0
>     fdaca000.iommu                  suspended                   0           SW
>     fdac0000.npu                    suspended                   0           SW
> nputop                          off-0                           0
>                                                 npu1, npu2
>     fdab9000.iommu                  suspended                   0           SW
>     fdab0000.npu                    suspended                   0           SW
> npu                             off-0                           0
>                                                 nputop
> 
> The regulator handling is working correctly and also the parent PD_NPU
> domain gets turned off when its children are off.
> 
> 
> Heiko
> 





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04 15:55   ` Heiko Stübner
@ 2025-09-04 16:07     ` Ulf Hansson
  0 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2025-09-04 16:07 UTC (permalink / raw)
  To: Nicolas Frattaroli, Heiko Stübner
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel

On Thu, 4 Sept 2025 at 17:55, Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Donnerstag, 4. September 2025, 17:49:16 Mitteleuropäische Sommerzeit schrieb Heiko Stübner:
> > Hi,
> >
> > Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> > >
> > > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> >
> > just an observation independent of the conversation in the other thread.
> > This patch/revert whatever fixes an actual issue for me.
>
> ah and just saw Nicolas' mail from 6 minutes earlier.
>
> So I guess what saves me here is the rocket driver being built as a
> module, power-domain getting turned off early, rocket probes with
> pd off and then gets its supplies as expected.

Okay, so to follow up on your analysis.

I am thinking that the PM domain "looks like" it's properly powered-on
during boot/init, while in-fact it may not.

In other words, calling pm_genpd_init() and stating that the genpd is
powered-on seems to be the problem here. Could a proper power-off be
done before calling pm_genpd_init() and then leaving it off until
later?

Kind regards
Uffe

>
>
> Heiko
>
>
> >
> > On the rk3588 the NPU power-domains are a hirarchy of
> >
> > PD_NPU
> >       PD_NPUTOP (core0)
> >               PD_NPU1 (core1)
> >               PD_NPU2 (core2)
> >
> > and the PD_NPU does need a supply regulator.
> >
> > (1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:
> >
> > # cat /sys/kernel/debug/regulator/regulator_summary
> >  regulator                      use open bypass  opmode voltage current     min     max
> > ---------------------------------------------------------------------------------------
> >  dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV
> > [...]
> >     vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV
> >        vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV
> > [...]
> >           vdd_npu_s0              0    0      0  normal   800mV     0mA   550mV   950mV
> >           vcc_1v2_s3              2    1      0 unknown  1200mV     0mA  1200mV  1200mV
> >              fe1b0000.ethernet-phy   1                                 0mA     0mV     0mV
> >           vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV
> >              fb000000.gpu-mali    1                                 0mA   675mV   850mV
> >              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
> > [...]
> >
> > #  cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain                          status          children        performance
> >     /device                         runtime status                  managed by
> > ------------------------------------------------------------------------------
> > [...]
> > gpu                             off-0                           0
> >     fb000000.gpu                    suspended                   0           SW
> > npu2                            off-0                           0
> >     fdada000.iommu                  suspended                   0           SW
> >     fdad0000.npu                    suspended                   0           SW
> > npu1                            off-0                           0
> >     fdaca000.iommu                  suspended                   0           SW
> >     fdac0000.npu                    suspended                   0           SW
> > nputop                          off-0                           0
> >                                                 npu1, npu2
> >     fdab9000.iommu                  suspended                   0           SW
> >     fdab0000.npu                    suspended                   0           SW
> > npu                             on                              0
> >                                                 nputop
> >
> > Observe that the PD_NPU never got its regulator and the domain also
> > never actually gets turned off. While the domains directly attached to
> > the cores get turned off.
> >
> >
> > (2) with Nicolas's patch applied on top, I get the correct behaviour,
> > that was also happening with v6.16 before
> >
> > # cat /sys/kernel/debug/regulator/regulator_summary
> >  regulator                      use open bypass  opmode voltage current     min     max
> > ---------------------------------------------------------------------------------------
> >  dc_12v                           4    3      0 unknown 12000mV     0mA 12000mV 12000mV
> > [...]
> >     vcc5v0_baseboard              2    1      0 unknown  5000mV     0mA  5000mV  5000mV
> >        vcc5v0_sys                18   18      0 unknown  5000mV     0mA  5000mV  5000mV
> > [...]
> >           vdd_npu_s0              0    1      0  normal   800mV     0mA   550mV   950mV
> >              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
> >           vdd_cpu_big1_s0         2    1      0  normal  1000mV     0mA   550mV  1050mV
> >              cpu6-cpu             1                                 0mA  1000mV  1000mV
> >           vdd_gpu_s0              1    2      0  normal   675mV     0mA   550mV   950mV
> >              fb000000.gpu-mali    1                                 0mA   675mV   850mV
> >              fd8d8000.power-management:power-controller-domain   0                                 0mA     0mV     0mV
> > [...]
> >
> > # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain                          status          children        performance
> >     /device                         runtime status                  managed by
> > ------------------------------------------------------------------------------
> > [...]
> > gpu                             off-0                           0
> >     fb000000.gpu                    suspended                   0           SW
> > npu2                            off-0                           0
> >     fdada000.iommu                  suspended                   0           SW
> >     fdad0000.npu                    suspended                   0           SW
> > npu1                            off-0                           0
> >     fdaca000.iommu                  suspended                   0           SW
> >     fdac0000.npu                    suspended                   0           SW
> > nputop                          off-0                           0
> >                                                 npu1, npu2
> >     fdab9000.iommu                  suspended                   0           SW
> >     fdab0000.npu                    suspended                   0           SW
> > npu                             off-0                           0
> >                                                 nputop
> >
> > The regulator handling is working correctly and also the parent PD_NPU
> > domain gets turned off when its children are off.
> >
> >
> > Heiko
> >
>
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04 12:50   ` Nicolas Frattaroli
  2025-09-04 15:43     ` Nicolas Frattaroli
@ 2025-09-04 16:13     ` Ulf Hansson
  2025-09-04 17:41       ` Nicolas Frattaroli
  1 sibling, 1 reply; 10+ messages in thread
From: Ulf Hansson @ 2025-09-04 16:13 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Thu, 4 Sept 2025 at 14:50, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > <nicolas.frattaroli@collabora.com> wrote:
> > >
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> >
> > I can't find this commit hash. What tree are you using when testing this?
> >
> > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
>
> Probably, I did the revert on a rebased branch and then rebased the revert
> onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> there is a v2 (it might actually become a different patch, see huge text
> below, sorry!)
>
> >
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> >
> > This sounds a bit vague to me, can you please clarify and elaborate a
> > bit more so I can try to help.
> >
> > What does "UFS controller enabled but unused" actually mean? Has the
> > UFS controller driver been probed successfully and thus its
> > corresponding device been attached to its PM domain?
>
> It means the UFS controller driver has probed, but does not find a
> UFS storage chip connected to it, and therefore reports a probe
> failure. This is a possibility on single-board computers like the
> Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> into some headers.
>
> > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > until sync_state") brings along with its corresponding sync_state
> > series for genpd [1]. Again, more information is needed to understand
> > what goes wrong.
>
> The reason why Rockchip's UFS driver needs this function is that once
> the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
> not be toggled off again until a whole SoC suspend/resume cycle because
> the off/on operation is seemingly not idempotent.

So how about adding some prints in the genpd->power_on|off() callbacks
to see what goes on during boot. Along with some prints in the UFS
driver's ->probe().

In particular, what is the difference before and after the revert.

> This does not preclude
> turning off the power domain if the device isn't used at all, e.g. the DT
> node is absent. This is why the affected PDs are not marked as always-on
> in the PD init data for the SoC, as the device driver is the best place to
> set this during runtime.
>
> The way I got to this commit is through a bisect between the UFS node
> being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
> and v6.17-rc4. The bisect landed on the pmdomain merge commit
> (commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
> so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
> works fine with that, and then rebased the 44 pmdomain commits that this
> merge commit merged onto this base. I verified that the tip of that then
> exhibited the faulty behaviour, namely that my ROCK 4D was locking up
> some time after boot, right after the kernel log message
>
>     [   33.756516] vdd_npu_s0: disabling
>
> so presumably when unused regulators and domains were being disabled.
> Aside note: setting vdd_npu_s0 to always-on also works to work around
> the issue, and I'm not quite sure why, because this regulator is not
> used for anything right now so this may be some peculiar SoC silicon
> design where VDD_NPU is leaking into the part that the UFS PD actually
> should be gating, preventing the lockup on accident.
>
> Anyway, so I did a bisect between the UFS introduction and the rebased
> tip of the pmdomain branch, and landed on the commit I'm reverting.
>
> The problem exhibits itself not when the affect PD is first turned off,
> but when the NPU regulator is turned off. How could this be relevant to
> power domains at all? I have no idea.
>
> I admit that I don't understand the commit I'm reverting, as it talks of
> keeping powered-on genpds on, but the code sets a member called "stay_on"
> to false.
>
> However, reverting it 100% reproducibly fixes the observed lockup. The
> lockup does not occur if an UFS module is connected to the SBC.
>
> It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
> a probe failure, so I'm not entirely sure how this specific commit
> changes the behaviour in a way that makes it unhappy. I agree the
> solution is probably not a revert here. Also, adding an unconditional
> `dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
> work, likely because the driver is unbound. Maybe this is a complete
> red herring and `dev_pm_genpd_rpm_always_on` is unrelated.
>
> The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
> usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
> branch, so even assuming the problem is that the usb driver is missing
> the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
> becuase nothing else is using that power domain. Unless, of course, our
> description of that power domain is incomplete, which is possible.
>
> The problem may be that this was always racey and the genpd changes
> just made the race go wrong more often.
>
> Another observation: my kernel log during afflicted bootups contains
>
>     rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc
>
> A further observation: pd_ignore_unused does not fix it. Looking at
> the revert and experimenting, removing the `stay_on = false` is not
> what actually fixes my problem, but removing the
> `#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
> pd_ignore_unused, fixes my issue.
>
> This is probably a big enough clue to suss out what's going on; that
> I *need* "[    2.868987] PM: genpd: Disabling unused power domains"
> to run quite early before the regulator that feeds VDD_NPU is disabled
> after the bootup completes unless an UFS module is present or the UFS
> controller is disabled makes me think this is another peculiarity of
> the RK3576 power domains hardware, because none of this I just wrote
> sounds like the words of a sane human.

Thanks a lot for sharing more information!

As I just responded to Heiko's email, my guess is that PM domain needs
the genpd->power_off() callback to be invoked first, before it can be
properly powered-on via genpd->power_on().

In some way we need to make sure the PM domain (genpd) is in a correct
state before we call pm_genpd_init(). Exactly how, I think you can
explore with different approaches.

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
  2025-09-04 16:13     ` Ulf Hansson
@ 2025-09-04 17:41       ` Nicolas Frattaroli
  0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 17:41 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
	Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner

On Thursday, 4 September 2025 18:13:22 Central European Summer Time Ulf Hansson wrote:
> On Thu, 4 Sept 2025 at 14:50, Nicolas Frattaroli
> <nicolas.frattaroli@collabora.com> wrote:
> >
> > On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > > <nicolas.frattaroli@collabora.com> wrote:
> > > >
> > > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > >
> > > I can't find this commit hash. What tree are you using when testing this?
> > >
> > > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
> >
> > Probably, I did the revert on a rebased branch and then rebased the revert
> > onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> > there is a v2 (it might actually become a different patch, see huge text
> > below, sorry!)
> >
> > >
> > > >
> > > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > > it to stay enabled, infrastricture for which was added in Commit
> > > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > > >
> > > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > > result is that RK3576 devices with the UFS controller enabled but unused
> > > > will freeze once pmdomain shuts off unused domains.
> > > >
> > > > Revert it until a better fix can be found.
> > >
> > > This sounds a bit vague to me, can you please clarify and elaborate a
> > > bit more so I can try to help.
> > >
> > > What does "UFS controller enabled but unused" actually mean? Has the
> > > UFS controller driver been probed successfully and thus its
> > > corresponding device been attached to its PM domain?
> >
> > It means the UFS controller driver has probed, but does not find a
> > UFS storage chip connected to it, and therefore reports a probe
> > failure. This is a possibility on single-board computers like the
> > Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> > into some headers.
> >
> > > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > > until sync_state") brings along with its corresponding sync_state
> > > series for genpd [1]. Again, more information is needed to understand
> > > what goes wrong.
> >
> > The reason why Rockchip's UFS driver needs this function is that once
> > the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
> > not be toggled off again until a whole SoC suspend/resume cycle because
> > the off/on operation is seemingly not idempotent.
> 
> So how about adding some prints in the genpd->power_on|off() callbacks
> to see what goes on during boot. Along with some prints in the UFS
> driver's ->probe().
> 
> In particular, what is the difference before and after the revert.

See my reply to myself, the problem is that power_off() is never called
before the revert. UFS was a red herring, it just changes the timings,
the real culprit is the NPU PD.

After the revert, the NPU domains do get powered off way before the regulator
ever gets touched. I've narrowed it down to just the npu PDs by doing an strstr
in the unused power domains function on the PD name for "npu".

> 
> > This does not preclude
> > turning off the power domain if the device isn't used at all, e.g. the DT
> > node is absent. This is why the affected PDs are not marked as always-on
> > in the PD init data for the SoC, as the device driver is the best place to
> > set this during runtime.
> >
> > The way I got to this commit is through a bisect between the UFS node
> > being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
> > and v6.17-rc4. The bisect landed on the pmdomain merge commit
> > (commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
> > so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
> > works fine with that, and then rebased the 44 pmdomain commits that this
> > merge commit merged onto this base. I verified that the tip of that then
> > exhibited the faulty behaviour, namely that my ROCK 4D was locking up
> > some time after boot, right after the kernel log message
> >
> >     [   33.756516] vdd_npu_s0: disabling
> >
> > so presumably when unused regulators and domains were being disabled.
> > Aside note: setting vdd_npu_s0 to always-on also works to work around
> > the issue, and I'm not quite sure why, because this regulator is not
> > used for anything right now so this may be some peculiar SoC silicon
> > design where VDD_NPU is leaking into the part that the UFS PD actually
> > should be gating, preventing the lockup on accident.
> >
> > Anyway, so I did a bisect between the UFS introduction and the rebased
> > tip of the pmdomain branch, and landed on the commit I'm reverting.
> >
> > The problem exhibits itself not when the affect PD is first turned off,
> > but when the NPU regulator is turned off. How could this be relevant to
> > power domains at all? I have no idea.
> >
> > I admit that I don't understand the commit I'm reverting, as it talks of
> > keeping powered-on genpds on, but the code sets a member called "stay_on"
> > to false.
> >
> > However, reverting it 100% reproducibly fixes the observed lockup. The
> > lockup does not occur if an UFS module is connected to the SBC.
> >
> > It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
> > a probe failure, so I'm not entirely sure how this specific commit
> > changes the behaviour in a way that makes it unhappy. I agree the
> > solution is probably not a revert here. Also, adding an unconditional
> > `dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
> > work, likely because the driver is unbound. Maybe this is a complete
> > red herring and `dev_pm_genpd_rpm_always_on` is unrelated.
> >
> > The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
> > usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
> > branch, so even assuming the problem is that the usb driver is missing
> > the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
> > becuase nothing else is using that power domain. Unless, of course, our
> > description of that power domain is incomplete, which is possible.
> >
> > The problem may be that this was always racey and the genpd changes
> > just made the race go wrong more often.
> >
> > Another observation: my kernel log during afflicted bootups contains
> >
> >     rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc
> >
> > A further observation: pd_ignore_unused does not fix it. Looking at
> > the revert and experimenting, removing the `stay_on = false` is not
> > what actually fixes my problem, but removing the
> > `#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
> > pd_ignore_unused, fixes my issue.
> >
> > This is probably a big enough clue to suss out what's going on; that
> > I *need* "[    2.868987] PM: genpd: Disabling unused power domains"
> > to run quite early before the regulator that feeds VDD_NPU is disabled
> > after the bootup completes unless an UFS module is present or the UFS
> > controller is disabled makes me think this is another peculiarity of
> > the RK3576 power domains hardware, because none of this I just wrote
> > sounds like the words of a sane human.
> 
> Thanks a lot for sharing more information!
> 
> As I just responded to Heiko's email, my guess is that PM domain needs
> the genpd->power_off() callback to be invoked first, before it can be
> properly powered-on via genpd->power_on().

It's never powered off or powered on. In my case, the SoC dies before
the affected NPU domains are ever touched apart from the probe function.

> In some way we need to make sure the PM domain (genpd) is in a correct
> state before we call pm_genpd_init(). Exactly how, I think you can
> explore with different approaches.

I'm not sure we're able to really do this, as we don't know before drivers
have probed whether a PD shouldn't be turned off, due to the
aforementioned rpm_always_on thing.

What we really need is a way to add regulators as suppliers to PDs after
the initial power controller probe function, but before the the regulator
unused suspension is run, regardless of whether a PD is used by any device
that's got a driver bound.

Alternatively, the regulator core needs to somehow synchronise with the
genpd core to have it turn off its unused domains first.

Both of these don't seem like solutions that can be snuck in during an
rc, but there really isn't any way right now to fix this from within
Rockchip's power domain controller unless we just hardcode another flag
that says a certain PD should be turned off during probe, which is a
really dirty hack.

Kind regards,
Nicolas Frattaroli

> 
> [...]
> 
> Kind regards
> Uffe
> 





^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-09-04 17:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
2025-09-02 18:25 ` Nicolas Frattaroli
2025-09-04  9:17 ` Ulf Hansson
2025-09-04 12:50   ` Nicolas Frattaroli
2025-09-04 15:43     ` Nicolas Frattaroli
2025-09-04 16:13     ` Ulf Hansson
2025-09-04 17:41       ` Nicolas Frattaroli
2025-09-04 15:49 ` Heiko Stübner
2025-09-04 15:55   ` Heiko Stübner
2025-09-04 16:07     ` Ulf Hansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).