* [PATCH] pmdomian: core: don't unset stay_on during sync_state
@ 2025-09-02 18:23 Nicolas Frattaroli
2025-09-02 18:25 ` Nicolas Frattaroli
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-02 18:23 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner,
Nicolas Frattaroli
This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
On RK3576, the UFS controller's power domain has a quirk that requires
it to stay enabled, infrastricture for which was added in Commit
cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
genpds on until sync_state") appears to break this quirk wholesale. The
result is that RK3576 devices with the UFS controller enabled but unused
will freeze once pmdomain shuts off unused domains.
Revert it until a better fix can be found.
Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
---
drivers/pmdomain/core.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
return ret;
}
-#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
static bool pd_ignore_unused;
static int __init pd_ignore_unused_setup(char *__unused)
{
@@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
return 0;
}
late_initcall_sync(genpd_power_off_unused);
-#endif
#ifdef CONFIG_PM_SLEEP
@@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
if (genpd->provider == of_fwnode_handle(np)) {
genpd_lock(genpd);
- genpd->stay_on = false;
genpd_power_off(genpd, false, 0);
genpd_unlock(genpd);
}
@@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
case GENPD_SYNC_STATE_SIMPLE:
genpd_lock(genpd);
- genpd->stay_on = false;
genpd_power_off(genpd, false, 0);
genpd_unlock(genpd);
break;
---
base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
Best regards,
--
Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
@ 2025-09-02 18:25 ` Nicolas Frattaroli
2025-09-04 9:17 ` Ulf Hansson
2025-09-04 15:49 ` Heiko Stübner
2 siblings, 0 replies; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-02 18:25 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Tuesday, 2 September 2025 20:23:04 Central European Summer Time Nicolas Frattaroli wrote:
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
>
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
>
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
>
> Revert it until a better fix can be found.
>
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/pmdomain/core.c | 4 ----
> 1 file changed, 4 deletions(-)
>
And of course I notice the typo in the subject (pmdomian) as soon as I hit
enter to confirm send. D'oh!
Ulf, can you fix that if this gets applied? Thank you!
Kind regards,
Nicolas Frattaroli
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
2025-09-02 18:25 ` Nicolas Frattaroli
@ 2025-09-04 9:17 ` Ulf Hansson
2025-09-04 12:50 ` Nicolas Frattaroli
2025-09-04 15:49 ` Heiko Stübner
2 siblings, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2025-09-04 9:17 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
I can't find this commit hash. What tree are you using when testing this?
Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
>
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
>
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
>
> Revert it until a better fix can be found.
This sounds a bit vague to me, can you please clarify and elaborate a
bit more so I can try to help.
What does "UFS controller enabled but unused" actually mean? Has the
UFS controller driver been probed successfully and thus its
corresponding device been attached to its PM domain?
Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
until sync_state") brings along with its corresponding sync_state
series for genpd [1]. Again, more information is needed to understand
what goes wrong.
Kind regards
Uffe
[1]
https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
>
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/pmdomain/core.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> return ret;
> }
>
> -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> static bool pd_ignore_unused;
> static int __init pd_ignore_unused_setup(char *__unused)
> {
> @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> return 0;
> }
> late_initcall_sync(genpd_power_off_unused);
> -#endif
>
> #ifdef CONFIG_PM_SLEEP
>
> @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> if (genpd->provider == of_fwnode_handle(np)) {
> genpd_lock(genpd);
> - genpd->stay_on = false;
> genpd_power_off(genpd, false, 0);
> genpd_unlock(genpd);
> }
> @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
>
> case GENPD_SYNC_STATE_SIMPLE:
> genpd_lock(genpd);
> - genpd->stay_on = false;
> genpd_power_off(genpd, false, 0);
> genpd_unlock(genpd);
> break;
>
> ---
> base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
>
> Best regards,
> --
> Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 9:17 ` Ulf Hansson
@ 2025-09-04 12:50 ` Nicolas Frattaroli
2025-09-04 15:43 ` Nicolas Frattaroli
2025-09-04 16:13 ` Ulf Hansson
0 siblings, 2 replies; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 12:50 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> <nicolas.frattaroli@collabora.com> wrote:
> >
> > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
>
> I can't find this commit hash. What tree are you using when testing this?
>
> Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
Probably, I did the revert on a rebased branch and then rebased the revert
onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
there is a v2 (it might actually become a different patch, see huge text
below, sorry!)
>
> >
> > On RK3576, the UFS controller's power domain has a quirk that requires
> > it to stay enabled, infrastricture for which was added in Commit
> > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> >
> > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > genpds on until sync_state") appears to break this quirk wholesale. The
> > result is that RK3576 devices with the UFS controller enabled but unused
> > will freeze once pmdomain shuts off unused domains.
> >
> > Revert it until a better fix can be found.
>
> This sounds a bit vague to me, can you please clarify and elaborate a
> bit more so I can try to help.
>
> What does "UFS controller enabled but unused" actually mean? Has the
> UFS controller driver been probed successfully and thus its
> corresponding device been attached to its PM domain?
It means the UFS controller driver has probed, but does not find a
UFS storage chip connected to it, and therefore reports a probe
failure. This is a possibility on single-board computers like the
Radxa ROCK 4D, where the UFS storage is a separate module that plugs
into some headers.
> Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> until sync_state") brings along with its corresponding sync_state
> series for genpd [1]. Again, more information is needed to understand
> what goes wrong.
The reason why Rockchip's UFS driver needs this function is that once
the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
not be toggled off again until a whole SoC suspend/resume cycle because
the off/on operation is seemingly not idempotent. This does not preclude
turning off the power domain if the device isn't used at all, e.g. the DT
node is absent. This is why the affected PDs are not marked as always-on
in the PD init data for the SoC, as the device driver is the best place to
set this during runtime.
The way I got to this commit is through a bisect between the UFS node
being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
and v6.17-rc4. The bisect landed on the pmdomain merge commit
(commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
works fine with that, and then rebased the 44 pmdomain commits that this
merge commit merged onto this base. I verified that the tip of that then
exhibited the faulty behaviour, namely that my ROCK 4D was locking up
some time after boot, right after the kernel log message
[ 33.756516] vdd_npu_s0: disabling
so presumably when unused regulators and domains were being disabled.
Aside note: setting vdd_npu_s0 to always-on also works to work around
the issue, and I'm not quite sure why, because this regulator is not
used for anything right now so this may be some peculiar SoC silicon
design where VDD_NPU is leaking into the part that the UFS PD actually
should be gating, preventing the lockup on accident.
Anyway, so I did a bisect between the UFS introduction and the rebased
tip of the pmdomain branch, and landed on the commit I'm reverting.
The problem exhibits itself not when the affect PD is first turned off,
but when the NPU regulator is turned off. How could this be relevant to
power domains at all? I have no idea.
I admit that I don't understand the commit I'm reverting, as it talks of
keeping powered-on genpds on, but the code sets a member called "stay_on"
to false.
However, reverting it 100% reproducibly fixes the observed lockup. The
lockup does not occur if an UFS module is connected to the SBC.
It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
a probe failure, so I'm not entirely sure how this specific commit
changes the behaviour in a way that makes it unhappy. I agree the
solution is probably not a revert here. Also, adding an unconditional
`dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
work, likely because the driver is unbound. Maybe this is a complete
red herring and `dev_pm_genpd_rpm_always_on` is unrelated.
The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
branch, so even assuming the problem is that the usb driver is missing
the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
becuase nothing else is using that power domain. Unless, of course, our
description of that power domain is incomplete, which is possible.
The problem may be that this was always racey and the genpd changes
just made the race go wrong more often.
Another observation: my kernel log during afflicted bootups contains
rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc
A further observation: pd_ignore_unused does not fix it. Looking at
the revert and experimenting, removing the `stay_on = false` is not
what actually fixes my problem, but removing the
`#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
pd_ignore_unused, fixes my issue.
This is probably a big enough clue to suss out what's going on; that
I *need* "[ 2.868987] PM: genpd: Disabling unused power domains"
to run quite early before the regulator that feeds VDD_NPU is disabled
after the bootup completes unless an UFS module is present or the UFS
controller is disabled makes me think this is another peculiarity of
the RK3576 power domains hardware, because none of this I just wrote
sounds like the words of a sane human.
Kind regards,
Nicolas Frattaroli
>
> Kind regards
> Uffe
>
> [1]
> https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
>
> >
> > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > ---
> > drivers/pmdomain/core.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> > --- a/drivers/pmdomain/core.c
> > +++ b/drivers/pmdomain/core.c
> > @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> > return ret;
> > }
> >
> > -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> > static bool pd_ignore_unused;
> > static int __init pd_ignore_unused_setup(char *__unused)
> > {
> > @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> > return 0;
> > }
> > late_initcall_sync(genpd_power_off_unused);
> > -#endif
> >
> > #ifdef CONFIG_PM_SLEEP
> >
> > @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> > list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> > if (genpd->provider == of_fwnode_handle(np)) {
> > genpd_lock(genpd);
> > - genpd->stay_on = false;
> > genpd_power_off(genpd, false, 0);
> > genpd_unlock(genpd);
> > }
> > @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
> >
> > case GENPD_SYNC_STATE_SIMPLE:
> > genpd_lock(genpd);
> > - genpd->stay_on = false;
> > genpd_power_off(genpd, false, 0);
> > genpd_unlock(genpd);
> > break;
> >
> > ---
> > base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> > change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
> >
> > Best regards,
> > --
> > Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> >
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 12:50 ` Nicolas Frattaroli
@ 2025-09-04 15:43 ` Nicolas Frattaroli
2025-09-05 14:27 ` Ulf Hansson
2025-09-04 16:13 ` Ulf Hansson
1 sibling, 1 reply; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 15:43 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Thursday, 4 September 2025 14:50:13 Central European Summer Time you wrote:
> On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > <nicolas.frattaroli@collabora.com> wrote:
> > >
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> >
> > I can't find this commit hash. What tree are you using when testing this?
> >
> > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
>
> Probably, I did the revert on a rebased branch and then rebased the revert
> onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> there is a v2 (it might actually become a different patch, see huge text
> below, sorry!)
>
> >
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> >
> > This sounds a bit vague to me, can you please clarify and elaborate a
> > bit more so I can try to help.
> >
> > What does "UFS controller enabled but unused" actually mean? Has the
> > UFS controller driver been probed successfully and thus its
> > corresponding device been attached to its PM domain?
>
> It means the UFS controller driver has probed, but does not find a
> UFS storage chip connected to it, and therefore reports a probe
> failure. This is a possibility on single-board computers like the
> Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> into some headers.
>
> > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > until sync_state") brings along with its corresponding sync_state
> > series for genpd [1]. Again, more information is needed to understand
> > what goes wrong.
>
> [snip very long rubber duck debugging session]
Okay so I believe I have found the root cause of the regression. UFS is
innocent, disabling UFS just happens to avoid it due to how the timing of
things works out.
The real issue is that the NPU power domains on the RK3576, which are
currently unused, have an undeclared dependency on vdd_npu_s0.
Declaring this dependency with a `domain-supply` and adding the
necessary flag in the rockchip PD controller to use it does not solve
he problem. This is because the rockchip PD controller cannot acquire
those supplies during probe, as they're not available yet and their
availability depends on the PD controller finishing probe.
That's why it acquires them in the PD enable callback, but the NPU
PDs are never enabled because they're unused.
This worked fine when unused PDs were still turned off quite early, as
this meant they were turned off before regulators. Now the unused
regulators are turned off before turning off the unused PDs happens.
I don't really see an easy way to fix this with a patch that's fit for
an rc cycle. We can't request the regulator early or even just add a
device link, as the regulator is not around yet.
Marking vdd_npu_s0 as always-on would be abusing DT to work around a
Linux kernel shortcoming, which is a no-no.
What we need is either a way to register with pmdomain core that
certain PDs need a late init for additional supplies, which is then
called before any of the unused regulator power off functionality is
invoked by the regulator core.
Any ideas?
Kind regards,
Nicolas Frattaroli
>
> Kind regards,
> Nicolas Frattaroli
>
> >
> > Kind regards
> > Uffe
> >
> > [1]
> > https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
> >
> > >
> > > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > > ---
> > > drivers/pmdomain/core.c | 4 ----
> > > 1 file changed, 4 deletions(-)
> > >
> > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > index 0006ab3d078972cc72a6dd22a2144fb31443e3da..4eba30c7c2fabcb250444fee27d7554473a4d0c2 100644
> > > --- a/drivers/pmdomain/core.c
> > > +++ b/drivers/pmdomain/core.c
> > > @@ -1357,7 +1357,6 @@ static int genpd_runtime_resume(struct device *dev)
> > > return ret;
> > > }
> > >
> > > -#ifndef CONFIG_PM_GENERIC_DOMAINS_OF
> > > static bool pd_ignore_unused;
> > > static int __init pd_ignore_unused_setup(char *__unused)
> > > {
> > > @@ -1393,7 +1392,6 @@ static int __init genpd_power_off_unused(void)
> > > return 0;
> > > }
> > > late_initcall_sync(genpd_power_off_unused);
> > > -#endif
> > >
> > > #ifdef CONFIG_PM_SLEEP
> > >
> > > @@ -3494,7 +3492,6 @@ void of_genpd_sync_state(struct device_node *np)
> > > list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> > > if (genpd->provider == of_fwnode_handle(np)) {
> > > genpd_lock(genpd);
> > > - genpd->stay_on = false;
> > > genpd_power_off(genpd, false, 0);
> > > genpd_unlock(genpd);
> > > }
> > > @@ -3522,7 +3519,6 @@ static void genpd_provider_sync_state(struct device *dev)
> > >
> > > case GENPD_SYNC_STATE_SIMPLE:
> > > genpd_lock(genpd);
> > > - genpd->stay_on = false;
> > > genpd_power_off(genpd, false, 0);
> > > genpd_unlock(genpd);
> > > break;
> > >
> > > ---
> > > base-commit: 5cc61f86dff464a63b6a6e4758f26557fda4d494
> > > change-id: 20250902-rk3576-lockup-regression-5b1f1fb7ff21
> > >
> > > Best regards,
> > > --
> > > Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > >
> >
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
2025-09-02 18:25 ` Nicolas Frattaroli
2025-09-04 9:17 ` Ulf Hansson
@ 2025-09-04 15:49 ` Heiko Stübner
2025-09-04 15:55 ` Heiko Stübner
2 siblings, 1 reply; 15+ messages in thread
From: Heiko Stübner @ 2025-09-04 15:49 UTC (permalink / raw)
To: Ulf Hansson, Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Nicolas Frattaroli
Hi,
Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
>
> On RK3576, the UFS controller's power domain has a quirk that requires
> it to stay enabled, infrastricture for which was added in Commit
> cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
>
> Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> genpds on until sync_state") appears to break this quirk wholesale. The
> result is that RK3576 devices with the UFS controller enabled but unused
> will freeze once pmdomain shuts off unused domains.
>
> Revert it until a better fix can be found.
>
> Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
just an observation independent of the conversation in the other thread.
This patch/revert whatever fixes an actual issue for me.
On the rk3588 the NPU power-domains are a hirarchy of
PD_NPU
PD_NPUTOP (core0)
PD_NPU1 (core1)
PD_NPU2 (core2)
and the PD_NPU does need a supply regulator.
(1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:
# cat /sys/kernel/debug/regulator/regulator_summary
regulator use open bypass opmode voltage current min max
---------------------------------------------------------------------------------------
dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
[...]
vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
[...]
vdd_npu_s0 0 0 0 normal 800mV 0mA 550mV 950mV
vcc_1v2_s3 2 1 0 unknown 1200mV 0mA 1200mV 1200mV
fe1b0000.ethernet-phy 1 0mA 0mV 0mV
vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
fb000000.gpu-mali 1 0mA 675mV 850mV
fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
[...]
# cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
domain status children performance
/device runtime status managed by
------------------------------------------------------------------------------
[...]
gpu off-0 0
fb000000.gpu suspended 0 SW
npu2 off-0 0
fdada000.iommu suspended 0 SW
fdad0000.npu suspended 0 SW
npu1 off-0 0
fdaca000.iommu suspended 0 SW
fdac0000.npu suspended 0 SW
nputop off-0 0
npu1, npu2
fdab9000.iommu suspended 0 SW
fdab0000.npu suspended 0 SW
npu on 0
nputop
Observe that the PD_NPU never got its regulator and the domain also
never actually gets turned off. While the domains directly attached to
the cores get turned off.
(2) with Nicolas's patch applied on top, I get the correct behaviour,
that was also happening with v6.16 before
# cat /sys/kernel/debug/regulator/regulator_summary
regulator use open bypass opmode voltage current min max
---------------------------------------------------------------------------------------
dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
[...]
vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
[...]
vdd_npu_s0 0 1 0 normal 800mV 0mA 550mV 950mV
fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
vdd_cpu_big1_s0 2 1 0 normal 1000mV 0mA 550mV 1050mV
cpu6-cpu 1 0mA 1000mV 1000mV
vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
fb000000.gpu-mali 1 0mA 675mV 850mV
fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
[...]
# cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
domain status children performance
/device runtime status managed by
------------------------------------------------------------------------------
[...]
gpu off-0 0
fb000000.gpu suspended 0 SW
npu2 off-0 0
fdada000.iommu suspended 0 SW
fdad0000.npu suspended 0 SW
npu1 off-0 0
fdaca000.iommu suspended 0 SW
fdac0000.npu suspended 0 SW
nputop off-0 0
npu1, npu2
fdab9000.iommu suspended 0 SW
fdab0000.npu suspended 0 SW
npu off-0 0
nputop
The regulator handling is working correctly and also the parent PD_NPU
domain gets turned off when its children are off.
Heiko
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 15:49 ` Heiko Stübner
@ 2025-09-04 15:55 ` Heiko Stübner
2025-09-04 16:07 ` Ulf Hansson
0 siblings, 1 reply; 15+ messages in thread
From: Heiko Stübner @ 2025-09-04 15:55 UTC (permalink / raw)
To: Ulf Hansson, Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Nicolas Frattaroli
Am Donnerstag, 4. September 2025, 17:49:16 Mitteleuropäische Sommerzeit schrieb Heiko Stübner:
> Hi,
>
> Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> >
> > On RK3576, the UFS controller's power domain has a quirk that requires
> > it to stay enabled, infrastricture for which was added in Commit
> > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> >
> > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > genpds on until sync_state") appears to break this quirk wholesale. The
> > result is that RK3576 devices with the UFS controller enabled but unused
> > will freeze once pmdomain shuts off unused domains.
> >
> > Revert it until a better fix can be found.
> >
> > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
>
> just an observation independent of the conversation in the other thread.
> This patch/revert whatever fixes an actual issue for me.
ah and just saw Nicolas' mail from 6 minutes earlier.
So I guess what saves me here is the rocket driver being built as a
module, power-domain getting turned off early, rocket probes with
pd off and then gets its supplies as expected.
Heiko
>
> On the rk3588 the NPU power-domains are a hirarchy of
>
> PD_NPU
> PD_NPUTOP (core0)
> PD_NPU1 (core1)
> PD_NPU2 (core2)
>
> and the PD_NPU does need a supply regulator.
>
> (1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:
>
> # cat /sys/kernel/debug/regulator/regulator_summary
> regulator use open bypass opmode voltage current min max
> ---------------------------------------------------------------------------------------
> dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
> [...]
> vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
> vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
> [...]
> vdd_npu_s0 0 0 0 normal 800mV 0mA 550mV 950mV
> vcc_1v2_s3 2 1 0 unknown 1200mV 0mA 1200mV 1200mV
> fe1b0000.ethernet-phy 1 0mA 0mV 0mV
> vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
> fb000000.gpu-mali 1 0mA 675mV 850mV
> fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> [...]
>
> # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> domain status children performance
> /device runtime status managed by
> ------------------------------------------------------------------------------
> [...]
> gpu off-0 0
> fb000000.gpu suspended 0 SW
> npu2 off-0 0
> fdada000.iommu suspended 0 SW
> fdad0000.npu suspended 0 SW
> npu1 off-0 0
> fdaca000.iommu suspended 0 SW
> fdac0000.npu suspended 0 SW
> nputop off-0 0
> npu1, npu2
> fdab9000.iommu suspended 0 SW
> fdab0000.npu suspended 0 SW
> npu on 0
> nputop
>
> Observe that the PD_NPU never got its regulator and the domain also
> never actually gets turned off. While the domains directly attached to
> the cores get turned off.
>
>
> (2) with Nicolas's patch applied on top, I get the correct behaviour,
> that was also happening with v6.16 before
>
> # cat /sys/kernel/debug/regulator/regulator_summary
> regulator use open bypass opmode voltage current min max
> ---------------------------------------------------------------------------------------
> dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
> [...]
> vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
> vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
> [...]
> vdd_npu_s0 0 1 0 normal 800mV 0mA 550mV 950mV
> fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> vdd_cpu_big1_s0 2 1 0 normal 1000mV 0mA 550mV 1050mV
> cpu6-cpu 1 0mA 1000mV 1000mV
> vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
> fb000000.gpu-mali 1 0mA 675mV 850mV
> fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> [...]
>
> # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> domain status children performance
> /device runtime status managed by
> ------------------------------------------------------------------------------
> [...]
> gpu off-0 0
> fb000000.gpu suspended 0 SW
> npu2 off-0 0
> fdada000.iommu suspended 0 SW
> fdad0000.npu suspended 0 SW
> npu1 off-0 0
> fdaca000.iommu suspended 0 SW
> fdac0000.npu suspended 0 SW
> nputop off-0 0
> npu1, npu2
> fdab9000.iommu suspended 0 SW
> fdab0000.npu suspended 0 SW
> npu off-0 0
> nputop
>
> The regulator handling is working correctly and also the parent PD_NPU
> domain gets turned off when its children are off.
>
>
> Heiko
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 15:55 ` Heiko Stübner
@ 2025-09-04 16:07 ` Ulf Hansson
0 siblings, 0 replies; 15+ messages in thread
From: Ulf Hansson @ 2025-09-04 16:07 UTC (permalink / raw)
To: Nicolas Frattaroli, Heiko Stübner
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel
On Thu, 4 Sept 2025 at 17:55, Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Donnerstag, 4. September 2025, 17:49:16 Mitteleuropäische Sommerzeit schrieb Heiko Stübner:
> > Hi,
> >
> > Am Dienstag, 2. September 2025, 20:23:04 Mitteleuropäische Sommerzeit schrieb Nicolas Frattaroli:
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> > >
> > > Fixes: de141a9aa52d ("pmdomain: core: Leave powered-on genpds on until sync_state")
> > > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> >
> > just an observation independent of the conversation in the other thread.
> > This patch/revert whatever fixes an actual issue for me.
>
> ah and just saw Nicolas' mail from 6 minutes earlier.
>
> So I guess what saves me here is the rocket driver being built as a
> module, power-domain getting turned off early, rocket probes with
> pd off and then gets its supplies as expected.
Okay, so to follow up on your analysis.
I am thinking that the PM domain "looks like" it's properly powered-on
during boot/init, while in-fact it may not.
In other words, calling pm_genpd_init() and stating that the genpd is
powered-on seems to be the problem here. Could a proper power-off be
done before calling pm_genpd_init() and then leaving it off until
later?
Kind regards
Uffe
>
>
> Heiko
>
>
> >
> > On the rk3588 the NPU power-domains are a hirarchy of
> >
> > PD_NPU
> > PD_NPUTOP (core0)
> > PD_NPU1 (core1)
> > PD_NPU2 (core2)
> >
> > and the PD_NPU does need a supply regulator.
> >
> > (1) With "just" v6.17-rc + the rocket driver probing and then idling, I get:
> >
> > # cat /sys/kernel/debug/regulator/regulator_summary
> > regulator use open bypass opmode voltage current min max
> > ---------------------------------------------------------------------------------------
> > dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
> > [...]
> > vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
> > vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
> > [...]
> > vdd_npu_s0 0 0 0 normal 800mV 0mA 550mV 950mV
> > vcc_1v2_s3 2 1 0 unknown 1200mV 0mA 1200mV 1200mV
> > fe1b0000.ethernet-phy 1 0mA 0mV 0mV
> > vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
> > fb000000.gpu-mali 1 0mA 675mV 850mV
> > fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> > [...]
> >
> > # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain status children performance
> > /device runtime status managed by
> > ------------------------------------------------------------------------------
> > [...]
> > gpu off-0 0
> > fb000000.gpu suspended 0 SW
> > npu2 off-0 0
> > fdada000.iommu suspended 0 SW
> > fdad0000.npu suspended 0 SW
> > npu1 off-0 0
> > fdaca000.iommu suspended 0 SW
> > fdac0000.npu suspended 0 SW
> > nputop off-0 0
> > npu1, npu2
> > fdab9000.iommu suspended 0 SW
> > fdab0000.npu suspended 0 SW
> > npu on 0
> > nputop
> >
> > Observe that the PD_NPU never got its regulator and the domain also
> > never actually gets turned off. While the domains directly attached to
> > the cores get turned off.
> >
> >
> > (2) with Nicolas's patch applied on top, I get the correct behaviour,
> > that was also happening with v6.16 before
> >
> > # cat /sys/kernel/debug/regulator/regulator_summary
> > regulator use open bypass opmode voltage current min max
> > ---------------------------------------------------------------------------------------
> > dc_12v 4 3 0 unknown 12000mV 0mA 12000mV 12000mV
> > [...]
> > vcc5v0_baseboard 2 1 0 unknown 5000mV 0mA 5000mV 5000mV
> > vcc5v0_sys 18 18 0 unknown 5000mV 0mA 5000mV 5000mV
> > [...]
> > vdd_npu_s0 0 1 0 normal 800mV 0mA 550mV 950mV
> > fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> > vdd_cpu_big1_s0 2 1 0 normal 1000mV 0mA 550mV 1050mV
> > cpu6-cpu 1 0mA 1000mV 1000mV
> > vdd_gpu_s0 1 2 0 normal 675mV 0mA 550mV 950mV
> > fb000000.gpu-mali 1 0mA 675mV 850mV
> > fd8d8000.power-management:power-controller-domain 0 0mA 0mV 0mV
> > [...]
> >
> > # cat /sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain status children performance
> > /device runtime status managed by
> > ------------------------------------------------------------------------------
> > [...]
> > gpu off-0 0
> > fb000000.gpu suspended 0 SW
> > npu2 off-0 0
> > fdada000.iommu suspended 0 SW
> > fdad0000.npu suspended 0 SW
> > npu1 off-0 0
> > fdaca000.iommu suspended 0 SW
> > fdac0000.npu suspended 0 SW
> > nputop off-0 0
> > npu1, npu2
> > fdab9000.iommu suspended 0 SW
> > fdab0000.npu suspended 0 SW
> > npu off-0 0
> > nputop
> >
> > The regulator handling is working correctly and also the parent PD_NPU
> > domain gets turned off when its children are off.
> >
> >
> > Heiko
> >
>
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 12:50 ` Nicolas Frattaroli
2025-09-04 15:43 ` Nicolas Frattaroli
@ 2025-09-04 16:13 ` Ulf Hansson
2025-09-04 17:41 ` Nicolas Frattaroli
1 sibling, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2025-09-04 16:13 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Thu, 4 Sept 2025 at 14:50, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > <nicolas.frattaroli@collabora.com> wrote:
> > >
> > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> >
> > I can't find this commit hash. What tree are you using when testing this?
> >
> > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
>
> Probably, I did the revert on a rebased branch and then rebased the revert
> onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> there is a v2 (it might actually become a different patch, see huge text
> below, sorry!)
>
> >
> > >
> > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > it to stay enabled, infrastricture for which was added in Commit
> > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > >
> > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > result is that RK3576 devices with the UFS controller enabled but unused
> > > will freeze once pmdomain shuts off unused domains.
> > >
> > > Revert it until a better fix can be found.
> >
> > This sounds a bit vague to me, can you please clarify and elaborate a
> > bit more so I can try to help.
> >
> > What does "UFS controller enabled but unused" actually mean? Has the
> > UFS controller driver been probed successfully and thus its
> > corresponding device been attached to its PM domain?
>
> It means the UFS controller driver has probed, but does not find a
> UFS storage chip connected to it, and therefore reports a probe
> failure. This is a possibility on single-board computers like the
> Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> into some headers.
>
> > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > until sync_state") brings along with its corresponding sync_state
> > series for genpd [1]. Again, more information is needed to understand
> > what goes wrong.
>
> The reason why Rockchip's UFS driver needs this function is that once
> the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
> not be toggled off again until a whole SoC suspend/resume cycle because
> the off/on operation is seemingly not idempotent.
So how about adding some prints in the genpd->power_on|off() callbacks
to see what goes on during boot. Along with some prints in the UFS
driver's ->probe().
In particular, what is the difference before and after the revert.
> This does not preclude
> turning off the power domain if the device isn't used at all, e.g. the DT
> node is absent. This is why the affected PDs are not marked as always-on
> in the PD init data for the SoC, as the device driver is the best place to
> set this during runtime.
>
> The way I got to this commit is through a bisect between the UFS node
> being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
> and v6.17-rc4. The bisect landed on the pmdomain merge commit
> (commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
> so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
> works fine with that, and then rebased the 44 pmdomain commits that this
> merge commit merged onto this base. I verified that the tip of that then
> exhibited the faulty behaviour, namely that my ROCK 4D was locking up
> some time after boot, right after the kernel log message
>
> [ 33.756516] vdd_npu_s0: disabling
>
> so presumably when unused regulators and domains were being disabled.
> Aside note: setting vdd_npu_s0 to always-on also works to work around
> the issue, and I'm not quite sure why, because this regulator is not
> used for anything right now so this may be some peculiar SoC silicon
> design where VDD_NPU is leaking into the part that the UFS PD actually
> should be gating, preventing the lockup on accident.
>
> Anyway, so I did a bisect between the UFS introduction and the rebased
> tip of the pmdomain branch, and landed on the commit I'm reverting.
>
> The problem exhibits itself not when the affect PD is first turned off,
> but when the NPU regulator is turned off. How could this be relevant to
> power domains at all? I have no idea.
>
> I admit that I don't understand the commit I'm reverting, as it talks of
> keeping powered-on genpds on, but the code sets a member called "stay_on"
> to false.
>
> However, reverting it 100% reproducibly fixes the observed lockup. The
> lockup does not occur if an UFS module is connected to the SBC.
>
> It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
> a probe failure, so I'm not entirely sure how this specific commit
> changes the behaviour in a way that makes it unhappy. I agree the
> solution is probably not a revert here. Also, adding an unconditional
> `dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
> work, likely because the driver is unbound. Maybe this is a complete
> red herring and `dev_pm_genpd_rpm_always_on` is unrelated.
>
> The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
> usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
> branch, so even assuming the problem is that the usb driver is missing
> the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
> becuase nothing else is using that power domain. Unless, of course, our
> description of that power domain is incomplete, which is possible.
>
> The problem may be that this was always racey and the genpd changes
> just made the race go wrong more often.
>
> Another observation: my kernel log during afflicted bootups contains
>
> rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc
>
> A further observation: pd_ignore_unused does not fix it. Looking at
> the revert and experimenting, removing the `stay_on = false` is not
> what actually fixes my problem, but removing the
> `#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
> pd_ignore_unused, fixes my issue.
>
> This is probably a big enough clue to suss out what's going on; that
> I *need* "[ 2.868987] PM: genpd: Disabling unused power domains"
> to run quite early before the regulator that feeds VDD_NPU is disabled
> after the bootup completes unless an UFS module is present or the UFS
> controller is disabled makes me think this is another peculiarity of
> the RK3576 power domains hardware, because none of this I just wrote
> sounds like the words of a sane human.
Thanks a lot for sharing more information!
As I just responded to Heiko's email, my guess is that PM domain needs
the genpd->power_off() callback to be invoked first, before it can be
properly powered-on via genpd->power_on().
In some way we need to make sure the PM domain (genpd) is in a correct
state before we call pm_genpd_init(). Exactly how, I think you can
explore with different approaches.
[...]
Kind regards
Uffe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 16:13 ` Ulf Hansson
@ 2025-09-04 17:41 ` Nicolas Frattaroli
0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-04 17:41 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Thursday, 4 September 2025 18:13:22 Central European Summer Time Ulf Hansson wrote:
> On Thu, 4 Sept 2025 at 14:50, Nicolas Frattaroli
> <nicolas.frattaroli@collabora.com> wrote:
> >
> > On Thursday, 4 September 2025 11:17:01 Central European Summer Time Ulf Hansson wrote:
> > > On Tue, 2 Sept 2025 at 20:23, Nicolas Frattaroli
> > > <nicolas.frattaroli@collabora.com> wrote:
> > > >
> > > > This reverts commit de141a9aa52d6b2fbeb63f98975c2c72276f0878.
> > >
> > > I can't find this commit hash. What tree are you using when testing this?
> > >
> > > Are you trying to revert 0e789b491ba04c31de5c71249487593e386baa67 ?
> >
> > Probably, I did the revert on a rebased branch and then rebased the revert
> > onto v6.17-rc4 so it likely is the wrong hash here. I'll fix this in v2 if
> > there is a v2 (it might actually become a different patch, see huge text
> > below, sorry!)
> >
> > >
> > > >
> > > > On RK3576, the UFS controller's power domain has a quirk that requires
> > > > it to stay enabled, infrastricture for which was added in Commit
> > > > cd3fa304ba5c ("pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()").
> > > >
> > > > Unfortunately, Commit de141a9aa52d ("pmdomain: core: Leave powered-on
> > > > genpds on until sync_state") appears to break this quirk wholesale. The
> > > > result is that RK3576 devices with the UFS controller enabled but unused
> > > > will freeze once pmdomain shuts off unused domains.
> > > >
> > > > Revert it until a better fix can be found.
> > >
> > > This sounds a bit vague to me, can you please clarify and elaborate a
> > > bit more so I can try to help.
> > >
> > > What does "UFS controller enabled but unused" actually mean? Has the
> > > UFS controller driver been probed successfully and thus its
> > > corresponding device been attached to its PM domain?
> >
> > It means the UFS controller driver has probed, but does not find a
> > UFS storage chip connected to it, and therefore reports a probe
> > failure. This is a possibility on single-board computers like the
> > Radxa ROCK 4D, where the UFS storage is a separate module that plugs
> > into some headers.
> >
> > > Moreover, the behaviour of dev_pm_genpd_rpm_always_on() is orthogonal
> > > to what 0e789b491ba0 ("pmdomain: core: Leave powered-on genpds on
> > > until sync_state") brings along with its corresponding sync_state
> > > series for genpd [1]. Again, more information is needed to understand
> > > what goes wrong.
> >
> > The reason why Rockchip's UFS driver needs this function is that once
> > the RK3576_PD_USB power domain is turned on on the RK3576 SoC, it should
> > not be toggled off again until a whole SoC suspend/resume cycle because
> > the off/on operation is seemingly not idempotent.
>
> So how about adding some prints in the genpd->power_on|off() callbacks
> to see what goes on during boot. Along with some prints in the UFS
> driver's ->probe().
>
> In particular, what is the difference before and after the revert.
See my reply to myself, the problem is that power_off() is never called
before the revert. UFS was a red herring, it just changes the timings,
the real culprit is the NPU PD.
After the revert, the NPU domains do get powered off way before the regulator
ever gets touched. I've narrowed it down to just the npu PDs by doing an strstr
in the unused power domains function on the PD name for "npu".
>
> > This does not preclude
> > turning off the power domain if the device isn't used at all, e.g. the DT
> > node is absent. This is why the affected PDs are not marked as always-on
> > in the PD init data for the SoC, as the device driver is the best place to
> > set this during runtime.
> >
> > The way I got to this commit is through a bisect between the UFS node
> > being enabled on the ROCK 4D (commit 00abee2b18342d6c2f6f37225682fa7ca0d33142)
> > and v6.17-rc4. The bisect landed on the pmdomain merge commit
> > (commit fc8f5028eb0cc5aee0501a99f59a04f748fbff1c) as the first bad commit,
> > so I checked out v6.16, cherry-picked the UFS node enablement, made sure it
> > works fine with that, and then rebased the 44 pmdomain commits that this
> > merge commit merged onto this base. I verified that the tip of that then
> > exhibited the faulty behaviour, namely that my ROCK 4D was locking up
> > some time after boot, right after the kernel log message
> >
> > [ 33.756516] vdd_npu_s0: disabling
> >
> > so presumably when unused regulators and domains were being disabled.
> > Aside note: setting vdd_npu_s0 to always-on also works to work around
> > the issue, and I'm not quite sure why, because this regulator is not
> > used for anything right now so this may be some peculiar SoC silicon
> > design where VDD_NPU is leaking into the part that the UFS PD actually
> > should be gating, preventing the lockup on accident.
> >
> > Anyway, so I did a bisect between the UFS introduction and the rebased
> > tip of the pmdomain branch, and landed on the commit I'm reverting.
> >
> > The problem exhibits itself not when the affect PD is first turned off,
> > but when the NPU regulator is turned off. How could this be relevant to
> > power domains at all? I have no idea.
> >
> > I admit that I don't understand the commit I'm reverting, as it talks of
> > keeping powered-on genpds on, but the code sets a member called "stay_on"
> > to false.
> >
> > However, reverting it 100% reproducibly fixes the observed lockup. The
> > lockup does not occur if an UFS module is connected to the SBC.
> >
> > It seems `dev_pm_genpd_rpm_always_on` is never run if UFS experiences
> > a probe failure, so I'm not entirely sure how this specific commit
> > changes the behaviour in a way that makes it unhappy. I agree the
> > solution is probably not a revert here. Also, adding an unconditional
> > `dev_pm_genpd_rpm_always_on` in the failing UFS probe path doesn't
> > work, likely because the driver is unbound. Maybe this is a complete
> > red herring and `dev_pm_genpd_rpm_always_on` is unrelated.
> >
> > The only other device that uses RK3576_PD_USB is the usb_drd0_dwc3
> > usb controller. This node is not enabled in my rebase-pd-onto-ufs-enable
> > branch, so even assuming the problem is that the usb driver is missing
> > the same `dev_pm_genpd_rpm_always_on` call, that shouldn't matter
> > becuase nothing else is using that power domain. Unless, of course, our
> > description of that power domain is incomplete, which is possible.
> >
> > The problem may be that this was always racey and the genpd changes
> > just made the race go wrong more often.
> >
> > Another observation: my kernel log during afflicted bootups contains
> >
> > rockchip-pm-domain 27380000.power-management:power-controller: sync_state() pending due to 2a2d0000.ufshc
> >
> > A further observation: pd_ignore_unused does not fix it. Looking at
> > the revert and experimenting, removing the `stay_on = false` is not
> > what actually fixes my problem, but removing the
> > `#ifndef CONFIG_PM_GENERIC_DOMAINS_OF`, even without setting
> > pd_ignore_unused, fixes my issue.
> >
> > This is probably a big enough clue to suss out what's going on; that
> > I *need* "[ 2.868987] PM: genpd: Disabling unused power domains"
> > to run quite early before the regulator that feeds VDD_NPU is disabled
> > after the bootup completes unless an UFS module is present or the UFS
> > controller is disabled makes me think this is another peculiarity of
> > the RK3576 power domains hardware, because none of this I just wrote
> > sounds like the words of a sane human.
>
> Thanks a lot for sharing more information!
>
> As I just responded to Heiko's email, my guess is that PM domain needs
> the genpd->power_off() callback to be invoked first, before it can be
> properly powered-on via genpd->power_on().
It's never powered off or powered on. In my case, the SoC dies before
the affected NPU domains are ever touched apart from the probe function.
> In some way we need to make sure the PM domain (genpd) is in a correct
> state before we call pm_genpd_init(). Exactly how, I think you can
> explore with different approaches.
I'm not sure we're able to really do this, as we don't know before drivers
have probed whether a PD shouldn't be turned off, due to the
aforementioned rpm_always_on thing.
What we really need is a way to add regulators as suppliers to PDs after
the initial power controller probe function, but before the the regulator
unused suspension is run, regardless of whether a PD is used by any device
that's got a driver bound.
Alternatively, the regulator core needs to somehow synchronise with the
genpd core to have it turn off its unused domains first.
Both of these don't seem like solutions that can be snuck in during an
rc, but there really isn't any way right now to fix this from within
Rockchip's power domain controller unless we just hardcode another flag
that says a certain PD should be turned off during probe, which is a
really dirty hack.
Kind regards,
Nicolas Frattaroli
>
> [...]
>
> Kind regards
> Uffe
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-04 15:43 ` Nicolas Frattaroli
@ 2025-09-05 14:27 ` Ulf Hansson
2025-09-08 13:14 ` Nicolas Frattaroli
0 siblings, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2025-09-05 14:27 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
[...]
>
> Okay so I believe I have found the root cause of the regression. UFS is
> innocent, disabling UFS just happens to avoid it due to how the timing of
> things works out.
>
> The real issue is that the NPU power domains on the RK3576, which are
> currently unused, have an undeclared dependency on vdd_npu_s0.
>
> Declaring this dependency with a `domain-supply` and adding the
> necessary flag in the rockchip PD controller to use it does not solve
> he problem. This is because the rockchip PD controller cannot acquire
> those supplies during probe, as they're not available yet and their
> availability depends on the PD controller finishing probe.
>
> That's why it acquires them in the PD enable callback, but the NPU
> PDs are never enabled because they're unused.
>
> This worked fine when unused PDs were still turned off quite early, as
> this meant they were turned off before regulators. Now the unused
> regulators are turned off before turning off the unused PDs happens.
I see, thanks for sharing these details. What a mess.
>
> I don't really see an easy way to fix this with a patch that's fit for
> an rc cycle. We can't request the regulator early or even just add a
> device link, as the regulator is not around yet.
Right, I will work on a patch or two that allows rockchip
power-domains to opt-out from genpds new behavior and to keep using
the old one.
I think we prefer to do it like this (should be quite a limited amount
of code and okay for an rc), rather than reverting for everyone.
>
> Marking vdd_npu_s0 as always-on would be abusing DT to work around a
> Linux kernel shortcoming, which is a no-no.
>
> What we need is either a way to register with pmdomain core that
> certain PDs need a late init for additional supplies, which is then
> called before any of the unused regulator power off functionality is
> invoked by the regulator core.
>
> Any ideas?
Yes :-)
I would suggest implementing an auxiliary driver, along with the
rockchip_pm_domain_driver. The main job for the auxiliary driver would
be to get the regulator in its ->probe() - and if it fails because the
regulator isn't available yet, it should keep trying by returning
-EPROBE_DEFER. See more about the auxiliary bus/device/driver in
include/linux/auxiliary_bus.h and module_auxiliary_driver().
Moreover, when the rockchip_pm_domain_driver probes, it becomes
responsible for pre-parsing the OF nodes for the domain-supply DT
property, for each of the specified power-domains. If it finds a
domain-supply, it should register an auxiliary device that corresponds
to that particular power-domain. This can be done by using
platform-data that is shared with the auxiliary device/driver. See
devm_auxiliary_device_create().
Furthermore we would need some kind of synchronization mechanism
between the rockchip_pm_domain_driver and the auxiliary driver, to
manage the regulator get/enable/disable. I think that should be rather
easy to work out.
Do you think this can work?
Kind regards
Uffe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-05 14:27 ` Ulf Hansson
@ 2025-09-08 13:14 ` Nicolas Frattaroli
2025-09-08 13:40 ` Ulf Hansson
0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Frattaroli @ 2025-09-08 13:14 UTC (permalink / raw)
To: Ulf Hansson
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Friday, 5 September 2025 16:27:27 Central European Summer Time Ulf Hansson wrote:
> [...]
>
> >
> > Okay so I believe I have found the root cause of the regression. UFS is
> > innocent, disabling UFS just happens to avoid it due to how the timing of
> > things works out.
> >
> > The real issue is that the NPU power domains on the RK3576, which are
> > currently unused, have an undeclared dependency on vdd_npu_s0.
> >
> > Declaring this dependency with a `domain-supply` and adding the
> > necessary flag in the rockchip PD controller to use it does not solve
> > he problem. This is because the rockchip PD controller cannot acquire
> > those supplies during probe, as they're not available yet and their
> > availability depends on the PD controller finishing probe.
> >
> > That's why it acquires them in the PD enable callback, but the NPU
> > PDs are never enabled because they're unused.
> >
> > This worked fine when unused PDs were still turned off quite early, as
> > this meant they were turned off before regulators. Now the unused
> > regulators are turned off before turning off the unused PDs happens.
>
> I see, thanks for sharing these details. What a mess.
>
Agreed :(
> >
> > I don't really see an easy way to fix this with a patch that's fit for
> > an rc cycle. We can't request the regulator early or even just add a
> > device link, as the regulator is not around yet.
>
> Right, I will work on a patch or two that allows rockchip
> power-domains to opt-out from genpds new behavior and to keep using
> the old one.
>
> I think we prefer to do it like this (should be quite a limited amount
> of code and okay for an rc), rather than reverting for everyone.
That sounds good to me.
>
> >
> > Marking vdd_npu_s0 as always-on would be abusing DT to work around a
> > Linux kernel shortcoming, which is a no-no.
> >
> > What we need is either a way to register with pmdomain core that
> > certain PDs need a late init for additional supplies, which is then
> > called before any of the unused regulator power off functionality is
> > invoked by the regulator core.
> >
> > Any ideas?
>
> Yes :-)
>
> I would suggest implementing an auxiliary driver, along with the
> rockchip_pm_domain_driver. The main job for the auxiliary driver would
> be to get the regulator in its ->probe() - and if it fails because the
> regulator isn't available yet, it should keep trying by returning
> -EPROBE_DEFER. See more about the auxiliary bus/device/driver in
> include/linux/auxiliary_bus.h and module_auxiliary_driver().
>
> Moreover, when the rockchip_pm_domain_driver probes, it becomes
> responsible for pre-parsing the OF nodes for the domain-supply DT
> property, for each of the specified power-domains. If it finds a
> domain-supply, it should register an auxiliary device that corresponds
> to that particular power-domain. This can be done by using
> platform-data that is shared with the auxiliary device/driver. See
> devm_auxiliary_device_create().
>
> Furthermore we would need some kind of synchronization mechanism
> between the rockchip_pm_domain_driver and the auxiliary driver, to
> manage the regulator get/enable/disable. I think that should be rather
> easy to work out.
>
> Do you think this can work?
This sounds similar to something Heiko suggested to me, and I agree
it could work. It does seem like a pretty painful solution though,
in terms of needed additional code and complexity to basically just
tell Linux "hey you can't get this regulator yet but please try
again later without our involvement".
To that end, I've tried working out a regulator-based solution to
it, where the rockchip_pm_domain driver registers a "proxy"
regulator for each power domain that wants a regulator, with its
supply set to the name of the real `domain-supply` regulator.
The logic behind this was that the regulator core runs a check
for whether every supply is resolved before it turns off unused
regulators. The hope was that if we register a proxy regulator
and enable it immediately in the pm_domain probe, then regulator
core will handle the actual dependency at precisely the right
time, namely before it checks to see whether any are unused and
can be turned off.
As I discovered though, this can't really work. The regulator
core will in this case just set the supply regulator to a dummy
regulator when it's enabled in our probe function, and later
supply resolving passes are then content that this supply is
resolved even though it is a dummy supply. There does not appear
to be any way to opt out of getting this dummy supply.
So knowing that this doesn't work, I have another idea, but I
feel like both the regulator subsystem and the pm-domains
subsystem will hate this: explicitly create an order between
the pmdomain idle check and the regulator idle check to make
sure that the pmdomains idle check runs first.
This would set in stone how the kernel worked previously for
kernel users that relied on this, but is a conceptually
unpleasant cross-subsystem dependency.
>
> Kind regards
> Uffe
>
Kind regards,
Nicolas Frattaroli
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-08 13:14 ` Nicolas Frattaroli
@ 2025-09-08 13:40 ` Ulf Hansson
2025-09-08 20:08 ` Sebastian Reichel
0 siblings, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2025-09-08 13:40 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: linux-pm, linux-kernel, kernel, linux-rockchip,
Cristian Ciocaltea, Sebastian Reichel, Heiko Stuebner
On Mon, 8 Sept 2025 at 15:14, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> On Friday, 5 September 2025 16:27:27 Central European Summer Time Ulf Hansson wrote:
> > [...]
> >
> > >
> > > Okay so I believe I have found the root cause of the regression. UFS is
> > > innocent, disabling UFS just happens to avoid it due to how the timing of
> > > things works out.
> > >
> > > The real issue is that the NPU power domains on the RK3576, which are
> > > currently unused, have an undeclared dependency on vdd_npu_s0.
> > >
> > > Declaring this dependency with a `domain-supply` and adding the
> > > necessary flag in the rockchip PD controller to use it does not solve
> > > he problem. This is because the rockchip PD controller cannot acquire
> > > those supplies during probe, as they're not available yet and their
> > > availability depends on the PD controller finishing probe.
> > >
> > > That's why it acquires them in the PD enable callback, but the NPU
> > > PDs are never enabled because they're unused.
> > >
> > > This worked fine when unused PDs were still turned off quite early, as
> > > this meant they were turned off before regulators. Now the unused
> > > regulators are turned off before turning off the unused PDs happens.
> >
> > I see, thanks for sharing these details. What a mess.
> >
>
> Agreed :(
>
> > >
> > > I don't really see an easy way to fix this with a patch that's fit for
> > > an rc cycle. We can't request the regulator early or even just add a
> > > device link, as the regulator is not around yet.
> >
> > Right, I will work on a patch or two that allows rockchip
> > power-domains to opt-out from genpds new behavior and to keep using
> > the old one.
> >
> > I think we prefer to do it like this (should be quite a limited amount
> > of code and okay for an rc), rather than reverting for everyone.
>
> That sounds good to me.
>
> >
> > >
> > > Marking vdd_npu_s0 as always-on would be abusing DT to work around a
> > > Linux kernel shortcoming, which is a no-no.
> > >
> > > What we need is either a way to register with pmdomain core that
> > > certain PDs need a late init for additional supplies, which is then
> > > called before any of the unused regulator power off functionality is
> > > invoked by the regulator core.
> > >
> > > Any ideas?
> >
> > Yes :-)
> >
> > I would suggest implementing an auxiliary driver, along with the
> > rockchip_pm_domain_driver. The main job for the auxiliary driver would
> > be to get the regulator in its ->probe() - and if it fails because the
> > regulator isn't available yet, it should keep trying by returning
> > -EPROBE_DEFER. See more about the auxiliary bus/device/driver in
> > include/linux/auxiliary_bus.h and module_auxiliary_driver().
> >
> > Moreover, when the rockchip_pm_domain_driver probes, it becomes
> > responsible for pre-parsing the OF nodes for the domain-supply DT
> > property, for each of the specified power-domains. If it finds a
> > domain-supply, it should register an auxiliary device that corresponds
> > to that particular power-domain. This can be done by using
> > platform-data that is shared with the auxiliary device/driver. See
> > devm_auxiliary_device_create().
> >
> > Furthermore we would need some kind of synchronization mechanism
> > between the rockchip_pm_domain_driver and the auxiliary driver, to
> > manage the regulator get/enable/disable. I think that should be rather
> > easy to work out.
> >
> > Do you think this can work?
>
> This sounds similar to something Heiko suggested to me, and I agree
> it could work. It does seem like a pretty painful solution though,
> in terms of needed additional code and complexity to basically just
> tell Linux "hey you can't get this regulator yet but please try
> again later without our involvement".
Well, I would give this a go and see what you end up with. The nice
thing with this approach, I think, is that we get a driver and can use
the -EPROBE_DEFER mechanism.
Another option would be to explore using fw_devlink/device_links, to
somehow get a notification as soon as the regulator gets registered.
>
> To that end, I've tried working out a regulator-based solution to
> it, where the rockchip_pm_domain driver registers a "proxy"
> regulator for each power domain that wants a regulator, with its
> supply set to the name of the real `domain-supply` regulator.
>
> The logic behind this was that the regulator core runs a check
> for whether every supply is resolved before it turns off unused
> regulators. The hope was that if we register a proxy regulator
> and enable it immediately in the pm_domain probe, then regulator
> core will handle the actual dependency at precisely the right
> time, namely before it checks to see whether any are unused and
> can be turned off.
>
> As I discovered though, this can't really work. The regulator
> core will in this case just set the supply regulator to a dummy
> regulator when it's enabled in our probe function, and later
> supply resolving passes are then content that this supply is
> resolved even though it is a dummy supply. There does not appear
> to be any way to opt out of getting this dummy supply.
>
> So knowing that this doesn't work, I have another idea, but I
> feel like both the regulator subsystem and the pm-domains
> subsystem will hate this: explicitly create an order between
> the pmdomain idle check and the regulator idle check to make
> sure that the pmdomains idle check runs first.
I think those kinds of dependencies are better solved by using
fw_devlink/device_links.
>
> This would set in stone how the kernel worked previously for
> kernel users that relied on this, but is a conceptually
> unpleasant cross-subsystem dependency.
Then there are other resources cropping up that we need support,
extending to more subsystems. So, no, this does not seem like a
scalable solution to me. :-)
Kind regards
Uffe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-08 13:40 ` Ulf Hansson
@ 2025-09-08 20:08 ` Sebastian Reichel
2025-09-09 12:46 ` Ulf Hansson
0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Reichel @ 2025-09-08 20:08 UTC (permalink / raw)
To: Ulf Hansson
Cc: Nicolas Frattaroli, linux-pm, linux-kernel, kernel,
linux-rockchip, Cristian Ciocaltea, Heiko Stuebner
[-- Attachment #1: Type: text/plain, Size: 2958 bytes --]
Hi,
On Mon, Sep 08, 2025 at 03:40:24PM +0200, Ulf Hansson wrote:
> On Mon, 8 Sept 2025 at 15:14, Nicolas Frattaroli wrote:
> > On Friday, 5 September 2025 16:27:27 CEST Ulf Hansson wrote:
> > > I would suggest implementing an auxiliary driver, along with the
> > > rockchip_pm_domain_driver. The main job for the auxiliary driver would
> > > be to get the regulator in its ->probe() - and if it fails because the
> > > regulator isn't available yet, it should keep trying by returning
> > > -EPROBE_DEFER. See more about the auxiliary bus/device/driver in
> > > include/linux/auxiliary_bus.h and module_auxiliary_driver().
> > >
> > > Moreover, when the rockchip_pm_domain_driver probes, it becomes
> > > responsible for pre-parsing the OF nodes for the domain-supply DT
> > > property, for each of the specified power-domains. If it finds a
> > > domain-supply, it should register an auxiliary device that corresponds
> > > to that particular power-domain. This can be done by using
> > > platform-data that is shared with the auxiliary device/driver. See
> > > devm_auxiliary_device_create().
> > >
> > > Furthermore we would need some kind of synchronization mechanism
> > > between the rockchip_pm_domain_driver and the auxiliary driver, to
> > > manage the regulator get/enable/disable. I think that should be rather
> > > easy to work out.
> > >
> > > Do you think this can work?
> >
> > This sounds similar to something Heiko suggested to me, and I agree
> > it could work. It does seem like a pretty painful solution though,
> > in terms of needed additional code and complexity to basically just
> > tell Linux "hey you can't get this regulator yet but please try
> > again later without our involvement".
>
> Well, I would give this a go and see what you end up with. The nice
> thing with this approach, I think, is that we get a driver and can use
> the -EPROBE_DEFER mechanism.
>
> Another option would be to explore using fw_devlink/device_links, to
> somehow get a notification as soon as the regulator gets registered.
I think the main pain issue with this is fw_devlink actually. The
power domain consumers are all referencing the main DT node. So once
it has been marked as initialized (of_genpd_add_provider_onecell()
calls fwnode_dev_initialized() at some point), fw_devlink allows the
consumers to be probed. As the DT node must be usable for processing
after the normal pmdomains are registered, this means the consumers
for pmdomains with "domain-supply" will potentially be probed too
early resulting in some extra -EPROBE_DEFER. OTOH that should be the
status quo, so probably it does not matter.
> I think those kinds of dependencies are better solved by using
> fw_devlink/device_links.
I think the regulator dependency tracking would happen automatically
when the auxillary sub-device uses the power-domain sub-node as its
device fwnode.
Greetings,
-- Sebastian
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pmdomian: core: don't unset stay_on during sync_state
2025-09-08 20:08 ` Sebastian Reichel
@ 2025-09-09 12:46 ` Ulf Hansson
0 siblings, 0 replies; 15+ messages in thread
From: Ulf Hansson @ 2025-09-09 12:46 UTC (permalink / raw)
To: Sebastian Reichel
Cc: Nicolas Frattaroli, linux-pm, linux-kernel, kernel,
linux-rockchip, Cristian Ciocaltea, Heiko Stuebner
On Mon, 8 Sept 2025 at 22:09, Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
>
> Hi,
>
> On Mon, Sep 08, 2025 at 03:40:24PM +0200, Ulf Hansson wrote:
> > On Mon, 8 Sept 2025 at 15:14, Nicolas Frattaroli wrote:
> > > On Friday, 5 September 2025 16:27:27 CEST Ulf Hansson wrote:
> > > > I would suggest implementing an auxiliary driver, along with the
> > > > rockchip_pm_domain_driver. The main job for the auxiliary driver would
> > > > be to get the regulator in its ->probe() - and if it fails because the
> > > > regulator isn't available yet, it should keep trying by returning
> > > > -EPROBE_DEFER. See more about the auxiliary bus/device/driver in
> > > > include/linux/auxiliary_bus.h and module_auxiliary_driver().
> > > >
> > > > Moreover, when the rockchip_pm_domain_driver probes, it becomes
> > > > responsible for pre-parsing the OF nodes for the domain-supply DT
> > > > property, for each of the specified power-domains. If it finds a
> > > > domain-supply, it should register an auxiliary device that corresponds
> > > > to that particular power-domain. This can be done by using
> > > > platform-data that is shared with the auxiliary device/driver. See
> > > > devm_auxiliary_device_create().
> > > >
> > > > Furthermore we would need some kind of synchronization mechanism
> > > > between the rockchip_pm_domain_driver and the auxiliary driver, to
> > > > manage the regulator get/enable/disable. I think that should be rather
> > > > easy to work out.
> > > >
> > > > Do you think this can work?
> > >
> > > This sounds similar to something Heiko suggested to me, and I agree
> > > it could work. It does seem like a pretty painful solution though,
> > > in terms of needed additional code and complexity to basically just
> > > tell Linux "hey you can't get this regulator yet but please try
> > > again later without our involvement".
> >
> > Well, I would give this a go and see what you end up with. The nice
> > thing with this approach, I think, is that we get a driver and can use
> > the -EPROBE_DEFER mechanism.
> >
> > Another option would be to explore using fw_devlink/device_links, to
> > somehow get a notification as soon as the regulator gets registered.
>
> I think the main pain issue with this is fw_devlink actually. The
> power domain consumers are all referencing the main DT node. So once
> it has been marked as initialized (of_genpd_add_provider_onecell()
> calls fwnode_dev_initialized() at some point), fw_devlink allows the
> consumers to be probed. As the DT node must be usable for processing
> after the normal pmdomains are registered, this means the consumers
> for pmdomains with "domain-supply" will potentially be probed too
> early resulting in some extra -EPROBE_DEFER. OTOH that should be the
> status quo, so probably it does not matter.
Right, the description in the DTS seems a bit odd to me too.
Although I was under the impression that there is kind of a circular
DT dependency here somewhere. Or are you saying that we could
potentially allow those power-domains that are described in
child-nodes to be registered later-on - and after their
domain-supply-regulator has become available?
>
> > I think those kinds of dependencies are better solved by using
> > fw_devlink/device_links.
>
> I think the regulator dependency tracking would happen automatically
> when the auxillary sub-device uses the power-domain sub-node as its
> device fwnode.
Yes, that is correct.
Would that work then? So there is no circular dependency that we forget about?
The pmic that provides the regulator may be behind an spi interface
and the spi controller isn't part of a power-domain that consumes the
regulator?
Kind regards
Uffe
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-09-09 12:46 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-02 18:23 [PATCH] pmdomian: core: don't unset stay_on during sync_state Nicolas Frattaroli
2025-09-02 18:25 ` Nicolas Frattaroli
2025-09-04 9:17 ` Ulf Hansson
2025-09-04 12:50 ` Nicolas Frattaroli
2025-09-04 15:43 ` Nicolas Frattaroli
2025-09-05 14:27 ` Ulf Hansson
2025-09-08 13:14 ` Nicolas Frattaroli
2025-09-08 13:40 ` Ulf Hansson
2025-09-08 20:08 ` Sebastian Reichel
2025-09-09 12:46 ` Ulf Hansson
2025-09-04 16:13 ` Ulf Hansson
2025-09-04 17:41 ` Nicolas Frattaroli
2025-09-04 15:49 ` Heiko Stübner
2025-09-04 15:55 ` Heiko Stübner
2025-09-04 16:07 ` Ulf Hansson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox