* [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
@ 2026-03-31 18:02 Daniel Bozeman
2026-03-31 18:02 ` [PATCH 2/2] pmdomain/rockchip: skip domains returning -EPROBE_DEFER Daniel Bozeman
2026-04-01 1:17 ` [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains Shawn Lin
0 siblings, 2 replies; 10+ messages in thread
From: Daniel Bozeman @ 2026-03-31 18:02 UTC (permalink / raw)
To: ulf.hansson, heiko
Cc: linux-pm, linux-arm-kernel, linux-rockchip, linux-kernel,
Daniel Bozeman
Idle-only power domains (pwr_mask == 0) cannot actually be powered
on or off. rockchip_do_pmu_set_power_domain() already returns early
for these domains, but rockchip_pd_power() still attempts QoS save
and idle requests before reaching that check.
On RK3528, the idle-only domains (PD_RKVENC, PD_VO, PD_VPU) have
QoS registers that may be inaccessible when the generic power domain
framework attempts to power them off, leading to synchronous external
aborts.
Return early from rockchip_pd_power() when pwr_mask is zero, matching
the existing guard in rockchip_do_pmu_set_power_domain().
Fixes: 1fe767a56c32 ("soc: rockchip: power-domain: allow domains only handling idle requests")
Signed-off-by: Daniel Bozeman <daniel@orb.net>
---
drivers/pmdomain/rockchip/pm-domains.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index 490bbb1d1d..2eecae092a 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -640,6 +640,9 @@ static int rockchip_pd_power(struct rockchip_pm_domain *pd, bool power_on)
 	if (rockchip_pmu_domain_is_on(pd) == power_on)
 		return 0;
 
+	if (pd->info->pwr_mask == 0)
+		return 0;
+
 	ret = clk_bulk_enable(pd->num_clks, pd->clks);
 	if (ret < 0) {
 		dev_err(pmu->dev, "failed to enable clocks\n");
base-commit: bc330699801d3b4f99110365512caed5adcfaca3
--
2.43.0
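The ordering problem described in the commit message above can be
illustrated with a small userspace model. This is only a sketch, not the
driver code: struct model_pd, model_pd_power() and the counters are
hypothetical stand-ins, and the clock handling, locking, error paths and
power-on details of the real rockchip_pd_power() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-domain data; not the kernel struct. */
struct model_pd {
	unsigned int pwr_mask;	/* 0 means the domain is idle-only */
	bool is_on;		/* what the status check would report */
	int qos_saves;		/* counts modeled QoS save calls */
	int idle_requests;	/* counts modeled idle requests */
};

/*
 * Model of the power-off path. With guard == false the QoS save and the
 * idle request are reached even for pwr_mask == 0. With guard == true the
 * function returns before touching either, as the patch proposes.
 */
static int model_pd_power(struct model_pd *pd, bool power_on, bool guard)
{
	assert(pd != NULL);

	if (pd->is_on == power_on)
		return 0;

	if (guard && pd->pwr_mask == 0)
		return 0;

	if (!power_on) {
		pd->qos_saves++;	/* stands in for the QoS save */
		pd->idle_requests++;	/* stands in for the idle request */
	}

	/* Stands in for the existing early return in the set-power helper. */
	if (pd->pwr_mask == 0)
		return 0;

	pd->is_on = power_on;
	return 0;
}
```

In this model an idle-only domain with guard == false still records one
QoS save and one idle request while its power state never changes; with
guard == true both counters stay at zero.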
* [PATCH 2/2] pmdomain/rockchip: skip domains returning -EPROBE_DEFER
2026-03-31 18:02 [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains Daniel Bozeman
@ 2026-03-31 18:02 ` Daniel Bozeman
2026-04-01 1:17 ` [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains Shawn Lin
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Bozeman @ 2026-03-31 18:02 UTC (permalink / raw)
To: ulf.hansson, heiko
Cc: linux-pm, linux-arm-kernel, linux-rockchip, linux-kernel,
Daniel Bozeman
When iterating child nodes during probe, a single domain returning
-EPROBE_DEFER (e.g. due to clock dependencies not yet available)
causes the entire power domain controller to fail and tear down all
successfully registered domains.
This creates a window where devices in unrelated power domains that
would have registered successfully cannot access their hardware. On
RK3528, PD_GPU requires CRU clocks that may not be available yet,
but the idle-only domains (PD_RKVENC, PD_VO, PD_VPU) have no clock
requirements. When the controller fails due to PD_GPU, GPIO
controllers in PD_RKVENC become inaccessible, leading to synchronous
external aborts when GPIO-controlled regulators probe.
Skip domains that return -EPROBE_DEFER and continue registering the
rest. Skipped domains have NULL entries in the provider, causing
their consumers to receive -ENOENT and defer until available.
Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain driver")
Signed-off-by: Daniel Bozeman <daniel@orb.net>
---
drivers/pmdomain/rockchip/pm-domains.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index 2eecae092a..f42880c94f 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -1077,6 +1077,11 @@ static int rockchip_pm_domain_probe(struct platform_device *pdev)
 	for_each_available_child_of_node_scoped(np, node) {
 		error = rockchip_pm_add_one_domain(pmu, node);
 		if (error) {
+			if (error == -EPROBE_DEFER) {
+				dev_dbg(dev, "skipped node %pOFn, dependencies not yet available\n",
+					node);
+				continue;
+			}
 			dev_err(dev, "failed to handle node %pOFn: %d\n",
 				node, error);
 			goto err_out;
--
2.43.0
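The skip-and-continue behavior described in the commit message above can
be modeled in userspace as follows. This is a hedged sketch only:
model_probe() and its arrays are hypothetical, MODEL_EPROBE_DEFER mirrors
the kernel-internal value 517, and the real probe function does much more
than this loop.

```c
#include <assert.h>
#include <stddef.h>

/* EPROBE_DEFER is kernel-internal; redefined here only for the model. */
#define MODEL_EPROBE_DEFER 517

/*
 * results[i] is the modeled return value of adding domain i.
 * registered[i] is set to 1 for each domain that ends up registered.
 * With skip_deferred == 0 any error tears everything down (old behavior).
 * With skip_deferred != 0 a deferred domain is skipped and the loop
 * continues with the remaining domains (proposed behavior).
 */
static int model_probe(const int *results, int *registered, size_t n,
		       int skip_deferred)
{
	size_t i, j;

	assert(results != NULL && registered != NULL);

	for (i = 0; i < n; i++)
		registered[i] = 0;

	for (i = 0; i < n; i++) {
		int error = results[i];

		if (error == 0) {
			registered[i] = 1;
			continue;
		}
		if (skip_deferred && error == -MODEL_EPROBE_DEFER)
			continue;	/* leave this slot unregistered */

		for (j = 0; j < n; j++)	/* models the err_out teardown */
			registered[j] = 0;
		return error;
	}
	return 0;
}
```

With one deferring domain followed by three that succeed, the model
registers nothing when skip_deferred is 0 and registers the three
successful domains when it is nonzero. Errors other than the deferral
still tear everything down in both modes.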
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-03-31 18:02 [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains Daniel Bozeman
2026-03-31 18:02 ` [PATCH 2/2] pmdomain/rockchip: skip domains returning -EPROBE_DEFER Daniel Bozeman
@ 2026-04-01 1:17 ` Shawn Lin
[not found] ` <CAG+Ngm+xJCCQMPddZx8AbPEeH3rUrn3GKF575zXpGPJrnELvMw@mail.gmail.com>
1 sibling, 1 reply; 10+ messages in thread
From: Shawn Lin @ 2026-04-01 1:17 UTC (permalink / raw)
To: Daniel Bozeman
Cc: shawn.lin, linux-pm, linux-arm-kernel, linux-rockchip,
linux-kernel, ulf.hansson, heiko
Hi Daniel,
On Wednesday, 2026/04/01 2:02, Daniel Bozeman wrote:
> Idle-only power domains (pwr_mask == 0) cannot actually be powered
> on or off. rockchip_do_pmu_set_power_domain() already returns early
> for these domains, but rockchip_pd_power() still attempts QoS save
> and idle requests before reaching that check.
>
> On RK3528, the idle-only domains (PD_RKVENC, PD_VO, PD_VPU) have
> QoS registers that may be inaccessible when the generic power domain
> framework attempts to power them off, leading to synchronous external
> aborts.
>
Did this abort actually happen on your RK3528 board? I am trying to
understand the problem first. Even for an idle-only power domain, the
code saves the QoS registers before setting the power domain idle, so
how do the QoS registers become inaccessible?
> Return early from rockchip_pd_power() when pwr_mask is zero, matching
> the existing guard in rockchip_do_pmu_set_power_domain().
>
> Fixes: 1fe767a56c32 ("soc: rockchip: power-domain: allow domains only handling idle requests")
> Signed-off-by: Daniel Bozeman <daniel@orb.net>
> ---
> drivers/pmdomain/rockchip/pm-domains.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
> index 490bbb1d1d..2eecae092a 100644
> --- a/drivers/pmdomain/rockchip/pm-domains.c
> +++ b/drivers/pmdomain/rockchip/pm-domains.c
> @@ -640,6 +640,9 @@ static int rockchip_pd_power(struct rockchip_pm_domain *pd, bool power_on)
> if (rockchip_pmu_domain_is_on(pd) == power_on)
> return 0;
>
> + if (pd->info->pwr_mask == 0)
> + return 0;
> +
> ret = clk_bulk_enable(pd->num_clks, pd->clks);
> if (ret < 0) {
> dev_err(pmu->dev, "failed to enable clocks\n");
>
> base-commit: bc330699801d3b4f99110365512caed5adcfaca3
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
[not found] ` <CAG+Ngm+xJCCQMPddZx8AbPEeH3rUrn3GKF575zXpGPJrnELvMw@mail.gmail.com>
@ 2026-04-01 2:54 ` Shawn Lin
2026-04-01 6:13 ` Daniel Bozeman
0 siblings, 1 reply; 10+ messages in thread
From: Shawn Lin @ 2026-04-01 2:54 UTC (permalink / raw)
To: Daniel Bozeman
Cc: shawn.lin, linux-pm, linux-arm-kernel, linux-rockchip,
linux-kernel, ulf.hansson, heiko
On Wednesday, 2026/04/01 10:34, Daniel Bozeman wrote:
> The NanoPi Zero2 (RK3528) kernel panics during boot when a
> GPIO-controlled USB VBUS regulator is defined on GPIO4 (which
> is in PD_RKVENC). The goal of this series is to make USB host
> power work on boards that use GPIO4 for regulator control.
>
> The root cause is a probe ordering issue. On RK3528, the power
> domain controller's first probe attempt fails because PD_GPU's
> clock lookup returns -EPROBE_DEFER (CRU hasn't probed yet).
> The driver then tears down all domains, including PD_RKVENC
> which would have registered successfully (it has no clock
> requirements). During this window, the USB regulator driver
> probes and requests GPIO4, which is in the now-unregistered
> PD_RKVENC. This triggers a synchronous external abort.
>
> With patch 2 alone (skipping deferred domains), the idle-only
> domains register successfully. But the genpd framework then
> attempts to power them off via genpd_power_off_work_fn. This
> calls rockchip_pd_power(), which does QoS save and idle
> requests on domains with pwr_mask == 0 that cannot actually
> be powered off.
>
> To your question about why QoS registers become inaccessible
> on idle-only domains: I have not root-caused that specifically.
> What I can confirm is the crash trace below, which occurs when
> patch 2 is applied without patch 1. The abort happens during
This sounds like a parent-child dependency that hasn't been sorted
out. My other question is: with patch 1 applied, how are the QoS
registers saved and restored during normal S2R usage?
> rockchip_pmu_set_idle_request on an idle-only domain:
>
> Internal error: synchronous external abort: 0000000096000010
> CPU: 2 PID: 60 Comm: kworker/2:3
> Workqueue: pm genpd_power_off_work_fn
> pc : regmap_mmio_read32le+0x8/0x20
> lr : regmap_mmio_read+0x44/0x70
> Call trace:
> regmap_mmio_read32le+0x8/0x20
> _regmap_bus_reg_read+0x6c/0xac
> _regmap_read+0x60/0xd8
> regmap_read+0x4c/0x7c
> rockchip_pmu_set_idle_request.isra.0+0x94/0x1b4
> rockchip_pd_power+0x37c/0x608
> rockchip_pd_power_off+0x14/0x38
> genpd_power_off.isra.0+0x1f0/0x2f0
> genpd_power_off_work_fn+0x34/0x54
>
> The two patches work together: patch 1 prevents QoS access
> on idle-only domains, and patch 2 prevents the full probe
> teardown when a single domain defers.
>
> Tested on NanoPi Zero2 (fixes panic) and Radxa E20C (no
> regression).
>
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-01 2:54 ` Shawn Lin
@ 2026-04-01 6:13 ` Daniel Bozeman
2026-04-01 7:11 ` Shawn Lin
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Bozeman @ 2026-04-01 6:13 UTC (permalink / raw)
To: shawn.lin, ulf.hansson, heiko, linux-pm, linux-arm-kernel,
linux-rockchip, linux-kernel
I ran additional tests to gather evidence:
Test 1: Patch 2 only (skip EPROBE_DEFER), no patch 1.
Result: kernel panic. The idle-only domains register
successfully, but genpd_power_off_work_fn attempts to power
them off and crashes in rockchip_pmu_set_idle_request:
Internal error: synchronous external abort: 0000000096000010
CPU: 0 PID: 59 Comm: kworker/0:3
Workqueue: pm genpd_power_off_work_fn
pc : regmap_mmio_read32le+0x8/0x20
Call trace:
regmap_mmio_read32le+0x8/0x20
_regmap_bus_reg_read+0x6c/0xac
_regmap_read+0x60/0xd8
regmap_read+0x4c/0x7c
rockchip_pmu_set_idle_request.isra.0+0x94/0x1b4
rockchip_pd_power+0x378/0x604
rockchip_pd_power_off+0x14/0x34
genpd_power_off.isra.0+0x1f0/0x2f0
genpd_power_off_work_fn+0x34/0x54
Test 2: No kernel patches, PD_GPU disabled via
status = "disabled" in DTS to avoid EPROBE_DEFER entirely.
Result: same kernel panic. The idle-only domains register
but crash identically when genpd tries to power them off.
Same call trace as above.
So the crash is not caused by probe ordering or
EPROBE_DEFER -- it happens whenever idle-only domains
(pwr_mask == 0) are registered and genpd attempts to
power them off. The QoS register access in
rockchip_pmu_set_idle_request faults on these domains.
Regarding S2R: you raise a valid concern about skipping
QoS save/restore. However, these idle-only domains cannot
actually be powered off (pwr_mask == 0), so
rockchip_do_pmu_set_power_domain already returns early for
them. The QoS save/idle/restore cycle in rockchip_pd_power
has no effect on these domains since the power state never
changes -- the save and restore are paired around a no-op.
Skipping the entire sequence for pwr_mask == 0 should be
safe for S2R as well.
Both patches are needed:
- Patch 1: prevents the QoS crash on idle-only domains
- Patch 2: prevents probe teardown from making things worse
Tested on NanoPi Zero2 (RK3528) with all four scenarios:
both patches (boots), patch 2 only (panic), DTS workaround
(panic), both patches + E20C regression test (no issues).
The board DTS that triggers this (GPIO-controlled USB VBUS
regulator on GPIO4/PD_RKVENC) can be seen at:
https://github.com/dboze/openwrt/blob/add-nanopi-zero2-clean/target/linux/rockchip/patches-6.12/102-arm64-dts-rockchip-Add-FriendlyElec-NanoPi-Zero2.patch
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-01 6:13 ` Daniel Bozeman
@ 2026-04-01 7:11 ` Shawn Lin
2026-04-03 21:27 ` Daniel Bozeman
0 siblings, 1 reply; 10+ messages in thread
From: Shawn Lin @ 2026-04-01 7:11 UTC (permalink / raw)
To: Daniel Bozeman, ulf.hansson, heiko, linux-pm, linux-arm-kernel,
linux-rockchip, linux-kernel
Cc: shawn.lin, finley.xiao
+ Finley
On Wednesday, 2026/04/01 14:13, Daniel Bozeman wrote:
> I ran additional tests to gather evidence:
>
> Test 1: Patch 2 only (skip EPROBE_DEFER), no patch 1.
> Result: kernel panic. The idle-only domains register
> successfully, but genpd_power_off_work_fn attempts to power
> them off and crashes in rockchip_pmu_set_idle_request:
>
> Internal error: synchronous external abort: 0000000096000010
> CPU: 0 PID: 59 Comm: kworker/0:3
> Workqueue: pm genpd_power_off_work_fn
> pc : regmap_mmio_read32le+0x8/0x20
> Call trace:
> regmap_mmio_read32le+0x8/0x20
> _regmap_bus_reg_read+0x6c/0xac
> _regmap_read+0x60/0xd8
> regmap_read+0x4c/0x7c
> rockchip_pmu_set_idle_request.isra.0+0x94/0x1b4
> rockchip_pd_power+0x378/0x604
> rockchip_pd_power_off+0x14/0x34
> genpd_power_off.isra.0+0x1f0/0x2f0
> genpd_power_off_work_fn+0x34/0x54
>
> Test 2: No kernel patches, PD_GPU disabled via
> status = "disabled" in DTS to avoid EPROBE_DEFER entirely.
> Result: same kernel panic. The idle-only domains register
> but crash identically when genpd tries to power them off.
> Same call trace as above.
>
> So the crash is not caused by probe ordering or
> EPROBE_DEFER -- it happens whenever idle-only domains
> (pwr_mask == 0) are registered and genpd attempts to
> power them off. The QoS register access in
> rockchip_pmu_set_idle_request faults on these domains.
>
> Regarding S2R: you raise a valid concern about skipping
> QoS save/restore. However, these idle-only domains cannot
> actually be powered off (pwr_mask == 0), so
> rockchip_do_pmu_set_power_domain already returns early for
> them. The QoS save/idle/restore cycle in rockchip_pd_power
> has no effect on these domains since the power state never
> changes -- the save and restore are paired around a no-op.
> Skipping the entire sequence for pwr_mask == 0 should be
> safe for S2R as well.
>
Thanks for these details, but I think they explain the phenomenon and
work around it without explaining the root cause.
The RK3528 SoC can't power down these PDs; it only supports idling them.
Right, idling these PDs could still make the QoS registers inaccessible.
But from the code, rockchip_pmu_save_qos() and
rockchip_pmu_restore_qos() are both called while the domain is not idle.
One possible guess is that it's clock related. Could you please test
your environment with "clk_ignore_unused" set in the cmdline?
Another test is to print genpd->name at the entry of
rockchip_pd_power_on() and rockchip_pd_power_off() to see
which domain is inaccessible.
> Both patches are needed:
> - Patch 1: prevents the QoS crash on idle-only domains
> - Patch 2: prevents probe teardown from making things worse
>
> Tested on NanoPi Zero2 (RK3528) with all four scenarios:
> both patches (boots), patch 2 only (panic), DTS workaround
> (panic), both patches + E20C regression test (no issues).
>
> The board DTS that triggers this (GPIO-controlled USB VBUS
> regulator on GPIO4/PD_RKVENC) can be seen at:
> https://github.com/dboze/openwrt/blob/add-nanopi-zero2-clean/target/linux/rockchip/patches-6.12/102-arm64-dts-rockchip-Add-FriendlyElec-NanoPi-Zero2.patch
>
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-01 7:11 ` Shawn Lin
@ 2026-04-03 21:27 ` Daniel Bozeman
2026-04-04 11:40 ` Shawn Lin
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Bozeman @ 2026-04-03 21:27 UTC (permalink / raw)
To: shawn.lin, finley.xiao, ulf.hansson, heiko, linux-pm,
linux-arm-kernel, linux-rockchip, linux-kernel
I ran both tests you requested:
Test 1: Added pr_err to rockchip_pd_power_on/off to identify
the crashing domain. With patch 2 only (skip EPROBE_DEFER),
the crash occurs on PD_VO:
rockchip_pd_power_off: vo pwr_mask=0x0
Internal error: synchronous external abort: 0000000096000010
Workqueue: pm genpd_power_off_work_fn
Call trace:
regmap_mmio_read32le+0x8/0x20
_regmap_bus_reg_read+0x6c/0xac
_regmap_read+0x60/0xd8
regmap_read+0x4c/0x7c
rockchip_pmu_set_idle_request.isra.0+0x98/0x16c
rockchip_pd_power+0x130/0x48c
rockchip_pd_power_off+0x38/0x48
genpd_power_off.isra.0+0x1f0/0x2f0
genpd_power_off_work_fn+0x34/0x54
Test 2: Same debug build, booted with clk_ignore_unused
added to kernel cmdline via U-Boot. Same crash, same domain:
rockchip_pd_power_off: vo pwr_mask=0x0
Internal error: synchronous external abort: 0000000096000010
(identical call trace)
The crash occurs even with clk_ignore_unused. The QoS
registers for PD_VO are inaccessible when genpd attempts
to power off this idle-only domain.
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-03 21:27 ` Daniel Bozeman
@ 2026-04-04 11:40 ` Shawn Lin
2026-04-04 22:42 ` Daniel Bozeman
2026-04-05 23:29 ` Jonas Karlman
0 siblings, 2 replies; 10+ messages in thread
From: Shawn Lin @ 2026-04-04 11:40 UTC (permalink / raw)
To: Daniel Bozeman, finley.xiao, ulf.hansson, heiko, linux-pm,
linux-arm-kernel, linux-rockchip, linux-kernel, Jonas Karlman
Cc: shawn.lin
+ Jonas
On Saturday, 2026/04/04 5:27, Daniel Bozeman wrote:
> I ran both tests you requested:
>
> Test 1: Added pr_err to rockchip_pd_power_on/off to identify
> the crashing domain. With patch 2 only (skip EPROBE_DEFER),
> the crash occurs on PD_VO:
Thanks for finding that it is PD_VO. I'm still requesting more docs
internally to check what's going on. I see there are several QoS nodes
under PD_VO, but I'm not sure whether they all belong to PD_VO, or even
whether their registers are defined correctly.
Could you help dig further by removing the QoS nodes from PD_VO one by
one to narrow down the broken one?
I have also looped in Jonas, who submitted the code, to have a look.
(I'm also surprised to see that there aren't any QoS nodes under PD_VO
in the vendor kernel for reference, but the upstream code has them...)
>
> rockchip_pd_power_off: vo pwr_mask=0x0
> Internal error: synchronous external abort: 0000000096000010
> Workqueue: pm genpd_power_off_work_fn
> Call trace:
> regmap_mmio_read32le+0x8/0x20
> _regmap_bus_reg_read+0x6c/0xac
> _regmap_read+0x60/0xd8
> regmap_read+0x4c/0x7c
> rockchip_pmu_set_idle_request.isra.0+0x98/0x16c
> rockchip_pd_power+0x130/0x48c
> rockchip_pd_power_off+0x38/0x48
> genpd_power_off.isra.0+0x1f0/0x2f0
> genpd_power_off_work_fn+0x34/0x54
>
> Test 2: Same debug build, booted with clk_ignore_unused
> added to kernel cmdline via U-Boot. Same crash, same domain:
>
> rockchip_pd_power_off: vo pwr_mask=0x0
> Internal error: synchronous external abort: 0000000096000010
> (identical call trace)
>
> The crash occurs even with clk_ignore_unused. The QoS
> registers for PD_VO are inaccessible when genpd attempts
> to power off this idle-only domain.
>
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-04 11:40 ` Shawn Lin
@ 2026-04-04 22:42 ` Daniel Bozeman
2026-04-05 23:29 ` Jonas Karlman
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Bozeman @ 2026-04-04 22:42 UTC (permalink / raw)
To: shawn.lin, finley.xiao, jonas, ulf.hansson, heiko, linux-pm,
linux-arm-kernel, linux-rockchip, linux-kernel
Further testing with NO kernel patches and fw_devlink=strict
reveals both crashes happening simultaneously on different
CPUs:
CPU1 (genpd_power_off_work_fn):
pc : regmap_mmio_read32le+0x8/0x20
Workqueue: pm genpd_power_off_work_fn
CPU2 (deferred_probe_work_func):
pc : clk_gate_endisable+0xa8/0x130
Workqueue: events_unbound deferred_probe_work_func
Kernel panic - not syncing: Asynchronous SError Interrupt
This shows there are perhaps two independent issues:
1. genpd tries to power off idle-only domains and crashes
in rockchip_pmu_set_idle_request (the regmap read at
PMU offset 0x1120 faults)
2. GPIO4 probes while PD_RKVENC is not registered (power
domain controller tore down due to PD_GPU EPROBE_DEFER)
and crashes in clk_gate_endisable
Both crashes occur in the unpatched kernel. Previously we
only observed crash #2 because it appeared first in serial
output, but maybe they're racing on different CPUs?
I also tested removing all pm_qos from all idle-only
domains (PD_VO, PD_RKVENC, PD_VPU). Crash #1 still
occurs. Because it is in rockchip_pmu_set_idle_request,
not in QoS save/restore?
fw_devlink=strict does not prevent either crash.
* Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
2026-04-04 11:40 ` Shawn Lin
2026-04-04 22:42 ` Daniel Bozeman
@ 2026-04-05 23:29 ` Jonas Karlman
1 sibling, 0 replies; 10+ messages in thread
From: Jonas Karlman @ 2026-04-05 23:29 UTC (permalink / raw)
To: Shawn Lin, Daniel Bozeman, finley.xiao@rock-chips.com
Cc: ulf.hansson@linaro.org, heiko@sntech.de, linux-pm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org
Hi,
On 4/4/2026 1:40 PM, Shawn Lin wrote:
> + Jonas
>
> On Saturday, 2026/04/04 5:27, Daniel Bozeman wrote:
>> I ran both tests you requested:
>>
>> Test 1: Added pr_err to rockchip_pd_power_on/off to identify
>> the crashing domain. With patch 2 only (skip EPROBE_DEFER),
>> the crash occurs on PD_VO:
>
> Thanks for finding that it is PD_VO. I'm still requesting more docs
> internally to check what's going on. I see there are several QoS nodes
> under PD_VO, but I'm not sure whether they all belong to PD_VO, or even
> whether their registers are defined correctly.
>
> Could you help dig further by removing the QoS nodes from PD_VO one by
> one to narrow down the broken one?
>
> I have also looped in Jonas, who submitted the code, to have a look.
> (I'm also surprised to see that there aren't any QoS nodes under PD_VO
> in the vendor kernel for reference, but the upstream code has them...)
Upstream included all QoS nodes that seemed to be related to each power
domain, based on e.g. vendor DTs, the clock driver and other hints.
Vendor kernel mostly seemed to take the easy way out and flagged all
rk3528 power domains as always on or similar, if I recall correctly.
For upstream we have instead tried to describe all power domains without
any always on flag and instead ensure all devices belong to a power
domain.
I do not have access to any rk3528 TRM or similar, so I would not be
surprised if some details are wrong. However, runtime testing at the
time the patches were sent upstream did not show any issues.
I was however able to reproduce a crash using next-20260403 + rk3528 usb
series [1][2]. This crash did not happen at the time of the original
submission of the pmdomain or usb series.
Looking at the pmdomain core and rk pmdomain driver changes since the
initial rk3528 merge, I see some changes that may have altered the
behavior of the driver, e.g. GENPD_FLAG_NO_STAY_ON.
The following quick diff seems to remove any changed behavior introduced
in commit 2bc12a8199a0 ("pmdomain: rockchip: Fix regulator dependency
with GENPD_FLAG_NO_STAY_ON"), and fixes the crash for me.
diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index 490bbb1d1d8e..4d69b9f68886 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -892,7 +892,9 @@ static int rockchip_pm_add_one_domain(struct rockchip_pmu *pmu,
 	pd->genpd.power_on = rockchip_pd_power_on;
 	pd->genpd.attach_dev = rockchip_pd_attach_dev;
 	pd->genpd.detach_dev = rockchip_pd_detach_dev;
-	pd->genpd.flags = GENPD_FLAG_PM_CLK | GENPD_FLAG_NO_STAY_ON;
+	pd->genpd.flags = GENPD_FLAG_PM_CLK;
+	if (pd->info->pwr_mask || pd->info->status_mask)
+		pd->genpd.flags |= GENPD_FLAG_NO_STAY_ON;
 	if (pd_info->active_wakeup)
 		pd->genpd.flags |= GENPD_FLAG_ACTIVE_WAKEUP;
 	pm_genpd_init(&pd->genpd, NULL,
It could also be that GENPD_FLAG_NO_STAY_ON only needs to be applied to
need_regulator domains?
[1] https://lore.kernel.org/r/20250723122323.2344916-1-jonas@kwiboo.se/
[2] https://github.com/Kwiboo/linux-rockchip/commits/next-20260403-rk3528/
Regards,
Jonas
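The two flag-selection options discussed in the message above (the quick
diff, and the open question about need_regulator domains) can be compared
with a small userspace model. This is a sketch under assumptions: the
MODEL_FLAG_* bit values are illustrative and not taken from kernel
headers, and struct model_pd_info and both helper functions are
hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; not taken from the kernel headers. */
#define MODEL_FLAG_PM_CLK	(1u << 0)
#define MODEL_FLAG_NO_STAY_ON	(1u << 1)

/* Hypothetical subset of the per-domain description. */
struct model_pd_info {
	unsigned int pwr_mask;
	unsigned int status_mask;
	bool need_regulator;
};

/*
 * Variant from the quick diff: set NO_STAY_ON only when the domain has a
 * power or status mask, so idle-only domains do not get the flag.
 */
static unsigned int model_flags_by_mask(const struct model_pd_info *info)
{
	unsigned int flags = MODEL_FLAG_PM_CLK;

	assert(info != NULL);
	if (info->pwr_mask || info->status_mask)
		flags |= MODEL_FLAG_NO_STAY_ON;
	return flags;
}

/*
 * Variant raised as an open question: set NO_STAY_ON only for domains
 * that need a regulator.
 */
static unsigned int model_flags_by_regulator(const struct model_pd_info *info)
{
	unsigned int flags = MODEL_FLAG_PM_CLK;

	assert(info != NULL);
	if (info->need_regulator)
		flags |= MODEL_FLAG_NO_STAY_ON;
	return flags;
}
```

The two variants agree for idle-only domains without a regulator and
differ for a domain that has a power mask but needs no regulator, which
is the case the open question above would have to settle.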
>
>>
>> rockchip_pd_power_off: vo pwr_mask=0x0
>> Internal error: synchronous external abort: 0000000096000010
>> Workqueue: pm genpd_power_off_work_fn
>> Call trace:
>> regmap_mmio_read32le+0x8/0x20
>> _regmap_bus_reg_read+0x6c/0xac
>> _regmap_read+0x60/0xd8
>> regmap_read+0x4c/0x7c
>> rockchip_pmu_set_idle_request.isra.0+0x98/0x16c
>> rockchip_pd_power+0x130/0x48c
>> rockchip_pd_power_off+0x38/0x48
>> genpd_power_off.isra.0+0x1f0/0x2f0
>> genpd_power_off_work_fn+0x34/0x54
>>
>> Test 2: Same debug build, booted with clk_ignore_unused
>> added to kernel cmdline via U-Boot. Same crash, same domain:
>>
>> rockchip_pd_power_off: vo pwr_mask=0x0
>> Internal error: synchronous external abort: 0000000096000010
>> (identical call trace)
>>
>> The crash occurs even with clk_ignore_unused. The QoS
>> registers for PD_VO are inaccessible when genpd attempts
>> to power off this idle-only domain.
>>