* PM / Domains: Infinite loop during reboot
@ 2015-04-13 12:18 Geert Uytterhoeven
2015-05-05 13:49 ` Geert Uytterhoeven
0 siblings, 1 reply; 3+ messages in thread
From: Geert Uytterhoeven @ 2015-04-13 12:18 UTC (permalink / raw)
To: Linux PM list; +Cc: Linux-sh list
Sometimes reboot doesn't work on r8a7791/koelsch with the CPG clock domain.
With additional debug messages, you can see it hangs in an infinite loop:
rcar-dmac ec720000.dma-controller: removing from PM domain cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
[...]
Presumably this is this loop in genpd_dev_pm_detach():
while (1) {
ret = pm_genpd_remove_device(pd, dev);
if (ret != -EAGAIN)
break;
cond_resched();
}
which is retried because genpd->prepared_count > 0 in pm_genpd_remove_device().
This looks a bit strange, as no suspend is in progress, so prepared_count
should be zero?
I'm adding more debugging code to verify this, but it's not 100% reproducable.
[...]
Confirmed: prepared_count turns out to be 1 at reboot time if at least one
s2ram cycle happened before. Additional s2ram cycles don't seem to increase
prepared_count any further.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PM / Domains: Infinite loop during reboot
2015-04-13 12:18 PM / Domains: Infinite loop during reboot Geert Uytterhoeven
@ 2015-05-05 13:49 ` Geert Uytterhoeven
2015-06-04 9:13 ` Ulf Hansson
0 siblings, 1 reply; 3+ messages in thread
From: Geert Uytterhoeven @ 2015-05-05 13:49 UTC (permalink / raw)
To: Ulf Hansson, Rafael J. Wysocki; +Cc: Linux PM list, Linux-sh list
On Mon, Apr 13, 2015 at 2:18 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> Sometimes reboot doesn't work on r8a7791/koelsch with the CPG clock domain.
> With additional debug messages, you can see it hangs in an infinite loop:
>
> rcar-dmac ec720000.dma-controller: removing from PM domain cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
> [...]
>
> Presumably this is this loop in genpd_dev_pm_detach():
>
> while (1) {
> ret = pm_genpd_remove_device(pd, dev);
> if (ret != -EAGAIN)
> break;
> cond_resched();
> }
>
> which is retried because genpd->prepared_count > 0 in pm_genpd_remove_device().
> This looks a bit strange, as no suspend is in progress, so prepared_count
> should be zero?
>
> I'm adding more debugging code to verify this, but it's not 100% reproducable.
>
> [...]
>
> Confirmed: prepared_count turns out to be 1 at reboot time if at least one
> s2ram cycle happened before. Additional s2ram cycles don't seem to increase
> prepared_count any further.
This seems to be caused by the sh_cmt driver's sh_cmt_enable() calling
dev_pm_syscore_device(..., true).
If dev->power.syscore is set, device_prepare(), device_complete() et al.
skip further operation.
Hence during an s2ram+resume cycle, the following happens:
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ... (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: Entering mem sleep
Suspending console(s) (use no_console_suspend to debug)
[ pm_genpd_prepare() is called for all devices (incl. sh_cmt), ]
[ increasing prepared_count ]
PM: suspend of devices complete after 36.472 msecs
PM: late suspend of devices complete after 7.569 msecs
PM: noirq suspend of devices complete after 8.035 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
[ sh_cmt_enable() is called, and power.syscore becomes true ]
Enabling non-boot CPUs ...
CPU1 is up
PM: noirq resume of devices complete after 4.724 msecs
PM: early resume of devices complete after 5.100 msecs
PM: resume of devices complete after 84.156 msecs
[ pm_genpd_complete() is called for all devices (except for sh_cmt!, ]
[ as its power.syscore is now true!), decreasing prepared_count ]
[ prepared_count is now 1 instead of 0! ]
PM: Finishing wakeup.
Restarting tasks ...
During subsequent s2ram+resume cycles, the prepared_count imbalance doesn't
increase, as pm_genpd_prepare() is no longer called for sh_cmt due to
power.syscore still being set.
During reboot/halt, platform_drv_shutdown() is called for any platform device
that has a driver with a .shutdown() method (on R-Car Gen2, that's just the
rcar-dmac driver). platform_drv_shutdown() calls dev_pm_domain_detach(),
which enters an infinite loop, due to the prepared_count imbalance.
On R-Mobile, the imbalance also happens. But there it doesn't cause any issues
as sh_cmt is part of a PM domain that doesn't contain any devices with drivers
that have .shutdown() methods.
I see two ways to fix this:
1. Fix the prepared_count imbalance.
Anyone with a clue?
2. Fix the infinite loop.
a. Limit the loop to a fixed number of retries (e.g. 20),
b. Remove the call to dev_pm_domain_detach() from
platform_drv_shutdown(). Others buses (amba, i2c, spi) do not call
it from their .shutdown() method, only from .remove().
Thanks for your comments and suggestions!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PM / Domains: Infinite loop during reboot
2015-05-05 13:49 ` Geert Uytterhoeven
@ 2015-06-04 9:13 ` Ulf Hansson
0 siblings, 0 replies; 3+ messages in thread
From: Ulf Hansson @ 2015-06-04 9:13 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: Rafael J. Wysocki, Linux PM list, Linux-sh list
On 5 May 2015 at 15:49, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Apr 13, 2015 at 2:18 PM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>> Sometimes reboot doesn't work on r8a7791/koelsch with the CPG clock domain.
>> With additional debug messages, you can see it hangs in an infinite loop:
>>
>> rcar-dmac ec720000.dma-controller: removing from PM domain cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> pm_genpd_remove_device: Remove ec720000.dma-controller from cpg_clocks
>> [...]
>>
>> Presumably this is this loop in genpd_dev_pm_detach():
>>
>> while (1) {
>> ret = pm_genpd_remove_device(pd, dev);
>> if (ret != -EAGAIN)
>> break;
>> cond_resched();
>> }
>>
>> which is retried because genpd->prepared_count > 0 in pm_genpd_remove_device().
>> This looks a bit strange, as no suspend is in progress, so prepared_count
>> should be zero?
>>
>> I'm adding more debugging code to verify this, but it's not 100% reproducable.
>>
>> [...]
>>
>> Confirmed: prepared_count turns out to be 1 at reboot time if at least one
>> s2ram cycle happened before. Additional s2ram cycles don't seem to increase
>> prepared_count any further.
>
> This seems to be caused by the sh_cmt driver's sh_cmt_enable() calling
> dev_pm_syscore_device(..., true).
>
> If dev->power.syscore is set, device_prepare(), device_complete() et al.
> skip further operation.
>
> Hence during an s2ram+resume cycle, the following happens:
>
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ... (elapsed 0.001 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> PM: Entering mem sleep
> Suspending console(s) (use no_console_suspend to debug)
>
> [ pm_genpd_prepare() is called for all devices (incl. sh_cmt), ]
> [ increasing prepared_count ]
>
> PM: suspend of devices complete after 36.472 msecs
> PM: late suspend of devices complete after 7.569 msecs
> PM: noirq suspend of devices complete after 8.035 msecs
> Disabling non-boot CPUs ...
> CPU1: shutdown
>
> [ sh_cmt_enable() is called, and power.syscore becomes true ]
>
> Enabling non-boot CPUs ...
> CPU1 is up
> PM: noirq resume of devices complete after 4.724 msecs
> PM: early resume of devices complete after 5.100 msecs
> PM: resume of devices complete after 84.156 msecs
>
> [ pm_genpd_complete() is called for all devices (except for sh_cmt!, ]
> [ as its power.syscore is now true!), decreasing prepared_count ]
>
> [ prepared_count is now 1 instead of 0! ]
>
> PM: Finishing wakeup.
> Restarting tasks ...
>
> During subsequent s2ram+resume cycles, the prepared_count imbalance doesn't
> increase, as pm_genpd_prepare() is no longer called for sh_cmt due to
> power.syscore still being set.
>
> During reboot/halt, platform_drv_shutdown() is called for any platform device
> that has a driver with a .shutdown() method (on R-Car Gen2, that's just the
> rcar-dmac driver). platform_drv_shutdown() calls dev_pm_domain_detach(),
> which enters an infinite loop, due to the prepared_count imbalance.
>
> On R-Mobile, the imbalance also happens. But there it doesn't cause any issues
> as sh_cmt is part of a PM domain that doesn't contain any devices with drivers
> that have .shutdown() methods.
>
> I see two ways to fix this:
> 1. Fix the prepared_count imbalance.
> Anyone with a clue?
Hi Geert,
Sorry for the delayed answer. I don't have an idea for 1), yet.
> 2. Fix the infinite loop.
> a. Limit the loop to a fixed number of retries (e.g. 20),
> b. Remove the call to dev_pm_domain_detach() from
> platform_drv_shutdown(). Others buses (amba, i2c, spi) do not call
> it from their .shutdown() method, only from .remove().
The reason why I added dev_pm_domain_detach() in that path was because
of the call to acpi_dev_pm_detach() that existed there already.
acpi_dev_pm_detach was added by Rafael in commit
94d76d5de38d7502c3e78fcd6bf50da95e3e0361 (platform / ACPI:
Attach/detach ACPI PM during probe/remove/shutdown).
>From genpd point of view, I think it safe to remove the call to
dev_pm_domain_detach() from platform_drv_shutdown(), but I can't tell
from ACPI point of view.
Moreover, we still need to looking into 1), as it's probably causing
other issues as well.
Kind regards
Uffe
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-06-04 9:13 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-13 12:18 PM / Domains: Infinite loop during reboot Geert Uytterhoeven
2015-05-05 13:49 ` Geert Uytterhoeven
2015-06-04 9:13 ` Ulf Hansson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).