From: Mason <slash.tmp@free.fr>
To: linux-pm <linux-pm@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Russell King <linux@arm.linux.org.uk>,
Kevin Hilman <khilman@kernel.org>,
Sebastian Frias <sf84@laposte.net>,
Thibaud Cornic <thibaud_cornic@sigmadesigns.com>,
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,
Mark Rutland <mark.rutland@arm.com>,
Will Deacon <will.deacon@arm.com>
Subject: Re: CPU1 does not come back online after failed suspend request
Date: Mon, 27 Jun 2016 23:12:15 +0200 [thread overview]
Message-ID: <577196AF.1010201@free.fr> (raw)
In-Reply-To: <576AAEFA.1050509@free.fr>
On 22/06/2016 17:30, Mason wrote:
> My dual-core platform defines the usual hooks:
>
> static const struct smp_operations tango_smp_ops __initconst = {
> .smp_boot_secondary = tango_boot_secondary,
> .cpu_kill = tango_cpu_kill,
> .cpu_die = tango_cpu_die,
> };
>
> static const struct platform_suspend_ops tango_pm_ops = {
> .enter = tango_pm_enter,
> .valid = tango_pm_valid,
> };
>
> static int tango_pm_powerdown(unsigned long data)
> {
> // tango_suspend(virt_to_phys(cpu_resume)); // SHOULD NOT RETURN
> printk("DEBUG: %s\n", __func__);
> // INSERT ONE SECOND DELAY
> return 42;
> }
>
> static int tango_pm_enter(suspend_state_t state)
> {
> printk("DEBUG: %s\n", __func__);
> int ret = cpu_suspend(0, tango_pm_powerdown);
> printk("DEBUG: cpu_suspend returned %d\n", ret);
> return 0;
> }
>
> I'm trying to test the error path, i.e. when tango_pm_powerdown()
> does in fact return.
>
> Secondary core off-lining via /sys/devices/system/cpu/cpu1/online
> seems to work as expected:
>
> # cat /sys/devices/system/cpu/online
> 0-1
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> [ 64.022349] CPU1: shutdown
> [ 64.022354] DEBUG: tango_cpu_die
> [ 64.028370] DEBUG: tango_cpu_kill
> # cat /sys/devices/system/cpu/online
> 0
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [ 73.955994] DEBUG: tango_boot_secondary
> # cat /sys/devices/system/cpu/online
> 0-1
>
>
> But the secondary core does not come back online after a failed
> suspend attempt (see below). I tried adding a 1 second delay in
> tango_pm_powerdown() to rule out timing issues.
>
> # echo mem > /sys/power/state
> [ 16.328980] PM: Syncing filesystems ... done.
> [ 16.336844] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [ 16.345421] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 16.354034] Suspending console(s) (use no_console_suspend to debug)
> [ 16.362965] PM: suspend of devices complete after 1.764 msecs
> [ 16.363870] PM: late suspend of devices complete after 0.896 msecs
> [ 16.364519] PM: noirq suspend of devices complete after 0.642 msecs
> [ 16.364522] Disabling non-boot CPUs ...
> [ 16.382340] CPU1: shutdown
> [ 16.382344] DEBUG: tango_cpu_die
> [ 16.382346] DEBUG: tango_cpu_kill
> [ 16.392635] DEBUG: tango_pm_enter
> [ 16.392635] DEBUG: tango_pm_powerdown
> [ 16.392635] DEBUG: cpu_suspend returned 42
> [ 16.392664] Enabling non-boot CPUs ...
> [ 16.412544] DEBUG: tango_boot_secondary
> [ 17.411927] CPU1: failed to come online
> [ 17.432448] Error taking CPU1 up: -5
> [ 17.433034] PM: noirq resume of devices complete after 0.576 msecs
> [ 17.433750] PM: early resume of devices complete after 0.688 msecs
> [ 17.435121] nb8800 26000.ethernet eth0: Link is Down
> [ 17.435301] PM: resume of devices complete after 1.541 msecs
> [ 17.516826] Restarting tasks ... done.
>
> [root@toto5 ~]# cat /sys/devices/system/cpu/online
> 0
>
> As you can see, cpu1 did not come back online.
> [ 17.411927] CPU1: failed to come online
> [ 17.432448] Error taking CPU1 up: -5
>
> The other weirdness is that my 1 second delay happens between
> "DEBUG: tango_pm_powerdown" and "DEBUG: cpu_suspend returned 42",
> yet the timestamps for these two lines are identical. Is that
> because that the timestamp variable is not updated deep within
> the suspend framework? (My timer ticks at 27 MHz.)
>
> Any idea what might be going wrong?
I've made some progress on this issue, thanks to Mark Rutland.
The FW allows only one thread at a time - similar to the BKL
of old. This means kill() on cpu0 fails, if die() is still
executing on cpu1. The issue would be even worse with 4 cores,
as die() may fail also.
Basically, I was assuming that Linux guaranteed some kind of
synchronization - by first calling die(), waiting for die()
to "complete", and then calling kill() - but there can be no
such synchronization.
So the new plan is as follows:
In die()
Call the FW in a loop, until the given core actually "dies".
static void tango_cpu_die(unsigned int cpu)
{
while ( 1 )
tango_aux_core_die(cpu);
}
In kill()
Since kill() and die() are not synchronized, kill() might
try to affect cores that have not died yet.
Call the FW in a loop, but sleep/spin a while between tries,
to give die() a chance to run.
static int tango_cpu_kill(unsigned int cpu)
{
int err;
do {
//msleep(cpu); // or mdelay?
err = tango_aux_core_kill(cpu);
} while (err);
return 1;
}
Question: are we allowed to sleep in cpu_kill? or may we only spin?
Does the above plan look acceptable, at least for my platform?
Regards.
prev parent reply other threads:[~2016-06-27 21:12 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-22 15:30 CPU1 does not come back online after failed suspend request Mason
2016-06-24 19:21 ` Mason
2016-06-27 21:12 ` Mason [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=577196AF.1010201@free.fr \
--to=slash.tmp@free.fr \
--cc=khilman@kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux@arm.linux.org.uk \
--cc=mark.rutland@arm.com \
--cc=rjw@rjwysocki.net \
--cc=sf84@laposte.net \
--cc=thibaud_cornic@sigmadesigns.com \
--cc=thomas.petazzoni@free-electrons.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).