linux-pm.vger.kernel.org archive mirror
From: Mason <slash.tmp@free.fr>
To: linux-pm <linux-pm@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Russell King <linux@arm.linux.org.uk>,
	Kevin Hilman <khilman@kernel.org>,
	Sebastian Frias <sf84@laposte.net>,
	Thibaud Cornic <thibaud_cornic@sigmadesigns.com>,
	Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Subject: Re: CPU1 does not come back online after failed suspend request
Date: Fri, 24 Jun 2016 21:21:14 +0200
Message-ID: <576D882A.4030903@free.fr>
In-Reply-To: <576AAEFA.1050509@free.fr>

On 22/06/2016 17:30, Mason wrote:

> (I'm using v4.7-rc4)
> 
> My dual-core platform defines the usual hooks:
> 
> static const struct smp_operations tango_smp_ops __initconst = {
> 	.smp_boot_secondary	= tango_boot_secondary,
> 	.cpu_kill		= tango_cpu_kill,
> 	.cpu_die		= tango_cpu_die,
> };
> 
> static const struct platform_suspend_ops tango_pm_ops = {
> 	.enter = tango_pm_enter,
> 	.valid = tango_pm_valid,
> };
> 
> static int tango_pm_powerdown(unsigned long data)
> {
> 	// tango_suspend(virt_to_phys(cpu_resume)); // SHOULD NOT RETURN
> 	printk("DEBUG: %s\n", __func__);
> 	// INSERT ONE SECOND DELAY
> 	return 42;
> }
> 
> static int tango_pm_enter(suspend_state_t state)
> {
> 	printk("DEBUG: %s\n", __func__);
> 	int ret = cpu_suspend(0, tango_pm_powerdown);
> 	printk("DEBUG: cpu_suspend returned %d\n", ret);
> 	return 0;
> }
> 
> I'm trying to test the error path, i.e. when tango_pm_powerdown()
> does in fact return.
> 
> Secondary core off-lining via /sys/devices/system/cpu/cpu1/online
> seems to work as expected:
> 
> # cat /sys/devices/system/cpu/online     
> 0-1
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> [   64.022349] CPU1: shutdown
> [   64.022354] DEBUG: tango_cpu_die
> [   64.028370] DEBUG: tango_cpu_kill
> # cat /sys/devices/system/cpu/online 
> 0
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [   73.955994] DEBUG: tango_boot_secondary
> # cat /sys/devices/system/cpu/online 
> 0-1
> 
> 
> But the secondary core does not come back online after a failed
> suspend attempt (see below). I tried adding a 1 second delay in
> tango_pm_powerdown() to rule out timing issues.
> 
> # echo mem > /sys/power/state
> [   16.328980] PM: Syncing filesystems ... done.
> [   16.336844] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   16.345421] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   16.354034] Suspending console(s) (use no_console_suspend to debug)
> [   16.362965] PM: suspend of devices complete after 1.764 msecs
> [   16.363870] PM: late suspend of devices complete after 0.896 msecs
> [   16.364519] PM: noirq suspend of devices complete after 0.642 msecs
> [   16.364522] Disabling non-boot CPUs ...
> [   16.382340] CPU1: shutdown
> [   16.382344] DEBUG: tango_cpu_die
> [   16.382346] DEBUG: tango_cpu_kill
> [   16.392635] DEBUG: tango_pm_enter
> [   16.392635] DEBUG: tango_pm_powerdown
> [   16.392635] DEBUG: cpu_suspend returned 42
> [   16.392664] Enabling non-boot CPUs ...
> [   16.412544] DEBUG: tango_boot_secondary
> [   17.411927] CPU1: failed to come online
> [   17.432448] Error taking CPU1 up: -5
> [   17.433034] PM: noirq resume of devices complete after 0.576 msecs
> [   17.433750] PM: early resume of devices complete after 0.688 msecs
> [   17.435121] nb8800 26000.ethernet eth0: Link is Down
> [   17.435301] PM: resume of devices complete after 1.541 msecs
> [   17.516826] Restarting tasks ... done.
> 
> [root@toto5 ~]# cat /sys/devices/system/cpu/online
> 0
> 
> As you can see, cpu1 did not come back online.
> [ 17.411927] CPU1: failed to come online
> [ 17.432448] Error taking CPU1 up: -5
> 
> The other weirdness is that my 1 second delay happens between
> "DEBUG: tango_pm_powerdown" and "DEBUG: cpu_suspend returned 42",
> yet the timestamps for these two lines are identical. Is that
> because the timestamp variable is not updated deep within
> the suspend framework? (My timer ticks at 27 MHz.)

Any idea whether the code flow differs between the two cases
(manual offline/online via sysfs vs. offline/online by the
suspend framework)?

"CPU1: failed to come online" apparently comes from __cpu_up()
http://lxr.free-electrons.com/source/arch/arm/kernel/smp.c#L137

"Error taking CPU1 up: -5" apparently comes from enable_nonboot_cpus()
http://lxr.free-electrons.com/source/kernel/cpu.c#L1110

And -5 is simply -EIO, returned from __cpu_up().
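
For illustration, the error path described above can be modeled in user
space roughly as follows. This is only a sketch, not the kernel code:
boot_secondary_stub(), model_cpu_up() and model_enable_nonboot_cpu() are
hypothetical names, and the came_online flag stands in for whatever wait
the real __cpu_up() performs before checking that the secondary is online.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the platform's smp_boot_secondary hook.
 * Returns 0 when the boot request was issued successfully. */
static int boot_secondary_stub(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

/* Hypothetical model of __cpu_up(): issue the boot request, then check
 * whether the secondary marked itself online. 'came_online' models the
 * outcome of that check (an assumption, not read from real hardware). */
static int model_cpu_up(unsigned int cpu, bool came_online)
{
	int ret;

	assert(cpu != 0);	/* CPU0 is the boot CPU in this model */

	ret = boot_secondary_stub(cpu);
	if (ret != 0) {
		fprintf(stderr, "CPU%u: failed to boot: %d\n", cpu, ret);
		return ret;
	}

	if (!came_online) {
		fprintf(stderr, "CPU%u: failed to come online\n", cpu);
		return -EIO;
	}
	return 0;
}

/* Hypothetical model of the caller: report any error from model_cpu_up(). */
static int model_enable_nonboot_cpu(unsigned int cpu, bool came_online)
{
	int error = model_cpu_up(cpu, came_online);

	if (error)
		fprintf(stderr, "Error taking CPU%u up: %d\n", cpu, error);
	return error;
}
```

In this model, model_enable_nonboot_cpu(1, false) prints both messages
seen in the log and returns -EIO, which is -5 on Linux. The boot request
itself succeeds; only the "did the secondary come online" check fails.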

Is there any point in testing with v4.6?

Regards.

