From: swarren@wwwdotorg.org (Stephen Warren)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] ARM: tegra: disable nonboot CPUs when reboot
Date: Fri, 07 Jun 2013 12:56:28 -0600
Message-ID: <51B22CDC.4080200@wwwdotorg.org>
In-Reply-To: <20130607181846.GL8111@mudshark.cambridge.arm.com>

On 06/07/2013 12:18 PM, Will Deacon wrote:
> On Fri, Jun 07, 2013 at 05:44:33PM +0100, Stephen Warren wrote:
>> On 06/07/2013 03:36 AM, Joseph Lo wrote:
>>> The normal CPU hotplug flow in the kernel, which is also the flow we
>>> expect on Tegra, is: "tegra_cpu_disable" checks that the CPU ID is OK
>>> for hotplug, the CPU being unplugged enters a power-gated state via
>>> "tegra_cpu_die", then another CPU waits for the unplugged CPU to be in
>>> reset and clock-gates it via "tegra_cpu_kill". That means we don't
>>> support stopping a CPU or taking it offline by triggering
>>> "tegra_cpu_kill" directly. Doing so may cause a busy loop while waiting
>>> for the CPU to be in reset.
>>>
>>> After the commit "62e930e reboot: rigrate shutdown/reboot to boot cpu",
>>> "disable_nonboot_cpus" is no longer called from
>>> kernel_{restart,halt,power_off}. But the ARM kernel triggers
>>> "smp_send_stop" in machine_shutdown, which calls "tegra_cpu_kill"
>>> directly without "tegra_cpu_die" running first.
>>>
>>> We hook "disable_nonboot_cpus" into "reboot_notifier" to prevent that.
>>> This works for reboot, shutdown, halt and kexec.
>>
>> I don't believe this is the correct solution.
>>
>> If the semantics of cpu_kill/cpu_die are such that it's legal to call
>> only cpu_kill without having caused cpu_die to run on the killed CPU
>> first, then Tegra's implementation is buggy. We should simply fix that,
>> rather than avoiding this by forcing a different order for the calls to
>> cpu_kill/cpu_die.
>>
>> If the semantics of cpu_kill/cpu_die are such that one /must/ cause
>> cpu_die to run on the killed CPU before cpu_kill can be used on it, then
>> there's a bug in the code that isn't doing that.
>>
>> I'm CCing a few people in an attempt to find out exactly what the
>> expected semantics are for cpu_kill/cpu_die; is it legal to call
>> cpu_kill without having first caused cpu_die to execute?
> 
> By cpu_kill, do you mean platform_cpu_kill called from __cpu_die?

The struct smp_operations .cpu_kill/.cpu_die hooks. So, yes.

> If so,
> __cpu_die and cpu_die are definitely supposed to be treated as a pair, since
> they synchronise via the cpu_died completion.
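
To make the pairing concrete, here is a minimal userspace sketch of the
handshake, not the kernel code itself. All names (model_completion,
model_cpu, model_cpu_die, model_cpu_kill_side) are hypothetical stand-ins,
and a plain flag replaces the real cpu_died completion:

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct model_completion {
    bool done;
};

struct model_cpu {
    struct model_completion died; /* models the cpu_died completion */
    bool in_reset;                /* set by the dying CPU's die hook */
    bool killed;                  /* set by the surviving CPU's kill hook */
};

/* Runs on the CPU going offline: signal the completion, then power down. */
void model_cpu_die(struct model_cpu *cpu)
{
    cpu->died.done = true; /* models complete(&cpu_died) */
    cpu->in_reset = true;  /* models the platform .cpu_die hook */
}

/*
 * Runs on a surviving CPU. Returns false if the dying side never
 * signalled, which models a timed-out wait on cpu_died.
 */
bool model_cpu_kill_side(struct model_cpu *cpu)
{
    if (!cpu->died.done)
        return false;
    cpu->killed = cpu->in_reset; /* models the platform .cpu_kill hook */
    return cpu->killed;
}
```

In this model, calling model_cpu_kill_side() on a CPU that never ran
model_cpu_die() fails, which is the unpaired case the thread is asking
about.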

So the analysis I did, cribbed from our internal bug report so hopefully
it makes sense without any context there, was:

==========
Before that patch ("62e930e reboot: rigrate shutdown/reboot to boot
cpu"), kernel/sys.c:kernel_restart() and kernel_power_off() used CPU
hotplug mechanisms to unplug every CPU except one, then did the reboot
or shutdown. The ARM implementations of machine_restart() and
machine_power_off() both call machine_shutdown(), which calls
smp_send_stop(), which sends an IPI to every other CPU to tell it to
stop. However, since all the other CPUs were already unplugged, this
was a no-op.

With the patch, the kernel simply disables scheduling on all CPUs except
logical CPU 0 in kernel_restart() and kernel_power_off(). This
guarantees that the code is running on logical CPU 0, but leaves the
other CPUs still present. Hence, the call to smp_send_stop() from the
ARM code is no longer a no-op. This code hangs.

The implementation of smp_send_stop() raises IPI_CPU_STOP on each CPU
(other than logical CPU 0). This eventually calls down to
tegra_cpu_kill()[1], which calls tegra_wait_cpu_in_reset() which calls
tegra20_wait_cpu_in_reset(). That hangs because nothing has ever told
the flow controller to put the CPU in reset, so logical CPU 0 waits
indefinitely for this to happen.
==========
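
As an illustration of the analysis above, the following is a small
userspace model of the wait, not the Tegra code. The names sim_cpu,
sim_cpu_die and sim_wait_cpu_in_reset are hypothetical, and the max_polls
bound is an assumption added only so the sketch terminates; the wait
described above has no such bound:

```c
#include <stdbool.h>

/* Hypothetical model of one secondary CPU as seen by the flow controller. */
struct sim_cpu {
    bool reset_requested; /* set only if the die path ran on that CPU */
};

/* Models the die hook: the CPU asks the flow controller to put it in reset. */
void sim_cpu_die(struct sim_cpu *cpu)
{
    cpu->reset_requested = true;
}

/*
 * Models the wait performed by the kill hook on the surviving CPU.
 * Returns the number of polls used, or -1 if reset was never observed
 * within max_polls (the case that hangs when the wait is unbounded).
 */
int sim_wait_cpu_in_reset(const struct sim_cpu *cpu, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        if (cpu->reset_requested)
            return polls;
    }
    return -1;
}
```

A CPU that was only sent a stop IPI never sets reset_requested, so the
wait gives up (or, without a bound, spins forever). A CPU that went through
sim_cpu_die() first is seen in reset on the first poll.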

[1] Perhaps the issue is why smp_send_stop() calls down into
tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what should
be run on the killed CPU, and kill() on the killing CPU?

Thread overview: 9+ messages
2013-06-07  9:36 [PATCH] ARM: tegra: disable nonboot CPUs when reboot Joseph Lo
2013-06-07 16:44 ` Stephen Warren
2013-06-07 18:18   ` Will Deacon
2013-06-07 18:56     ` Stephen Warren [this message]
2013-06-07 21:28       ` Stephen Warren
2013-06-07 22:15         ` Russell King - ARM Linux
2013-06-07 22:39           ` Stephen Warren
2013-06-07 22:55             ` Russell King - ARM Linux
2013-06-10 14:42               ` Will Deacon
