From: Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
To: Russell King - ARM Linux <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
Cc: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>,
Joseph Lo <josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
"linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>
Subject: Re: [PATCH] ARM: tegra: disable nonboot CPUs when reboot
Date: Fri, 07 Jun 2013 16:39:32 -0600 [thread overview]
Message-ID: <51B26124.5060505@wwwdotorg.org> (raw)
In-Reply-To: <20130607221526.GC18614-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
On 06/07/2013 04:15 PM, Russell King - ARM Linux wrote:
> On Fri, Jun 07, 2013 at 03:28:38PM -0600, Stephen Warren wrote:
>> On 06/07/2013 12:56 PM, Stephen Warren wrote:
>> ...
>>> [1] Perhaps the issue is why ipi_send_stop() calls down into
>>> tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what
>>> should be run on the killed CPU, and kill() on the killing CPU?
>>
>> Scratch that; I don't think it's calling down to /either/; I was
>> confused. It seems like it /should/ call cpu_die() though, at least if
>> hotplug is enabled, right?
>
> The problem is really complex.
>
> CPU hotplug is done in paths where we're relatively confident that the
> system is working correctly. So all the features such as scheduling
> are available, the timer ticks work and so forth.
>
> However, reboot is a totally different environment. This can happen
> from almost any context with the system in any state what so ever. A
> CPU could be stuck. A CPU could have oopsed. The CPU which is in
> the reboot code could be the CPU which has oopsed. It could be called
> from within an interrupt...
>
> What that means is the usual CPU hotplug methods can't be used in the
> reboot path. Well, they can, but it will be fragile.
>
> For reboot, the real solution there is not to use software-based
> reboot, but bring the other cores to a halt (which is what
> ipi_send_stop is doing) and then issue a hardware reset to the whole
> system, including the other CPUs.
Ignoring the issues with oops in reboot, I think there's a bug in that
when hotplug is enabled, smp_kill_cpus() calls platform_cpu_kill(), but
nothing causes the failing CPU to ever execute smp_ops.cpu_die(). Hence,
if the implementation of smp_ops.cpu_kill() relies on the target CPU
having run smp_ops.cpu_die(), then smp_ops.cpu_kill() may not operate
correctly.
Or, must smp_ops.cpu_kill() not assume that smp_ops.cpu_die() will be
called on the target CPU? What are the semantics here? Will mentioned
that __cpu_die and cpu_die are a pair, but what about is the smp_ops are
used directly; are they also supposed to be a pair?
The change below solves the pairing issue, by making ipi_cpu_stop()
perform the low-level part of hotplug that matches what smp_kill_cpus()
call to platform_cpu_kill(). This certainly fixes the
hang-in-reboot-or-shutdown problem on Tegra.
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 550d63c..541f667 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -581,11 +581,20 @@ static void ipi_cpu_stop(unsigned int cpu)
>
> set_cpu_online(cpu, false);
>
> +#ifdef CONFIG_HOTPLUG_CPU
> +#if 0
> + arch_cpu_idle_dead();
> +#else
> + /* The body of arch_cpu_idle_dead() - which is better? */
> + cpu_die();
> +#endif
> +#else
> local_fiq_disable();
> local_irq_disable();
>
> while (1)
> cpu_relax();
> +#endif
> }
Some things I'm not sure of here:
* cpu_die() calls idle_task_exit(). That's probably wrong if it's
triggered from an IPI; who knows what task it's executing. That said, if
migrate_to_reboot_cpu() did set_cpus_allowed_ptr(current,
cpumask_of(cpu)), perhaps that guarantees the CPU is running the idle
task since there's nothing else that could be running?
* ipi_cpu_stop() currently calls local_fiq_disable(), but cpu_die()
doesn't. Should both functions call both local_fiq_disable() and
local_irq_disable()?
* Perhaps smp_kill_cpus() should also be changed, to call cpu_die() not
platform_cpu_kill(), to keep the pairing correct at that level too.
Plus, I ignored any issues you raised for the oops case on reboot...
WARNING: multiple messages have this Message-ID (diff)
From: swarren@wwwdotorg.org (Stephen Warren)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] ARM: tegra: disable nonboot CPUs when reboot
Date: Fri, 07 Jun 2013 16:39:32 -0600 [thread overview]
Message-ID: <51B26124.5060505@wwwdotorg.org> (raw)
In-Reply-To: <20130607221526.GC18614@n2100.arm.linux.org.uk>
On 06/07/2013 04:15 PM, Russell King - ARM Linux wrote:
> On Fri, Jun 07, 2013 at 03:28:38PM -0600, Stephen Warren wrote:
>> On 06/07/2013 12:56 PM, Stephen Warren wrote:
>> ...
>>> [1] Perhaps the issue is why ipi_send_stop() calls down into
>>> tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what
>>> should be run on the killed CPU, and kill() on the killing CPU?
>>
>> Scratch that; I don't think it's calling down to /either/; I was
>> confused. It seems like it /should/ call cpu_die() though, at least if
>> hotplug is enabled, right?
>
> The problem is really complex.
>
> CPU hotplug is done in paths where we're relatively confident that the
> system is working correctly. So all the features such as scheduling
> are available, the timer ticks work and so forth.
>
> However, reboot is a totally different environment. This can happen
> from almost any context with the system in any state what so ever. A
> CPU could be stuck. A CPU could have oopsed. The CPU which is in
> the reboot code could be the CPU which has oopsed. It could be called
> from within an interrupt...
>
> What that means is the usual CPU hotplug methods can't be used in the
> reboot path. Well, they can, but it will be fragile.
>
> For reboot, the real solution there is not to use software-based
> reboot, but bring the other cores to a halt (which is what
> ipi_send_stop is doing) and then issue a hardware reset to the whole
> system, including the other CPUs.
Ignoring the issues with oops in reboot, I think there's a bug in that
when hotplug is enabled, smp_kill_cpus() calls platform_cpu_kill(), but
nothing causes the failing CPU to ever execute smp_ops.cpu_die(). Hence,
if the implementation of smp_ops.cpu_kill() relies on the target CPU
having run smp_ops.cpu_die(), then smp_ops.cpu_kill() may not operate
correctly.
Or, must smp_ops.cpu_kill() not assume that smp_ops.cpu_die() will be
called on the target CPU? What are the semantics here? Will mentioned
that __cpu_die and cpu_die are a pair, but what about is the smp_ops are
used directly; are they also supposed to be a pair?
The change below solves the pairing issue, by making ipi_cpu_stop()
perform the low-level part of hotplug that matches what smp_kill_cpus()
call to platform_cpu_kill(). This certainly fixes the
hang-in-reboot-or-shutdown problem on Tegra.
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 550d63c..541f667 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -581,11 +581,20 @@ static void ipi_cpu_stop(unsigned int cpu)
>
> set_cpu_online(cpu, false);
>
> +#ifdef CONFIG_HOTPLUG_CPU
> +#if 0
> + arch_cpu_idle_dead();
> +#else
> + /* The body of arch_cpu_idle_dead() - which is better? */
> + cpu_die();
> +#endif
> +#else
> local_fiq_disable();
> local_irq_disable();
>
> while (1)
> cpu_relax();
> +#endif
> }
Some things I'm not sure of here:
* cpu_die() calls idle_task_exit(). That's probably wrong if it's
triggered from an IPI; who knows what task it's executing. That said, if
migrate_to_reboot_cpu() did set_cpus_allowed_ptr(current,
cpumask_of(cpu)), perhaps that guarantees the CPU is running the idle
task since there's nothing else that could be running?
* ipi_cpu_stop() currently calls local_fiq_disable(), but cpu_die()
doesn't. Should both functions call both local_fiq_disable() and
local_irq_disable()?
* Perhaps smp_kill_cpus() should also be changed, to call cpu_die() not
platform_cpu_kill(), to keep the pairing correct at that level too.
Plus, I ignored any issues you raised for the oops case on reboot...
next prev parent reply other threads:[~2013-06-07 22:39 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-07 9:36 [PATCH] ARM: tegra: disable nonboot CPUs when reboot Joseph Lo
2013-06-07 9:36 ` Joseph Lo
[not found] ` <1370597810-1153-1-git-send-email-josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2013-06-07 16:44 ` Stephen Warren
2013-06-07 16:44 ` Stephen Warren
[not found] ` <51B20DF1.3030207-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-06-07 18:18 ` Will Deacon
2013-06-07 18:18 ` Will Deacon
[not found] ` <20130607181846.GL8111-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-07 18:56 ` Stephen Warren
2013-06-07 18:56 ` Stephen Warren
[not found] ` <51B22CDC.4080200-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-06-07 21:28 ` Stephen Warren
2013-06-07 21:28 ` Stephen Warren
[not found] ` <51B25086.6020209-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-06-07 22:15 ` Russell King - ARM Linux
2013-06-07 22:15 ` Russell King - ARM Linux
[not found] ` <20130607221526.GC18614-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2013-06-07 22:39 ` Stephen Warren [this message]
2013-06-07 22:39 ` Stephen Warren
[not found] ` <51B26124.5060505-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-06-07 22:55 ` Russell King - ARM Linux
2013-06-07 22:55 ` Russell King - ARM Linux
[not found] ` <20130607225512.GF18614-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2013-06-10 14:42 ` Will Deacon
2013-06-10 14:42 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51B26124.5060505@wwwdotorg.org \
--to=swarren-3lzwwm7+weoh9zmkesr00q@public.gmane.org \
--cc=josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
--cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
--cc=linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org \
--cc=linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=will.deacon-5wv7dgnIgG8@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.