From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] ARM: Don't use complete() during __cpu_die
Date: Sun, 22 Mar 2015 16:30:57 -0700 [thread overview]
Message-ID: <20150322233057.GA17605@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150226010505.GU15405@linux.vnet.ibm.com>
On Wed, Feb 25, 2015 at 05:05:05PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 25, 2015 at 03:16:59PM -0500, Nicolas Pitre wrote:
> > On Wed, 25 Feb 2015, Nicolas Pitre wrote:
> >
> > > On Wed, 25 Feb 2015, Russell King - ARM Linux wrote:
> > >
> > > > We could just use the spin-and-poll solution instead of an IPI, but
> > > > I really don't like that - when you see the complexity needed to
> > > > re-initialise it each time, it quickly becomes very yucky because
> > > > there is no well defined order between __cpu_die() and __cpu_kill()
> > > > being called by the two respective CPUs.
> > > >
> > > > The last patch I saw doing that had multiple bits to indicate success
> > > > and timeout, and rather a lot of complexity to recover from failures,
> > > > and reinitialise state for a second CPU going down.
> > >
> > > What about a per CPU state? That would at least avoid the need to
> > > serialize things across CPUs. If only one CPU may write its state, that
> > > should eliminate the need for any kind of locking.
> >
> > Something like the following? If according to $subject it is the
> > complete() usage that has problems, then this replacement certainly has
> > it removed while keeping things simple. And I doubt CPU hotplug is
> > performance critical so a simple polling is certainly good enough.
>
> For whatever it is worth, I am proposing the patch below for common code.
> Works on x86. (Famous last words...)
So I am intending to submit these changes to the upcoming merge window.
Do you guys have something in place for ARM?
Thanx, Paul
> ------------------------------------------------------------------------
>
> smpboot: Add common code for notification from dying CPU
>
> RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
> (They -can- use SRCU, but not RCU.) This means that any use of RCU
> during or after the call to arch_cpu_idle_dead(). Unfortunately,
> commit 2ed53c0d6cc99 added a complete() call, which will contain RCU
> read-side critical sections if there is a task waiting to be awakened.
>
> Which, as it turns out, there almost never is. In my qemu/KVM testing,
> the to-be-awakened task is not yet asleep more than 99.5% of the time.
> In current mainline, failure is even harder to reproduce, requiring a
> virtualized environment that delays the outgoing CPU by at least three
> jiffies between the time it exits its stop_machine() task at CPU_DYING
> time and the time it calls arch_cpu_idle_dead() from the idle loop.
> However, this problem really can occur, especially in virtualized
> environments, and therefore really does need to be fixed
>
> This suggests moving back to the polling loop, but using a much shorter
> wait, with gentle exponential backoff instead of the old 100-millisecond
> wait. Most of the time, the loop will exit without waiting at all,
> and almost all of the remaining uses will wait only five microseconds.
> If the outgoing CPU is preempted, a loop will wait one jiffy, then
> increase the wait by a factor of 11/10ths, rounding up. As before, there
> is a five-second timeout.
>
> This commit therefore provides common-code infrastructure to do the
> dying-to-surviving CPU handoff in a safe manner. This code also
> provides an indication at CPU-online of whether the CPU to be onlined
> previously timed out on offline. The new cpu_check_up_prepare() function
> returns -EBUSY if this CPU previously took more than five seconds to
> go offline, or -EAGAIN if it has not yet managed to go offline. The
> rationale for -EAGAIN is that it might still be preempted, so an additional
> wait might well find it correctly offlined. Architecture-specific code
> can decide how to handle these conditions. Systems in which CPUs take
> themselves completely offline might respond to an -EBUSY return as if
> it was a zero (success) return. Systems in which the surviving CPU must
> take some action might take it at this time, or might simply mark the
> other CPU as unusable.
>
> Note that architectures that take the easy way out and simply pass the
> -EBUSY and -EAGAIN upwards will change the sysfs API.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: <linux-api@vger.kernel.org>
> Cc: <linux-arch@vger.kernel.org>
>
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 1d58c7a6ed72..ef87e3c2451a 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -97,6 +97,8 @@ enum {
> * must not fail */
> #define CPU_DYING_IDLE 0x000B /* CPU (unsigned)v dying, reached
> * idle loop. */
> +#define CPU_BROKEN 0x000C /* CPU (unsigned)v did not die properly,
> + * perhaps due to preemption. */
>
> /* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
> * operation in progress
> @@ -275,4 +277,12 @@ void arch_cpu_idle_dead(void);
>
> DECLARE_PER_CPU(bool, cpu_dead_idle);
>
> +int cpu_report_state(int cpu);
> +int cpu_check_up_prepare(int cpu);
> +void cpu_set_state_online(int cpu);
> +#ifdef CONFIG_HOTPLUG_CPU
> +bool cpu_wait_death(unsigned int cpu);
> +bool cpu_report_death(void);
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */
> +
> #endif /* _LINUX_CPU_H_ */
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f032fb5284e3..e940f68008db 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -4,6 +4,7 @@
> #include <linux/cpu.h>
> #include <linux/err.h>
> #include <linux/smp.h>
> +#include <linux/delay.h>
> #include <linux/init.h>
> #include <linux/list.h>
> #include <linux/slab.h>
> @@ -312,3 +313,139 @@ void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
> put_online_cpus();
> }
> EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
> +
> +static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
> +
> +/*
> + * Called to poll specified CPU's state, for example, when waiting for
> + * a CPU to come online.
> + */
> +int cpu_report_state(int cpu)
> +{
> + return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> +}
> +
> +/*
> + * If CPU has died properly, set its state to CPU_UP_PREPARE and
> + * return success. Otherwise, return -EBUSY if the CPU died after
> + * cpu_wait_death() timed out. And yet otherwise again, return -EAGAIN
> + * if cpu_wait_death() timed out and the CPU still hasn't gotten around
> + * to dying. In the latter two cases, the CPU might not be set up
> + * properly, but it is up to the arch-specific code to decide.
> + * Finally, -EIO indicates an unanticipated problem.
> + */
> +int cpu_check_up_prepare(int cpu)
> +{
> + if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> + atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> + return 0;
> + }
> +
> + switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
> +
> + case CPU_POST_DEAD:
> +
> + /* The CPU died properly, so just start it up again. */
> + atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> + return 0;
> +
> + case CPU_DEAD:
> +
> + /*
> + * Timeout during CPU death, so let caller know.
> + * The outgoing CPU completed its processing, but after
> + * cpu_wait_death() timed out and reported the error. The
> + * caller is free to proceed, in which case the state
> + * will be reset properly by cpu_set_state_online().
> + * Proceeding despite this -EBUSY return makes sense
> + * for systems where the outgoing CPUs take themselves
> + * offline, with no post-death manipulation required from
> + * a surviving CPU.
> + */
> + return -EBUSY;
> +
> + case CPU_BROKEN:
> +
> + /*
> + * The most likely reason we got here is that there was
> + * a timeout during CPU death, and the outgoing CPU never
> + * did complete its processing. This could happen on
> + * a virtualized system if the outgoing VCPU gets preempted
> + * for more than five seconds, and the user attempts to
> + * immediately online that same CPU. Trying again later
> + * might return -EBUSY above, hence -EAGAIN.
> + */
> + return -EAGAIN;
> +
> + default:
> +
> + /* Should not happen. Famous last words. */
> + return -EIO;
> + }
> +}
> +
> +/*
> + * Mark the specified CPU online.
> + */
> +void cpu_set_state_online(int cpu)
> +{
> + (void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
> +}
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +
> +/*
> + * Wait for the specified CPU to exit the idle loop and die.
> + */
> +bool cpu_wait_death(unsigned int cpu)
> +{
> + int jf_left = 5 * HZ;
> + int oldstate;
> + bool ret = true;
> + int sleep_jf = 1;
> +
> + might_sleep();
> +
> + /* The outgoing CPU will normally get done quite quickly. */
> + if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
> + goto update_state;
> + udelay(5);
> +
> + /* But if the outgoing CPU dawdles, wait increasingly long times. */
> + while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
> + schedule_timeout_uninterruptible(sleep_jf);
> + jf_left -= sleep_jf;
> + if (jf_left <= 0)
> + break;
> + sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
> + }
> +update_state:
> + oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> + if (oldstate == CPU_DEAD) {
> + /* Outgoing CPU died normally, update state. */
> + smp_mb(); /* atomic_read() before update. */
> + atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
> + } else {
> + /* Outgoing CPU still hasn't died, set state accordingly. */
> + if (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
> + oldstate, CPU_BROKEN) != oldstate)
> + goto update_state;
> + ret = false;
> + }
> + return ret;
> +}
> +
> +/*
> + * Called by the outgoing CPU to report its successful death. Return
> + * false if this report follows the surviving CPU's timing out.
> + */
> +bool cpu_report_death(void)
> +{
> + int oldstate;
> + int cpu = smp_processor_id();
> +
> + oldstate = atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_DEAD);
> + return oldstate == CPU_ONLINE;
> +}
> +
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */
next prev parent reply other threads:[~2015-03-22 23:30 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-05 10:14 [PATCH v2] ARM: Don't use complete() during __cpu_die Krzysztof Kozlowski
2015-02-05 10:50 ` Russell King - ARM Linux
2015-02-05 11:00 ` Krzysztof Kozlowski
2015-02-05 11:08 ` Russell King - ARM Linux
2015-02-05 11:28 ` Mark Rutland
2015-02-05 11:30 ` Russell King - ARM Linux
2015-02-05 14:29 ` Paul E. McKenney
2015-02-05 16:11 ` Russell King - ARM Linux
2015-02-05 17:02 ` Paul E. McKenney
2015-02-05 17:34 ` Russell King - ARM Linux
2015-02-05 17:54 ` Paul E. McKenney
2015-02-10 1:24 ` Stephen Boyd
2015-02-10 1:37 ` Paul E. McKenney
2015-02-10 2:05 ` Stephen Boyd
2015-02-10 3:05 ` Paul E. McKenney
2015-02-10 15:14 ` Mark Rutland
2015-02-10 20:48 ` Stephen Boyd
2015-02-10 21:04 ` Stephen Boyd
2015-02-10 21:15 ` Russell King - ARM Linux
2015-02-10 21:49 ` Stephen Boyd
2015-02-10 22:05 ` Stephen Boyd
2015-02-13 15:52 ` Mark Rutland
2015-02-13 16:27 ` Russell King - ARM Linux
2015-02-13 17:21 ` Mark Rutland
2015-02-13 17:30 ` Russell King - ARM Linux
2015-02-13 16:28 ` Stephen Boyd
2015-02-13 15:38 ` Mark Rutland
2015-02-10 20:58 ` Russell King - ARM Linux
2015-02-10 15:41 ` Russell King - ARM Linux
2015-02-10 18:33 ` Stephen Boyd
2015-02-25 12:56 ` Russell King - ARM Linux
2015-02-25 16:47 ` Nicolas Pitre
2015-02-25 17:00 ` Russell King - ARM Linux
2015-02-25 18:13 ` Nicolas Pitre
2015-02-25 20:16 ` Nicolas Pitre
2015-02-26 1:05 ` Paul E. McKenney
2015-03-22 23:30 ` Paul E. McKenney [this message]
2015-03-23 12:55 ` Russell King - ARM Linux
2015-03-23 13:21 ` Paul E. McKenney
2015-03-23 14:00 ` Russell King - ARM Linux
2015-03-23 15:37 ` Paul E. McKenney
2015-03-23 16:56 ` Paul E. McKenney
2015-02-26 19:14 ` Daniel Thompson
2015-02-26 19:47 ` Nicolas Pitre
2015-02-05 10:53 ` Mark Rutland
2015-02-05 10:59 ` Krzysztof Kozlowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150322233057.GA17605@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).