linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] ARM: Don't use complete() during __cpu_die
Date: Sun, 22 Mar 2015 16:30:57 -0700	[thread overview]
Message-ID: <20150322233057.GA17605@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150226010505.GU15405@linux.vnet.ibm.com>

On Wed, Feb 25, 2015 at 05:05:05PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 25, 2015 at 03:16:59PM -0500, Nicolas Pitre wrote:
> > On Wed, 25 Feb 2015, Nicolas Pitre wrote:
> > 
> > > On Wed, 25 Feb 2015, Russell King - ARM Linux wrote:
> > > 
> > > > We could just use the spin-and-poll solution instead of an IPI, but
> > > > I really don't like that - when you see the complexity needed to
> > > > re-initialise it each time, it quickly becomes very yucky because
> > > > there is no well defined order between __cpu_die() and __cpu_kill()
> > > > being called by the two respective CPUs.
> > > > 
> > > > The last patch I saw doing that had multiple bits to indicate success
> > > > and timeout, and rather a lot of complexity to recover from failures,
> > > > and reinitialise state for a second CPU going down.
> > > 
> > > What about a per CPU state?  That would at least avoid the need to 
> > > serialize things across CPUs.  If only one CPU may write its state, that 
> > > should eliminate the need for any kind of locking.
> > 
> > Something like the following?  If according to $subject it is the 
> > complete() usage that has problems, then this replacement certainly has 
> > it removed while keeping things simple.  And I doubt CPU hotplug is 
> > performance critical so a simple polling is certainly good enough.
> 
> For whatever it is worth, I am proposing the patch below for common code.
> Works on x86.  (Famous last words...)

So I am intending to submit these changes to the upcoming merge window.
Do you guys have something in place for ARM?

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> smpboot: Add common code for notification from dying CPU
> 
> RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
> (They -can- use SRCU, but not RCU.)  This means that any use of RCU
> during or after the call to arch_cpu_idle_dead().  Unfortunately,
> commit 2ed53c0d6cc99 added a complete() call, which will contain RCU
> read-side critical sections if there is a task waiting to be awakened.
> 
> Which, as it turns out, there almost never is.  In my qemu/KVM testing,
> the to-be-awakened task is not yet asleep more than 99.5% of the time.
> In current mainline, failure is even harder to reproduce, requiring a
> virtualized environment that delays the outgoing CPU by at least three
> jiffies between the time it exits its stop_machine() task at CPU_DYING
> time and the time it calls arch_cpu_idle_dead() from the idle loop.
> However, this problem really can occur, especially in virtualized
> environments, and therefore really does need to be fixed
> 
> This suggests moving back to the polling loop, but using a much shorter
> wait, with gentle exponential backoff instead of the old 100-millisecond
> wait.  Most of the time, the loop will exit without waiting at all,
> and almost all of the remaining uses will wait only five microseconds.
> If the outgoing CPU is preempted, a loop will wait one jiffy, then
> increase the wait by a factor of 11/10ths, rounding up.  As before, there
> is a five-second timeout.
> 
> This commit therefore provides common-code infrastructure to do the
> dying-to-surviving CPU handoff in a safe manner.  This code also
> provides an indication at CPU-online of whether the CPU to be onlined
> previously timed out on offline.  The new cpu_check_up_prepare() function
> returns -EBUSY if this CPU previously took more than five seconds to
> go offline, or -EAGAIN if it has not yet managed to go offline.  The
> rationale for -EAGAIN is that it might still be preempted, so an additional
> wait might well find it correctly offlined.  Architecture-specific code
> can decide how to handle these conditions.  Systems in which CPUs take
> themselves completely offline might respond to an -EBUSY return as if
> it was a zero (success) return.  Systems in which the surviving CPU must
> take some action might take it at this time, or might simply mark the
> other CPU as unusable.
> 
> Note that architectures that take the easy way out and simply pass the
> -EBUSY and -EAGAIN upwards will change the sysfs API.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: <linux-api@vger.kernel.org>
> Cc: <linux-arch@vger.kernel.org>
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 1d58c7a6ed72..ef87e3c2451a 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -97,6 +97,8 @@ enum {
>  					* must not fail */
>  #define CPU_DYING_IDLE		0x000B /* CPU (unsigned)v dying, reached
>  					* idle loop. */
> +#define CPU_BROKEN		0x000C /* CPU (unsigned)v did not die properly,
> +					* perhaps due to preemption. */
>  
>  /* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
>   * operation in progress
> @@ -275,4 +277,12 @@ void arch_cpu_idle_dead(void);
>  
>  DECLARE_PER_CPU(bool, cpu_dead_idle);
>  
> +int cpu_report_state(int cpu);
> +int cpu_check_up_prepare(int cpu);
> +void cpu_set_state_online(int cpu);
> +#ifdef CONFIG_HOTPLUG_CPU
> +bool cpu_wait_death(unsigned int cpu);
> +bool cpu_report_death(void);
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */
> +
>  #endif /* _LINUX_CPU_H_ */
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f032fb5284e3..e940f68008db 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -4,6 +4,7 @@
>  #include <linux/cpu.h>
>  #include <linux/err.h>
>  #include <linux/smp.h>
> +#include <linux/delay.h>
>  #include <linux/init.h>
>  #include <linux/list.h>
>  #include <linux/slab.h>
> @@ -312,3 +313,139 @@ void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
> +
> +static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
> +
> +/*
> + * Called to poll specified CPU's state, for example, when waiting for
> + * a CPU to come online.
> + */
> +int cpu_report_state(int cpu)
> +{
> +	return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> +}
> +
> +/*
> + * If CPU has died properly, set its state to CPU_UP_PREPARE and
> + * return success.  Otherwise, return -EBUSY if the CPU died after
> + * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
> + * if cpu_wait_death() timed out and the CPU still hasn't gotten around
> + * to dying.  In the latter two cases, the CPU might not be set up
> + * properly, but it is up to the arch-specific code to decide.
> + * Finally, -EIO indicates an unanticipated problem.
> + */
> +int cpu_check_up_prepare(int cpu)
> +{
> +	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> +		return 0;
> +	}
> +
> +	switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
> +
> +	case CPU_POST_DEAD:
> +
> +		/* The CPU died properly, so just start it up again. */
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> +		return 0;
> +
> +	case CPU_DEAD:
> +
> +		/*
> +		 * Timeout during CPU death, so let caller know.
> +		 * The outgoing CPU completed its processing, but after
> +		 * cpu_wait_death() timed out and reported the error. The
> +		 * caller is free to proceed, in which case the state
> +		 * will be reset properly by cpu_set_state_online().
> +		 * Proceeding despite this -EBUSY return makes sense
> +		 * for systems where the outgoing CPUs take themselves
> +		 * offline, with no post-death manipulation required from
> +		 * a surviving CPU.
> +		 */
> +		return -EBUSY;
> +
> +	case CPU_BROKEN:
> +
> +		/*
> +		 * The most likely reason we got here is that there was
> +		 * a timeout during CPU death, and the outgoing CPU never
> +		 * did complete its processing.  This could happen on
> +		 * a virtualized system if the outgoing VCPU gets preempted
> +		 * for more than five seconds, and the user attempts to
> +		 * immediately online that same CPU.  Trying again later
> +		 * might return -EBUSY above, hence -EAGAIN.
> +		 */
> +		return -EAGAIN;
> +
> +	default:
> +
> +		/* Should not happen.  Famous last words. */
> +		return -EIO;
> +	}
> +}
> +
> +/*
> + * Mark the specified CPU online.
> + */
> +void cpu_set_state_online(int cpu)
> +{
> +	(void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
> +}
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +
> +/*
> + * Wait for the specified CPU to exit the idle loop and die.
> + */
> +bool cpu_wait_death(unsigned int cpu)
> +{
> +	int jf_left = 5 * HZ;
> +	int oldstate;
> +	bool ret = true;
> +	int sleep_jf = 1;
> +
> +	might_sleep();
> +
> +	/* The outgoing CPU will normally get done quite quickly. */
> +	if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
> +		goto update_state;
> +	udelay(5);
> +
> +	/* But if the outgoing CPU dawdles, wait increasingly long times. */
> +	while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
> +		schedule_timeout_uninterruptible(sleep_jf);
> +		jf_left -= sleep_jf;
> +		if (jf_left <= 0)
> +			break;
> +		sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
> +	}
> +update_state:
> +	oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> +	if (oldstate == CPU_DEAD) {
> +		/* Outgoing CPU died normally, update state. */
> +		smp_mb(); /* atomic_read() before update. */
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
> +	} else {
> +		/* Outgoing CPU still hasn't died, set state accordingly. */
> +		if (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
> +				   oldstate, CPU_BROKEN) != oldstate)
> +			goto update_state;
> +		ret = false;
> +	}
> +	return ret;
> +}
> +
> +/*
> + * Called by the outgoing CPU to report its successful death.  Return
> + * false if this report follows the surviving CPU's timing out.
> + */
> +bool cpu_report_death(void)
> +{
> +	int oldstate;
> +	int cpu = smp_processor_id();
> +
> +	oldstate = atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_DEAD);
> +	return oldstate == CPU_ONLINE;
> +}
> +
> +#endif /* #ifdef CONFIG_HOTPLUG_CPU */

  reply	other threads:[~2015-03-22 23:30 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-05 10:14 [PATCH v2] ARM: Don't use complete() during __cpu_die Krzysztof Kozlowski
2015-02-05 10:50 ` Russell King - ARM Linux
2015-02-05 11:00   ` Krzysztof Kozlowski
2015-02-05 11:08     ` Russell King - ARM Linux
2015-02-05 11:28   ` Mark Rutland
2015-02-05 11:30     ` Russell King - ARM Linux
2015-02-05 14:29   ` Paul E. McKenney
2015-02-05 16:11     ` Russell King - ARM Linux
2015-02-05 17:02       ` Paul E. McKenney
2015-02-05 17:34         ` Russell King - ARM Linux
2015-02-05 17:54           ` Paul E. McKenney
2015-02-10  1:24       ` Stephen Boyd
2015-02-10  1:37         ` Paul E. McKenney
2015-02-10  2:05           ` Stephen Boyd
2015-02-10  3:05             ` Paul E. McKenney
2015-02-10 15:14         ` Mark Rutland
2015-02-10 20:48           ` Stephen Boyd
2015-02-10 21:04             ` Stephen Boyd
2015-02-10 21:15               ` Russell King - ARM Linux
2015-02-10 21:49                 ` Stephen Boyd
2015-02-10 22:05                   ` Stephen Boyd
2015-02-13 15:52               ` Mark Rutland
2015-02-13 16:27                 ` Russell King - ARM Linux
2015-02-13 17:21                   ` Mark Rutland
2015-02-13 17:30                     ` Russell King - ARM Linux
2015-02-13 16:28                 ` Stephen Boyd
2015-02-13 15:38             ` Mark Rutland
2015-02-10 20:58           ` Russell King - ARM Linux
2015-02-10 15:41         ` Russell King - ARM Linux
2015-02-10 18:33           ` Stephen Boyd
2015-02-25 12:56       ` Russell King - ARM Linux
2015-02-25 16:47         ` Nicolas Pitre
2015-02-25 17:00           ` Russell King - ARM Linux
2015-02-25 18:13             ` Nicolas Pitre
2015-02-25 20:16               ` Nicolas Pitre
2015-02-26  1:05                 ` Paul E. McKenney
2015-03-22 23:30                   ` Paul E. McKenney [this message]
2015-03-23 12:55                     ` Russell King - ARM Linux
2015-03-23 13:21                       ` Paul E. McKenney
2015-03-23 14:00                         ` Russell King - ARM Linux
2015-03-23 15:37                           ` Paul E. McKenney
2015-03-23 16:56                             ` Paul E. McKenney
2015-02-26 19:14           ` Daniel Thompson
2015-02-26 19:47             ` Nicolas Pitre
2015-02-05 10:53 ` Mark Rutland
2015-02-05 10:59   ` Krzysztof Kozlowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150322233057.GA17605@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).