All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
Date: Fri, 11 Oct 2013 14:38:20 +0200	[thread overview]
Message-ID: <20131011123820.GV3081@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CA+55aFyoP5KFUg44Xenp_Jz0_5ssosn-KF6LDsa0qsmCjyJ56g@mail.gmail.com>

On Thu, Oct 10, 2013 at 11:49:15AM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > But my point is that even though there aren't many of these today; with
> > the growing number of cpus in 'commodity' hardware you want to move away
> > from using preempt_disable() as hotplug lock.
> 
> Umm.
> 
> Wasn't this pretty much the argument for the COMPLETELY F*CKED UP
> change to make the vma locking use a semaphore?

Not entirely; but fair enough, I did screw up there.

> The whole "it's more convenient to use sleeping locks" argument is
> PURE AND UTTER SHIT when it comes to really core code. We are *much*
> better off saying "this is core, we care so deeply about performance
> that you had better use your brain before you do this".
> 
> Seriously. Your argument is bad, but more importantly, it is
> *dangerously* bad. It's crap that results in bad code: and the bad
> code is almost impossible to fix up later, because once you encourage
> people to do the easy thing, they'll do so.

Right, I fell face first into this trap. The existence of
{get,put}_online_cpus() made me stop thinking and use it.

As a penance I'll start by removing all get_online_cpus() usage from the
scheduler.

---
Subject: sched: Remove get_online_cpus() usage

Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:

 - sched_init_smp(); where its completely superfluous since we're in
   'early' boot and there simply cannot be any hotplugging.

 - sched_getaffinity(); we already take a raw spinlock to protect the
   task cpus_allowed mask, this disables preemption and therefore
   also stabilizes cpu_online_mask as that's modified using
   stop_machine. However switch to active mask for symmetry with
   sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
   mask stability by inserting sync_rcu/sched() into _cpu_down.

 - sched_setaffinity(); we don't appear to need get_online_cpus()
   either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
      for the cpuset case we hold task_lock, which is a spinlock and
      thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
      preemption properly disabled; also it already deals with hotplug
      races explicitly where it releases them.

 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
   us with a little trickery. By adding a sync_sched/rcu() after the
   CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
   cpu_active_mask. Use these to validate that both our cpus are active
   when queueing the stop work before we queue the stop_machine works
   for take_cpu_down().

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/cpu.c          | 17 +++++++++++++++++
 kernel/sched/core.c   | 20 ++++++++++----------
 kernel/stop_machine.c | 26 +++++++++++++++++++++-----
 3 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d7f07a2da5a6..63aa50d7ce1e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -308,6 +308,23 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
+	/*
+	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
+	 * and RCU users of this state to go away such that all new such users
+	 * will observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so explicitly call both.
+	 */
+#ifdef CONFIG_PREEMPT
+	synchronize_sched();
+#endif
+	synchronize_rcu();
+
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
+
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c3feebcf112..498a5e5a53f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1081,8 +1081,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	struct migration_swap_arg arg;
 	int ret = -EINVAL;
 
-	get_online_cpus();
-
 	arg = (struct migration_swap_arg){
 		.src_task = cur,
 		.src_cpu = task_cpu(cur),
@@ -1093,6 +1091,10 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	if (arg.src_cpu == arg.dst_cpu)
 		goto out;
 
+	/*
+	 * These three tests are all lockless; this is OK since all of them
+	 * will be re-checked with proper locks held further down the line.
+	 */
 	if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
 		goto out;
 
@@ -1105,7 +1107,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, &arg);
 
 out:
-	put_online_cpus();
 	return ret;
 }
 
@@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	struct task_struct *p;
 	int retval;
 
-	get_online_cpus();
 	rcu_read_lock();
 
 	p = find_process_by_pid(pid);
@@ -3769,7 +3769,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	free_cpumask_var(cpus_allowed);
 out_put_task:
 	put_task_struct(p);
-	put_online_cpus();
 	return retval;
 }
 
@@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 	unsigned long flags;
 	int retval;
 
-	get_online_cpus();
 	rcu_read_lock();
 
 	retval = -ESRCH;
@@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 		goto out_unlock;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
+	cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 out_unlock:
 	rcu_read_unlock();
-	put_online_cpus();
 
 	return retval;
 }
@@ -6490,14 +6487,17 @@ void __init sched_init_smp(void)
 
 	sched_init_numa();
 
-	get_online_cpus();
+	/*
+	 * There's no userspace yet to cause hotplug operations; hence all the
+	 * cpu masks are stable and all blatant races in the below code cannot
+	 * happen.
+	 */
 	mutex_lock(&sched_domains_mutex);
 	init_sched_domains(cpu_active_mask);
 	cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
 	if (cpumask_empty(non_isolated_cpus))
 		cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
 	mutex_unlock(&sched_domains_mutex);
-	put_online_cpus();
 
 	hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
 	hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 32a6c44d8f78..a6eb6d519284 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -234,11 +234,13 @@ static void irq_cpu_stop_queue_work(void *arg)
  */
 int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 {
-	int call_cpu;
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct irq_cpu_stop_queue_work_info call_args;
-	struct multi_stop_data msdata = {
+	struct multi_stop_data msdata;
+
+	preempt_disable();
+	msdata = (struct multi_stop_date){
 		.fn = fn,
 		.data = arg,
 		.num_threads = 2,
@@ -262,16 +264,30 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	set_state(&msdata, MULTI_STOP_PREPARE);
 
 	/*
+	 * If we observe both CPUs active we know _cpu_down() cannot yet have
+	 * queued its stop_machine works and therefore ours will get executed
+	 * first. Or its not either one of our CPUs that's getting unplugged,
+	 * in which case we don't care.
+	 *
+	 * This relies on the stopper workqueues to be FIFO.
+	 */
+	if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
+		preempt_enable();
+		return -ENOENT;
+	}
+
+	/*
 	 * Queuing needs to be done by the lowest numbered CPU, to ensure
 	 * that works are always queued in the same order on every CPU.
 	 * This prevents deadlocks.
 	 */
-	call_cpu = min(cpu1, cpu2);
-
-	smp_call_function_single(call_cpu, &irq_cpu_stop_queue_work,
+	smp_call_function_single(min(cpu1, cpu2),
+				 &irq_cpu_stop_queue_work,
 				 &call_args, 0);
+	preempt_enable();
 
 	wait_for_completion(&done.completion);
+
 	return done.executed ? done.ret : -ENOENT;
 }
 

  parent reply	other threads:[~2013-10-11 12:38 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-08 10:25 [PATCH 0/6] Optimize the cpu hotplug locking -v2 Peter Zijlstra
2013-10-08 10:25 ` [PATCH 1/6] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
2013-10-08 15:08   ` Rik van Riel
2013-10-10  5:47   ` Andrew Morton
2013-10-10 11:06     ` Oleg Nesterov
2013-10-10 14:55       ` Paul E. McKenney
2013-10-08 10:25 ` [PATCH 2/6] rcu: Create rcu_sync infrastructure Peter Zijlstra
2013-10-08 20:40   ` Jonathan Corbet
2013-10-09 19:52     ` Peter Zijlstra
2013-10-17  2:56   ` Lai Jiangshan
2013-10-17 10:36   ` Srikar Dronamraju
2013-10-08 10:25 ` [PATCH 3/6] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync Peter Zijlstra
2013-10-08 16:28   ` Paul E. McKenney
2013-10-08 10:25 ` [PATCH 4/6] rcusync: Introduce struct rcu_sync_ops Peter Zijlstra
2013-10-08 16:30   ` Paul E. McKenney
2013-10-17  2:07   ` Lai Jiangshan
     [not found]     ` <20131017154228.GL4553@linux.vnet.ibm.com>
2013-10-18  1:23       ` Lai Jiangshan
2013-10-18 12:10         ` Oleg Nesterov
2013-10-20 16:58           ` Paul E. McKenney
2013-10-08 10:25 ` [PATCH 5/6] rcusync: Add the CONFIG_PROVE_RCU checks Peter Zijlstra
2013-10-08 16:30   ` Paul E. McKenney
2013-10-08 10:25 ` [PATCH 6/6] rcusync: Introduce rcu_sync_dtor() Peter Zijlstra
2013-10-08 16:32   ` Paul E. McKenney
2013-10-08 15:27 ` [PATCH 0/6] Optimize the cpu hotplug locking -v2 Oleg Nesterov
2013-10-08 15:38   ` Peter Zijlstra
2013-10-10  5:50 ` Andrew Morton
2013-10-10  6:27   ` Ingo Molnar
2013-10-10  6:34     ` Andrew Morton
2013-10-10  7:27       ` Ingo Molnar
2013-10-10  7:33         ` Andrew Morton
2013-10-10  7:45           ` Ingo Molnar
2013-10-10 12:19   ` Peter Zijlstra
2013-10-10 14:57     ` Ingo Molnar
2013-10-10 15:21       ` Peter Zijlstra
2013-10-10 15:36         ` Oleg Nesterov
2013-10-10 16:50         ` Ingo Molnar
2013-10-10 17:13           ` Paul E. McKenney
2013-10-10 17:35             ` Ingo Molnar
2013-10-10 18:35             ` Peter Zijlstra
2013-10-10 15:26       ` Oleg Nesterov
2013-10-10 16:00         ` Andrew Morton
2013-10-10 16:36           ` Steven Rostedt
2013-10-10 16:43             ` Andrew Morton
2013-10-10 16:53               ` Peter Zijlstra
2013-10-10 17:13                 ` Steven Rostedt
2013-10-10 17:48                   ` Andrew Morton
2013-10-10 18:10                     ` Linus Torvalds
2013-10-10 18:43                       ` Steven Rostedt
2013-10-10 18:50                         ` Peter Zijlstra
2013-10-10 19:15                           ` Paul E. McKenney
2013-10-10 19:00                         ` Linus Torvalds
2013-10-10 18:46                       ` Peter Zijlstra
2013-10-10 18:34                     ` Peter Zijlstra
2013-10-10 18:49                       ` Linus Torvalds
2013-10-10 19:04                         ` Steven Rostedt
2013-10-10 19:16                           ` Linus Torvalds
2013-10-10 19:34                             ` Peter Zijlstra
2013-10-10 19:34                             ` Steven Rostedt
2013-10-11  6:09                             ` Ingo Molnar
2013-10-11 12:38                         ` Peter Zijlstra [this message]
2013-10-11 18:25                           ` Oleg Nesterov
2013-10-11 20:48                             ` Peter Zijlstra
2013-10-12 17:06                               ` Oleg Nesterov
2013-10-14  9:05                                 ` Peter Zijlstra
2013-10-14  9:23                                   ` Paul E. McKenney
2013-10-15  1:01                                     ` Paul E. McKenney
2013-10-17 16:49                           ` [tip:sched/core] sched: Remove get_online_cpus() usage tip-bot for Peter Zijlstra
2013-10-10 17:39                 ` [PATCH 0/6] Optimize the cpu hotplug locking -v2 Oleg Nesterov
2013-10-10 16:52           ` Ingo Molnar
2013-10-10 17:44             ` Paul E. McKenney
2013-10-10 16:54           ` Oleg Nesterov
2013-10-10 19:04             ` Srivatsa S. Bhat
2013-10-10 18:52         ` Srivatsa S. Bhat
  -- strict thread matches above, loose matches on Subject: below --
2013-10-11 12:14 George Spelvin
2013-10-11 12:45 ` Steven Rostedt
2013-10-11 13:51   ` George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131011123820.GV3081@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.