[PATCH 0/6] sched/cpusets fixes, more changes are needed


* [PATCH 0/6] sched/cpusets fixes, more changes are needed
@ 2010-03-15  9:09 Oleg Nesterov
  2010-03-24 17:38 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-15  9:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie,
	Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

Ingo, Peter.

Unless I missed something, with or without these patches the TASK_WAKING
logic in do_fork() is very broken.

	- do_fork() clears PF_STARTING and then calls wake_up_new_task()
	  which finally does s/WAKING/RUNNING.

	  But. Nobody can take rq->lock in between. This means a signal
	  from irq (quite possible with CLONE_THREAD) or another rt
	  thread which preempts us can lock up.

	- the comment in wake_up_new_task says:

		We still have TASK_WAKING but PF_STARTING is gone now, meaning
		->cpus_allowed is stable

	  this is not true. Yes, nobody can take rq->lock _after_ we cleared
	  PF_STARTING, but it is possible that another thread took this lock
	  before and still holds it doing, say, sched_setaffinity().

No?

If yes. I can make a patch, but the question is: what is the point of using
TASK_WAKING in the fork paths? Can't sched_fork() set TASK_RUNNING instead?
Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't
have these PF_STARTING complications.

As for this series. Please review. I don't understand how it is possible
to really test these changes.

Dear cpuset developers! Please review ;) If you don't like 6/6, please make
a better fix. I tried to make the patch as "simple" as possible because I hardly
understand cpuset.c; I last read it, quickly, a long time ago.

Oleg.

* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-15  9:09 [PATCH 0/6] sched/cpusets fixes, more changes are needed Oleg Nesterov
@ 2010-03-24 17:38 ` Peter Zijlstra
  2010-03-24 18:09   ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-24 17:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> Ingo, Peter.
> 
> Unless I missed something, with or without these patches the TASK_WAKING
> logic in do_fork() is very broken.
> 
> 	- do_fork() clears PF_STARTING and then calls wake_up_new_task()
> 	  which finally does s/WAKING/RUNNING.
> 
> 	  But. Nobody can take rq->lock in between. This means a signal
> 	  from irq (quite possible with CLONE_THREAD) or another rt
> 	  thread which preempts us can lockup.

Hmm, the signal case might indeed be a problem, however I cannot see how
the RT thread can be a problem because until we do wake_up_new_task()
the child will not be runnable and can thus not be preempted.

We could frob it by taking rq->lock over clearing PF_STARTING but that's
beyond ugly...

> 	- the comment in wake_up_new_task says:
> 	
> 		We still have TASK_WAKING but PF_STARTING is gone now, meaning
> 		->cpus_allowed is stable
> 
> 	  this is not true. Yes, nobody can take rq->lock _after_ we cleared
> 	  PF_STARTING, but it is possible that another thread took this lock
> 	  before and still holds it doing, say, sched_setaffinity().
> 
> No?
> 
> If yes. I can make a patch, but the question is: what is the point of using
> TASK_WAKING in the fork paths? Can't sched_fork() set TASK_RUNNING instead?
> Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't
> have these PF_STARTING complications.

Argh, yes.. that's because PF_STARTING is cleared after we expose the
PID, and we needed the PF_STARTING exemption because of that
ns_cgroup_clone() trainwreck.

The reason we have that TASK_WAKING stuff for fork is because
wake_up_new_task() needs p->cpus_allowed to be stable, and we cannot do
select_task_rq() with rq->lock held because of the cgroup-sched crap.

/me goes read the code after applying your patches and frobs the
following patch on top..


So the below patch makes select_task_rq_fair unlock the rq when needed,
and then puts all ->select_task_rq() calls under rq->lock. This should
allow us to remove the TASK_WAKING thing from fork which in turn allows
us to remove the PF_STARTING check in task_is_waking.

How does that look?

(totally untested, will try and boot after dinner)
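
The lock-drop pattern described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: every
model_* name is hypothetical, the "lock" is a plain flag rather than a real
spinlock, and nothing here is kernel code. It shows a selection callback
that is entered with the runqueue lock held, releases it around a slow
helper, and holds it again on return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rq; "locked" models rq->lock. */
struct model_rq {
	bool locked;
	int lock_drops;		/* how often the callback released the lock */
};

/* Stand-in for update_shares(): must not run with the rq lock held. */
static void model_update_shares(const struct model_rq *rq)
{
	assert(!rq->locked);
}

/*
 * Stand-in for a ->select_task_rq() callback that is entered with the
 * rq lock held and returns with it held, but may drop it in between.
 */
static int model_select_task_rq(struct model_rq *rq, int candidate_cpu,
				bool need_update)
{
	assert(rq->locked);

	if (need_update) {
		rq->locked = false;	/* raw_spin_unlock(&rq->lock) */
		rq->lock_drops++;
		model_update_shares(rq);
		rq->locked = true;	/* raw_spin_lock(&rq->lock) */
	}

	return candidate_cpu;
}
```

Because the lock may be released inside the callback, anything the caller
read under the lock before the call can be stale afterwards. That is the
reason the patch below sets TASK_WAKING before calling select_task_rq().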


---
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1051,7 +1051,8 @@ struct sched_class {
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
-	int  (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
+	int  (*select_task_rq)(struct rq *rq, struct task_struct *p,
+			       int sd_flag, int flags);
 
 	void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
 	void (*post_schedule) (struct rq *this_rq);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -916,14 +916,10 @@ static inline void finish_lock_switch(st
 /*
  * Check whether the task is waking, we use this to synchronize against
  * ttwu() so that task_cpu() reports a stable number.
- *
- * We need to make an exception for PF_STARTING tasks because the fork
- * path might require task_rq_lock() to work, eg. it can call
- * set_cpus_allowed_ptr() from the cpuset clone_ns code.
  */
 static inline int task_is_waking(struct task_struct *p)
 {
-	return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING));
+	return unlikely(p->state == TASK_WAKING);
 }
 
 /*
@@ -2319,9 +2315,9 @@ static int select_fallback_rq(int cpu, s
  * The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable.
  */
 static inline
-int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
+int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
 {
-	int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
+	int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags);
 
 	/*
 	 * In order not to call set_task_cpu() on a blocking task we need
@@ -2392,9 +2388,7 @@ static int try_to_wake_up(struct task_st
 	if (p->sched_class->task_waking)
 		p->sched_class->task_waking(rq, p);
 
-	__task_rq_unlock(rq);
-
-	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
+	cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
 	if (cpu != orig_cpu) {
 		/*
 		 * Since we migrate the task without holding any rq->lock,
@@ -2403,6 +2397,7 @@ static int try_to_wake_up(struct task_st
 		 */
 		set_task_cpu(p, cpu);
 	}
+	__task_rq_unlock(rq);
 
 	rq = cpu_rq(cpu);
 	raw_spin_lock(&rq->lock);
@@ -2533,7 +2528,7 @@ void sched_fork(struct task_struct *p, i
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->state = TASK_WAKING;
+	p->state = TASK_RUNNING;
 
 	/*
 	 * Revert to default priority/policy on fork if requested.
@@ -2600,28 +2595,25 @@ void wake_up_new_task(struct task_struct
 	int cpu __maybe_unused = get_cpu();
 
 #ifdef CONFIG_SMP
+	rq = task_rq_lock(p, &flags);
+	p->state = TASK_WAKING;
+
 	/*
 	 * Fork balancing, do it here and not earlier because:
 	 *  - cpus_allowed can change in the fork path
 	 *  - any previously selected cpu might disappear through hotplug
 	 *
-	 * We still have TASK_WAKING but PF_STARTING is gone now, meaning
-	 * ->cpus_allowed is stable, we have preemption disabled, meaning
-	 * cpu_online_mask is stable.
+	 * We set TASK_WAKING so that select_task_rq() can drop rq->lock
+	 * without people poking at ->cpus_allowed.
 	 */
-	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
+	cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
 	set_task_cpu(p, cpu);
-#endif
 
-	/*
-	 * Since the task is not on the rq and we still have TASK_WAKING set
-	 * nobody else will migrate this task.
-	 */
-	rq = cpu_rq(cpu);
-	raw_spin_lock_irqsave(&rq->lock, flags);
-
-	BUG_ON(p->state != TASK_WAKING);
 	p->state = TASK_RUNNING;
+	task_rq_unlock(rq, &flags);
+#endif
+
+	rq = task_rq_lock(p, &flags);
 	activate_task(rq, p, 0);
 	trace_sched_wakeup_new(rq, p, 1);
 	check_preempt_curr(rq, p, WF_FORK);
@@ -3067,19 +3059,15 @@ void sched_exec(void)
 {
 	struct task_struct *p = current;
 	struct migration_req req;
-	int dest_cpu, this_cpu;
 	unsigned long flags;
 	struct rq *rq;
-
-	this_cpu = get_cpu();
-	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
-	if (dest_cpu == this_cpu) {
-		put_cpu();
-		return;
-	}
+	int dest_cpu;
 
 	rq = task_rq_lock(p, &flags);
-	put_cpu();
+	dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0);
+	if (dest_cpu == smp_processor_id())
+		goto unlock;
+
 	/*
 	 * select_task_rq() can race against ->cpus_allowed
 	 */
@@ -3097,6 +3085,7 @@ void sched_exec(void)
 
 		return;
 	}
+unlock:
 	task_rq_unlock(rq, &flags);
 }
 
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1414,7 +1414,8 @@ select_idle_sibling(struct task_struct *
  *
  * preempt must be disabled.
  */
-static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
+static int
+select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags)
 {
 	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
 	int cpu = smp_processor_id();
@@ -1512,8 +1513,11 @@ static int select_task_rq_fair(struct ta
 				  cpumask_weight(sched_domain_span(sd))))
 			tmp = affine_sd;
 
-		if (tmp)
+		if (tmp) {
+			raw_spin_unlock(&rq->lock);
 			update_shares(tmp);
+			raw_spin_lock(&rq->lock);
+		}
 	}
 #endif
 
Index: linux-2.6/kernel/sched_idletask.c
===================================================================
--- linux-2.6.orig/kernel/sched_idletask.c
+++ linux-2.6/kernel/sched_idletask.c
@@ -6,7 +6,8 @@
  */
 
 #ifdef CONFIG_SMP
-static int select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
+static int
+select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -948,10 +948,9 @@ static void yield_task_rt(struct rq *rq)
 #ifdef CONFIG_SMP
 static int find_lowest_rq(struct task_struct *task);
 
-static int select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
+static int
+select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
 {
-	struct rq *rq = task_rq(p);
-
 	if (sd_flag != SD_BALANCE_WAKE)
 		return smp_processor_id();
 


* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-24 17:38 ` Peter Zijlstra
@ 2010-03-24 18:09   ` Oleg Nesterov
  2010-03-25 10:22     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-24 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> >
> > 	- do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > 	  which finally does s/WAKING/RUNNING.
> >
> > 	  But. Nobody can take rq->lock in between. This means a signal
> > 	  from irq (quite possible with CLONE_THREAD) or another rt
> > 	  thread which preempts us can lockup.
>
> Hmm, the signal case might indeed be a problem, however I cannot see how
> the RT thread can be a problem because until we do wake_up_new_task()
> the child will not be runnable and can thus not be preempted.

Indeed, but I meant the _parent_ can be preempted ;)

In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
be called from any context, it should be irq-safe: any owner must disable
interrupts and preemption.

> The reason we have that TASK_WAKING stuff for fork is because
> wake_up_new_task() needs p->cpus_allowed to be stable

Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
a moment, but afaics that is all?

> So the below patch makes select_task_rq_fair unlock the rq when needed,
> and then puts all ->select_task_rq() calls under rq->lock. This should
> allow us to remove the TASK_WAKING thing from fork which in turn allows
> us to remove the PF_STARTING check in task_is_waking.
>
> How does that look?

I'll try to read this patch tomorrow. But could you please consider
the suggestion above?

Oleg.


* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-24 18:09   ` Oleg Nesterov
@ 2010-03-25 10:22     ` Peter Zijlstra
  2010-03-25 15:46       ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-25 10:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > >
> > > 	- do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > 	  which finally does s/WAKING/RUNNING.
> > >
> > > 	  But. Nobody can take rq->lock in between. This means a signal
> > > 	  from irq (quite possible with CLONE_THREAD) or another rt
> > > 	  thread which preempts us can lockup.
> >
> > Hmm, the signal case might indeed be a problem, however I cannot see how
> > the RT thread can be a problem because until we do wake_up_new_task()
> > the child will not be runnable and can thus not be preempted.
> 
> Indeed, but I meant the _parent_ can be preempted ;)

I still can't see how that would be a problem..

> In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
> be called from any context, it should be irq-safe: any owner must disable
> interrupts and preemption.

Agreed, and I think that's corrected with my patch.

> > The reason we have that TASK_WAKING stuff for fork is because
> > wake_up_new_task() needs p->cpus_allowed to be stable
> 
> Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> a moment, but afaics that is all?

My patch does that.

> > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > and then puts all ->select_task_rq() calls under rq->lock. This should
> > allow us to remove the TASK_WAKING thing from fork which in turn allows
> > us to remove the PF_STARTING check in task_is_waking.
> >
> > How does that look?
> 
> I'll try to read this patch tomorrow. But could you please consider
> the suggestion above?

I think I do all those :-)

I was still looking at removing the TASK_WAKING check from
task_rq_lock() since now we do set_task_cpu() under rq->lock again so it
should be good again.

Hmm, except for sched_fork() that still does set_task_cpu() without
holding rq->lock, but that is before the child gets exposed so there
should not be any concurrency.


* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-25 10:22     ` Peter Zijlstra
@ 2010-03-25 15:46       ` Oleg Nesterov
  2010-03-25 16:02         ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On 03/25, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> > On 03/24, Peter Zijlstra wrote:
> > >
> > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > > >
> > > > 	- do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > > 	  which finally does s/WAKING/RUNNING.
> > > >
> > > > 	  But. Nobody can take rq->lock in between. This means a signal
> > > > 	  from irq (quite possible with CLONE_THREAD) or another rt
> > > > 	  thread which preempts us can lockup.
> > >
> > > Hmm, the signal case might indeed be a problem, however I cannot see how
> > > the RT thread can be a problem because until we do wake_up_new_task()
> > > the child will not be runnable and can thus not be preempted.
> >
> > Indeed, but I meant the _parent_ can be preempted ;)
>
> I still can't see how that would be a problem..

The parent P does do_fork(); copy_process() returns the new child C with
TASK_WAKING and PF_STARTING set.

do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is
already visible to user-space.

An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C).

ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still
set.
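
The scenario above can be reproduced in a small userspace model. This is a
sketch only: the model_* names are hypothetical, the spin is bounded so the
example terminates, and the real task_rq_lock() also takes rq->lock, which
is left out here.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum model_state { MODEL_TASK_RUNNING, MODEL_TASK_WAKING };

struct model_task {
	enum model_state state;
	bool pf_starting;	/* stands in for p->flags & PF_STARTING */
};

/* Mirrors the pre-patch check: WAKING only counts once PF_STARTING is clear. */
static bool model_task_is_waking(const struct model_task *p)
{
	return p->state == MODEL_TASK_WAKING && !p->pf_starting;
}

/*
 * Bounded stand-in for the wait loop in task_rq_lock().
 * Returns 0 if the caller could proceed to take the lock, or -1 if it
 * was still spinning after max_spins iterations.
 */
static int model_task_rq_lock(const struct model_task *p, int max_spins)
{
	int spins;

	for (spins = 0; model_task_is_waking(p); spins++) {
		if (spins >= max_spins)
			return -1;
	}
	return 0;
}
```

With state == MODEL_TASK_WAKING and pf_starting == false, which is C's
state after do_fork() clears PF_STARTING, model_task_rq_lock() never gets
past the loop. X spins, and P cannot run to finish wake_up_new_task()
because X preempted it.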

> > > The reason we have that TASK_WAKING stuff for fork is because
> > > wake_up_new_task() needs p->cpus_allowed to be stable
> >
> > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> > a moment, but afaics that is all?
>
> My patch does that.

OK, that is what I meant. Now, why can't sched_fork() just set TASK_RUNNING?
This way the "spurious" wakeup can do nothing with the new child, and we
do not have problems with cpuset which needs rq->lock for set_cpus_allowed().

> > > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > > and then puts all ->select_task_rq() calls under rq->lock.

Yes, I thought about this too. I tried to preserve the current
"->select_task_rq() is called without rq->lock" logic.

> This should
> > > allow us to remove the TASK_WAKING thing from fork

Confused. Why can't we simply remove TASK_WAKING from fork without any
changes except in wake_up_new_task() ?

> which in turn allows
> > > us to remove the PF_STARTING check in task_is_waking.

Even if I do not think I understand sched.c above, I'd say we must do
this in any case ;)

> I was still looking at removing the TASK_WAKING check from
> task_rq_lock()

Peter, I can't apply your patch due to rejects (will try again later)
but at first glance, it makes TASK_WAKING unneeded? Since we do not
drop the lock after we set TASK_WAKING, why do we need this state at
all?

Aha... select_task_rq_fair() can drop the lock, yes? Well, in this
case probably select_task_rq_fair() can set TASK_WAKING before unlock?

I like the current idea to call select_task_rq() without rq->lock, but
of course this is up to you.

However, once again, can't we make a simpler patch?

	- remove PF_STARTING from task_is_waking()

	- change sched_fork() to set RUNNING instead of WAKING

	- change wake_up_new_task() to set WAKING under rq->lock

This looks simpler to me, and allows us to drop rq->lock in ttwu() right
after it sets WAKING.
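
The three steps can be sketched as a single-threaded userspace model. All
model_* names are hypothetical, MODEL_SLEEPING is added only so the wakeup
path has something to act on, and the lock is a plain flag, so this shows
the intended ordering only, not real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_SLEEPING, MODEL_RUNNING, MODEL_WAKING };

struct model_rq { bool locked; };		/* models rq->lock */
struct model_task { enum model_state state; int cpu; };

static void model_lock(struct model_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void model_unlock(struct model_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* The child starts out RUNNING, so a wakeup finds nothing to do. */
static void model_sched_fork(struct model_task *p)
{
	p->state = MODEL_RUNNING;
}

/* Stand-in for ttwu(): acts only on a sleeping task. */
static bool model_try_to_wake_up(struct model_task *p)
{
	if (p->state != MODEL_SLEEPING)
		return false;
	p->state = MODEL_RUNNING;
	return true;
}

/* WAKING is set under the lock, and only around CPU selection. */
static void model_wake_up_new_task(struct model_rq *rq,
				   struct model_task *p, int new_cpu)
{
	model_lock(rq);
	p->state = MODEL_WAKING;
	model_unlock(rq);

	/* Stands in for select_task_rq() + set_task_cpu(), run unlocked. */
	p->cpu = new_cpu;

	model_lock(rq);
	p->state = MODEL_RUNNING;
	model_unlock(rq);
}
```

In this model WAKING is only ever visible while wake_up_new_task() runs,
so the window between do_fork() clearing PF_STARTING and the wakeup no
longer leaves the child in WAKING.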

Another change which seems reasonable is to allow ttwu() to take rq->lock
even if WAKING is set, it can do nothing but check task->state in this case.

What do you think?

Oleg.

* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-25 15:46       ` Oleg Nesterov
@ 2010-03-25 16:02         ` Oleg Nesterov
  2010-03-25 16:10           ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On 03/25, Oleg Nesterov wrote:
>
> I like the current idea to call select_task_rq() without rq->lock, but
> of course this is up to you.
>
> However, once again, can't we make a simpler patch?
>
> 	- remove PF_STARTING from task_waking()
> 	
> 	- change sched_fork() to set RUNNING instead of WAKING
>
> 	- change wake_up_new_task() to set WAKING under rq->lock
>
> This looks simpler to me, and allows us to drop rq->lock in ttwu() right
> after it sets WAKING.

IOW, something like the (unchecked, uncompiled) patch below.

What do you think?

Oleg.

--- x/kernel/sched.c
+++ x/kernel/sched.c
@@ -912,7 +912,7 @@ static inline void finish_lock_switch(st
  */
 static inline int task_is_waking(struct task_struct *p)
 {
-	return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING));
+	return unlikely(p->state == TASK_WAKING);
 }
 
 /*
@@ -2568,11 +2568,10 @@ void sched_fork(struct task_struct *p, i
 
 	__sched_fork(p);
 	/*
-	 * We mark the process as waking here. This guarantees that
-	 * nobody will actually run it, and a signal or other external
-	 * event cannot wake it up and insert it on the runqueue either.
+	 * We mark the process as running here. This guarantees that
+	 * nobody will actually wake it up until wake_up_new_task().
 	 */
-	p->state = TASK_WAKING;
+	p->state = TASK_RUNNING;
 
 	/*
 	 * Revert to default priority/policy on fork if requested.
@@ -2638,15 +2637,19 @@ void wake_up_new_task(struct task_struct
 	struct rq *rq;
 	int cpu = get_cpu();
 
+	rq = task_rq(p);
+	p->state = TASK_WAKING;
+	smp_mb();
+	raw_spin_unlock_wait(&rq->lock);
+
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
 	 *  - cpus_allowed can change in the fork path
 	 *  - any previously selected cpu might disappear through hotplug
 	 *
-	 * We still have TASK_WAKING but PF_STARTING is gone now, meaning
-	 * ->cpus_allowed is stable, we have preemption disabled, meaning
-	 * cpu_online_mask is stable.
+	 * TASK_WAKING means ->cpus_allowed is stable, we have preemption
+	 * disabled, meaning cpu_online_mask is stable.
 	 */
 	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
 	set_task_cpu(p, cpu);


* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-25 16:02         ` Oleg Nesterov
@ 2010-03-25 16:10           ` Oleg Nesterov
  2010-03-25 17:29             ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

Argh, sorry for the noise...

On 03/25, Oleg Nesterov wrote:
>
> On 03/25, Oleg Nesterov wrote:
> >
> > I like the current idea to call select_task_rq() without rq->lock, but
> > of course this is up to you.
> >
> > However, once again, can't we make a simpler patch?
> >
> > 	- remove PF_STARTING from task_waking()
> > 	
> > 	- change sched_fork() to set RUNNING instead of WAKING

When I reread this thread, I finally noticed that you mentioned
_twice_ that your patch does this too ;) Not to mention the patch itself,
which I misread. Sorry.

> IOW, something like the (unchecked, uncompiled) patch below.

Still, what do you think?

Oleg.


* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-25 16:10           ` Oleg Nesterov
@ 2010-03-25 17:29             ` Peter Zijlstra
  2010-03-25 19:15               ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-25 17:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On Thu, 2010-03-25 at 17:10 +0100, Oleg Nesterov wrote:
> Argh, sorry for noise...
> 
> On 03/25, Oleg Nesterov wrote:
> >
> > On 03/25, Oleg Nesterov wrote:
> > >
> > > I like the current idea to call select_task_rq() without rq->lock, but
> > > of course this is up to you.
> > >
> > > However, once again, can't we make a simpler patch?
> > >
> > > 	- remove PF_STARTING from task_waking()
> > > 	
> > > 	- change sched_fork() to set RUNNING instead of WAKING
> 
> When I reread this thread, I finally noticed that you mentioned
> _twice_ that your patch does this too ;) Not to mention the patch itself,
> which I misread. Sorry.
> 
> > IOW, something like the (unchecked, uncompiled) patch below.
> 
> Still, what do you think?

Yeah, such a smaller patch might work too, but I was trying to remove
some more of the complexity we've grown.

Being able to fully remove that TASK_WAKING check from task_rq_lock()
and only have it in set_cpus_allowed_ptr() would reduce some fast-path
logic.

Your patch adds a memory barrier and an unlock_wait(), which, while
seemingly correct, are harder to parse than the modified locking.

(Ideally we'd protect ->cpus_allowed using a per-task lock, but adding
more atomic ops to ttwu() is to be avoided)

(Now if I could manage to remove that lock-drop for the cgroup muck we'd
be able to remove TASK_WAKING... but that looks like a long term goal)



* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
  2010-03-25 17:29             ` Peter Zijlstra
@ 2010-03-25 19:15               ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 19:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
	Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel

On 03/25, Peter Zijlstra wrote:
>
> Yeah, such a smaller patch might work too, but I was trying to remove
> some more of the complexity we've grown.
>
> Being able to fully remove that TASK_WAKING check from task_rq_lock()
> and only have it in set_cpus_allowed_ptr() would reduce some fast-path
> logic.

OK. Agreed.

> Your patch adds a memory barrier and an unlock_wait(), which, while
> seemingly correct, are harder to parse than the modified locking.

Yes, lock + set WAKING + unlock is simpler and cleaner, but this
doesn't matter.

I think your patch should address all problems.

Oleg.

