* [PATCH 0/6] sched/cpusets fixes, more changes are needed
@ 2010-03-15 9:09 Oleg Nesterov
2010-03-24 17:38 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-15 9:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie,
Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
Ingo, Peter.
Unless I missed something, with or without these patches the TASK_WAKING
logic in do_fork() is very broken.
- do_fork() clears PF_STARTING and then calls wake_up_new_task()
which finally does s/WAKING/RUNNING.
But. Nobody can take rq->lock in between. This means a signal
from irq (quite possible with CLONE_THREAD) or another rt
thread which preempts us can lockup.
- the comment in wake_up_new_task says:
We still have TASK_WAKING but PF_STARTING is gone now, meaning
->cpus_allowed is stable
this is not true. Yes, nobody can take rq->lock _after_ we cleared
PF_STARTING, but it is possible that another thread took this lock
before and still holds it doing, say, sched_setaffinity().
No?
If yes. I can make a patch, but the question is: what is the point to use
TASK_WAKING in fork pathes? Can't sched_fork() set TASK_RUNNING instead?
Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't
these PF_STARTING complications.
As for this series. Please review. I don't understand how it is possible
to really test these changes.
Dear cpuset developers! Please review ;) If you don't like 6/6, please make
a better fix. I tried to make as "simple" patch as possible because I hardly
understand cpuset.c, last time I quickly read it a long ago.
Oleg.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-15 9:09 [PATCH 0/6] sched/cpusets fixes, more changes are needed Oleg Nesterov
@ 2010-03-24 17:38 ` Peter Zijlstra
2010-03-24 18:09 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-24 17:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> Ingo, Peter.
>
> Unless I missed something, with or without these patches the TASK_WAKING
> logic in do_fork() is very broken.
>
> - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> which finally does s/WAKING/RUNNING.
>
> But. Nobody can take rq->lock in between. This means a signal
> from irq (quite possible with CLONE_THREAD) or another rt
> thread which preempts us can lockup.
Hmm, the signal case might indeed be a problem, however I cannot see how
the RT thread can be a problem because until we do wake_up_new_task()
the child will not be runnable and can thus not be preempted.
We could frob it by taking rq->lock over clearing PF_STARTING but that's
beyond ugly...
> - the comment in wake_up_new_task says:
>
> We still have TASK_WAKING but PF_STARTING is gone now, meaning
> ->cpus_allowed is stable
>
> this is not true. Yes, nobody can take rq->lock _after_ we cleared
> PF_STARTING, but it is possible that another thread took this lock
> before and still holds it doing, say, sched_setaffinity().
>
> No?
>
> If yes. I can make a patch, but the question is: what is the point to use
> TASK_WAKING in fork pathes? Can't sched_fork() set TASK_RUNNING instead?
> Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't
> these PF_STARTING complications.
Argh, yes.. that's because PF_STARTING is cleared after we expose the
PID, and we needed the PF_STARTING exemption because of that
ns_cgroup_clone() trainwreck.
The reason we have that TASK_WAKING stuff for fork is because
wake_up_new_task() needs p->cpus_allowed to be stable, and we cannot do
select_task_rq() with rq->lock held because of the cgroup-sched crap.
/me goes read the code after applying your patches and frobs the
following patch on top..
So the below patch makes select_task_rq_fair unlock the rq when needed,
and then puts all ->select_task_rq() calls under rq->lock. This should
allow us to remove the TASK_WAKING thing from fork which in turn allows
us to remove the PF_STARTING check in task_is_waking.
How does that look?
(totally untested, will try and boot after dinner)
---
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1051,7 +1051,8 @@ struct sched_class {
void (*put_prev_task) (struct rq *rq, struct task_struct *p);
#ifdef CONFIG_SMP
- int (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
+ int (*select_task_rq)(struct rq *rq, struct task_struct *p,
+ int sd_flag, int flags);
void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
void (*post_schedule) (struct rq *this_rq);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -916,14 +916,10 @@ static inline void finish_lock_switch(st
/*
* Check whether the task is waking, we use this to synchronize against
* ttwu() so that task_cpu() reports a stable number.
- *
- * We need to make an exception for PF_STARTING tasks because the fork
- * path might require task_rq_lock() to work, eg. it can call
- * set_cpus_allowed_ptr() from the cpuset clone_ns code.
*/
static inline int task_is_waking(struct task_struct *p)
{
- return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING));
+ return unlikely(p->state == TASK_WAKING);
}
/*
@@ -2319,9 +2315,9 @@ static int select_fallback_rq(int cpu, s
* The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable.
*/
static inline
-int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
+int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
{
- int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
+ int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags);
/*
* In order not to call set_task_cpu() on a blocking task we need
@@ -2392,9 +2388,7 @@ static int try_to_wake_up(struct task_st
if (p->sched_class->task_waking)
p->sched_class->task_waking(rq, p);
- __task_rq_unlock(rq);
-
- cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
+ cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
if (cpu != orig_cpu) {
/*
* Since we migrate the task without holding any rq->lock,
@@ -2403,6 +2397,7 @@ static int try_to_wake_up(struct task_st
*/
set_task_cpu(p, cpu);
}
+ __task_rq_unlock(rq);
rq = cpu_rq(cpu);
raw_spin_lock(&rq->lock);
@@ -2533,7 +2528,7 @@ void sched_fork(struct task_struct *p, i
* nobody will actually run it, and a signal or other external
* event cannot wake it up and insert it on the runqueue either.
*/
- p->state = TASK_WAKING;
+ p->state = TASK_RUNNING;
/*
* Revert to default priority/policy on fork if requested.
@@ -2600,28 +2595,25 @@ void wake_up_new_task(struct task_struct
int cpu __maybe_unused = get_cpu();
#ifdef CONFIG_SMP
+ rq = task_rq_lock(p, &flags);
+ p->state = TASK_WAKING;
+
/*
* Fork balancing, do it here and not earlier because:
* - cpus_allowed can change in the fork path
* - any previously selected cpu might disappear through hotplug
*
- * We still have TASK_WAKING but PF_STARTING is gone now, meaning
- * ->cpus_allowed is stable, we have preemption disabled, meaning
- * cpu_online_mask is stable.
+ * We set TASK_WAKING so that select_task_rq() can drop rq->lock
+ * without people poking at ->cpus_allowed.
*/
- cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
+ cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
set_task_cpu(p, cpu);
-#endif
- /*
- * Since the task is not on the rq and we still have TASK_WAKING set
- * nobody else will migrate this task.
- */
- rq = cpu_rq(cpu);
- raw_spin_lock_irqsave(&rq->lock, flags);
-
- BUG_ON(p->state != TASK_WAKING);
p->state = TASK_RUNNING;
+ task_rq_unlock(rq, &flags);
+#endif
+
+ rq = task_rq_lock(p, &flags);
activate_task(rq, p, 0);
trace_sched_wakeup_new(rq, p, 1);
check_preempt_curr(rq, p, WF_FORK);
@@ -3067,19 +3059,15 @@ void sched_exec(void)
{
struct task_struct *p = current;
struct migration_req req;
- int dest_cpu, this_cpu;
unsigned long flags;
struct rq *rq;
-
- this_cpu = get_cpu();
- dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
- if (dest_cpu == this_cpu) {
- put_cpu();
- return;
- }
+ int dest_cpu;
rq = task_rq_lock(p, &flags);
- put_cpu();
+ dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0);
+ if (dest_cpu == smp_processor_id())
+ goto unlock;
+
/*
* select_task_rq() can race against ->cpus_allowed
*/
@@ -3097,6 +3085,7 @@ void sched_exec(void)
return;
}
+unlock:
task_rq_unlock(rq, &flags);
}
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1414,7 +1414,8 @@ select_idle_sibling(struct task_struct *
*
* preempt must be disabled.
*/
-static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
+static int
+select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags)
{
struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
int cpu = smp_processor_id();
@@ -1512,8 +1513,11 @@ static int select_task_rq_fair(struct ta
cpumask_weight(sched_domain_span(sd))))
tmp = affine_sd;
- if (tmp)
+ if (tmp) {
+ raw_spin_unlock(&rq->lock);
update_shares(tmp);
+ raw_spin_lock(&rq->lock);
+ }
}
#endif
Index: linux-2.6/kernel/sched_idletask.c
===================================================================
--- linux-2.6.orig/kernel/sched_idletask.c
+++ linux-2.6/kernel/sched_idletask.c
@@ -6,7 +6,8 @@
*/
#ifdef CONFIG_SMP
-static int select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
+static int
+select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
{
return task_cpu(p); /* IDLE tasks as never migrated */
}
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -948,10 +948,9 @@ static void yield_task_rt(struct rq *rq)
#ifdef CONFIG_SMP
static int find_lowest_rq(struct task_struct *task);
-static int select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
+static int
+select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
{
- struct rq *rq = task_rq(p);
-
if (sd_flag != SD_BALANCE_WAKE)
return smp_processor_id();
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-24 17:38 ` Peter Zijlstra
@ 2010-03-24 18:09 ` Oleg Nesterov
2010-03-25 10:22 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-24 18:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> >
> > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > which finally does s/WAKING/RUNNING.
> >
> > But. Nobody can take rq->lock in between. This means a signal
> > from irq (quite possible with CLONE_THREAD) or another rt
> > thread which preempts us can lockup.
>
> Hmm, the signal case might indeed be a problem, however I cannot see how
> the RT thread can be a problem because until we do wake_up_new_task()
> the child will not be runnable and can thus not be preempted.
Indeed, but I meant the _parent_ can be preempted ;)
In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
be called from any context, it should be irq-safe: any owner must disable
inerrupts and preemption.
> The reason we have that TASK_WAKING stuff for fork is because
> wake_up_new_task() needs p->cpus_allowed to be stable
Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
a moment, but afaics that is all?
> So the below patch makes select_task_rq_fair unlock the rq when needed,
> and then puts all ->select_task_rq() calls under rq->lock. This should
> allow us to remove the TASK_WAKING thing from fork which in turn allows
> us to remove the PF_STARTING check in task_is_waking.
>
> How does that look?
I'll try to read this patch tomorrow. But could you please consider
the suggestion above?
Oleg.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-24 18:09 ` Oleg Nesterov
@ 2010-03-25 10:22 ` Peter Zijlstra
2010-03-25 15:46 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-25 10:22 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > >
> > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > which finally does s/WAKING/RUNNING.
> > >
> > > But. Nobody can take rq->lock in between. This means a signal
> > > from irq (quite possible with CLONE_THREAD) or another rt
> > > thread which preempts us can lockup.
> >
> > Hmm, the signal case might indeed be a problem, however I cannot see how
> > the RT thread can be a problem because until we do wake_up_new_task()
> > the child will not be runnable and can thus not be preempted.
>
> Indeed, but I meant the _parent_ can be preempted ;)
I still can't see how that would be a problem..
> In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
> be called from any context, it should be irq-safe: any owner must disable
> inerrupts and preemption.
Agreed, and I think that's corrected with my patch.
> > The reason we have that TASK_WAKING stuff for fork is because
> > wake_up_new_task() needs p->cpus_allowed to be stable
>
> Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> a moment, but afaics that is all?
My patch does that.
> > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > and then puts all ->select_task_rq() calls under rq->lock. This should
> > allow us to remove the TASK_WAKING thing from fork which in turn allows
> > us to remove the PF_STARTING check in task_is_waking.
> >
> > How does that look?
>
> I'll try to read this patch tomorrow. But could you please consider
> the suggestion above?
I think I do all those :-)
I was still looking at removing the TASK_WAKING check from
task_rq_lock() since now we do set_task_cpu() under rq->lock again so it
should be good again.
Hmm, except for sched_fork() that still does set_task_cpu() without
holding rq->lock, but that is before the child gets exposed so there
should not be any concurrency.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-25 10:22 ` Peter Zijlstra
@ 2010-03-25 15:46 ` Oleg Nesterov
2010-03-25 16:02 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 15:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/25, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> > On 03/24, Peter Zijlstra wrote:
> > >
> > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > > >
> > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > > which finally does s/WAKING/RUNNING.
> > > >
> > > > But. Nobody can take rq->lock in between. This means a signal
> > > > from irq (quite possible with CLONE_THREAD) or another rt
> > > > thread which preempts us can lockup.
> > >
> > > Hmm, the signal case might indeed be a problem, however I cannot see how
> > > the RT thread can be a problem because until we do wake_up_new_task()
> > > the child will not be runnable and can thus not be preempted.
> >
> > Indeed, but I meant the _parent_ can be preempted ;)
>
> I still can't see how that would be a problem..
The parent P does do_fork(), copy_process returns the new child C with
TASK_WAKING at PF_STARTING set.
do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is
already visible to the user-space
An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C)
ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still
set.
> > > The reason we have that TASK_WAKING stuff for fork is because
> > > wake_up_new_task() needs p->cpus_allowed to be stable
> >
> > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> > a moment, but afaics that is all?
>
> My patch does that.
OK, that is what I meant. Now, why sched_fork() can't just set TASK_RUNNING ?
This way the "spurious" wakeup can do nothing with the new child, and we
do not have problems with cpuset which needs rq->lock for set_cpus_allowed().
> > > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > > and then puts all ->select_task_rq() calls under rq->lock.
Yes, I thought about this too. I tried to preserve the current
"->select_task_rq() is called without rq->lock" logic.
> This should
> > > allow us to remove the TASK_WAKING thing from fork
Confused. Why can't we simply remove TASK_WAKING from fork without any
changes except in wake_up_new_task() ?
> which in turn allows
> > > us to remove the PF_STARTING check in task_is_waking.
Even if I do not think I understand sched.c above, I'd say we must do
this in any case ;)
> I was still looking at removing the TASK_WAKING check from
> task_rq_lock()
Peter, I can't apply your patch due to rejects (will try again later)
but at first glance, it makes TASK_WAKING unneeded? Since we do not
drop the lock after we set TASK_WAKING, why do we need this state at
all ?
Aha... select_task_rq_fair() can drop the lock, yes? Well, in this
case probably select_task_rq_fair() can set TASK_WAKING before unlock?
I like the current idea to call select_task_rq() without rq->lock, but
of course this is up to you.
However, once again, can't we make a simpler patch?
- remove PF_STARTING from task_waking()
- change sched_fork() to set RUNNING instead of WAKING
- change wake_up_new_task() to set WAKING under rq->lock
This looks simpler to me, and allows to drop rq->lock in ttwu() right
after it sets WAKING.
Another change which seems reasonable is to allow ttwu() to take rq->lock
even if WAKING is set, it can do nothing but check task->state in this case.
What do you think?
Oleg.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-25 15:46 ` Oleg Nesterov
@ 2010-03-25 16:02 ` Oleg Nesterov
2010-03-25 16:10 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 16:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/25, Oleg Nesterov wrote:
>
> I like the current idea to call select_task_rq() without rq->lock, but
> of course this is up to you.
>
> However, once again, can't we make a simpler patch?
>
> - remove PF_STARTING from task_waking()
>
> - change sched_fork() to set RUNNING instead of WAKING
>
> - change wake_up_new_task() to set WAKING under rq->lock
>
> This looks simpler to me, and allows to drop rq->lock in ttwu() right
> after it sets WAKING.
IOW, something like the (unchecked, uncompiled) patch below.
What do you think?
Oleg.
--- x/kernel/sched.c
+++ x/kernel/sched.c
@@ -912,7 +912,7 @@ static inline void finish_lock_switch(st
*/
static inline int task_is_waking(struct task_struct *p)
{
- return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING));
+ return unlikely(p->state == TASK_WAKING);
}
/*
@@ -2568,11 +2568,10 @@ void sched_fork(struct task_struct *p, i
__sched_fork(p);
/*
- * We mark the process as waking here. This guarantees that
- * nobody will actually run it, and a signal or other external
- * event cannot wake it up and insert it on the runqueue either.
+ * We mark the process as running here. This guarantees that
+ * nobody will actually wake it up until wake_up_new_task().
*/
- p->state = TASK_WAKING;
+ p->state = TASK_RUNNING;
/*
* Revert to default priority/policy on fork if requested.
@@ -2638,15 +2637,18 @@ void wake_up_new_task(struct task_struct
struct rq *rq;
int cpu = get_cpu();
+ p->state = TASK_WAKING;
+ smp_mb();
+ raw_spin_unlock_wait(&rq->lock);
+
#ifdef CONFIG_SMP
/*
* Fork balancing, do it here and not earlier because:
* - cpus_allowed can change in the fork path
* - any previously selected cpu might disappear through hotplug
*
- * We still have TASK_WAKING but PF_STARTING is gone now, meaning
- * ->cpus_allowed is stable, we have preemption disabled, meaning
- * cpu_online_mask is stable.
+ * TASK_WAKING means ->cpus_allowed is stable, we have preemption
+ * disabled, meaning cpu_online_mask is stable.
*/
cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
set_task_cpu(p, cpu);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-25 16:02 ` Oleg Nesterov
@ 2010-03-25 16:10 ` Oleg Nesterov
2010-03-25 17:29 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 16:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
Argh, sorry for noise...
On 03/25, Oleg Nesterov wrote:
>
> On 03/25, Oleg Nesterov wrote:
> >
> > I like the current idea to call select_task_rq() without rq->lock, but
> > of course this is up to you.
> >
> > However, once again, can't we make a simpler patch?
> >
> > - remove PF_STARTING from task_waking()
> >
> > - change sched_fork() to set RUNNING instead of WAKING
When I reread this thread, suddenly finally I noticed you mentioned
_twice_ your patch does this too ;) Not to mention the patch itself
which I misread. Sorry.
> IOW, something like the (unchecked, uncompiled) patch below.
Still, what do you think?
Oleg.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-25 16:10 ` Oleg Nesterov
@ 2010-03-25 17:29 ` Peter Zijlstra
2010-03-25 19:15 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-03-25 17:29 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Thu, 2010-03-25 at 17:10 +0100, Oleg Nesterov wrote:
> Argh, sorry for noise...
>
> On 03/25, Oleg Nesterov wrote:
> >
> > On 03/25, Oleg Nesterov wrote:
> > >
> > > I like the current idea to call select_task_rq() without rq->lock, but
> > > of course this is up to you.
> > >
> > > However, once again, can't we make a simpler patch?
> > >
> > > - remove PF_STARTING from task_waking()
> > >
> > > - change sched_fork() to set RUNNING instead of WAKING
>
> When I reread this thread, suddenly finally I noticed you mentioned
> _twice_ your patch does this too ;) Not to mention the patch itself
> which I misread. Sorry.
>
> > IOW, something like the (unchecked, uncompiled) patch below.
>
> Still, what do you think?
Yeah, such a smaller patch might work too, but I was trying to remove
some more of the complexity we grown.
Being able to fully remove that TASK_WAKING check from task_rq_lock()
and only have it in set_cpus_allowed_ptr() would reduce some fast-path
logic.
You patch add a memory barrier and an unlock_wait(), which, while
seemingly correct, are harder to parse than the modified locking.
(Ideally we'd protect ->cpus_allowed using a per-task lock, but adding
more atomics ops to ttwu() is to be avoided)
(Now if I could manage to remove that lock-drop for the cgroup muck we'd
be able to remove TASK_WAKING... but that looks like a long term goal)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
2010-03-25 17:29 ` Peter Zijlstra
@ 2010-03-25 19:15 ` Oleg Nesterov
0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2010-03-25 19:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/25, Peter Zijlstra wrote:
>
> Yeah, such a smaller patch might work too, but I was trying to remove
> some more of the complexity we grown.
>
> Being able to fully remove that TASK_WAKING check from task_rq_lock()
> and only have it in set_cpus_allowed_ptr() would reduce some fast-path
> logic.
OK. Agreed.
> You patch add a memory barrier and an unlock_wait(), which, while
> seemingly correct, are harder to parse than the modified locking.
Yes, lock + set WAKING + unlock is simpler and cleaner, but this
doesn't matter.
I think your patch should address all problems.
Oleg.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-03-25 19:18 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-15 9:09 [PATCH 0/6] sched/cpusets fixes, more changes are needed Oleg Nesterov
2010-03-24 17:38 ` Peter Zijlstra
2010-03-24 18:09 ` Oleg Nesterov
2010-03-25 10:22 ` Peter Zijlstra
2010-03-25 15:46 ` Oleg Nesterov
2010-03-25 16:02 ` Oleg Nesterov
2010-03-25 16:10 ` Oleg Nesterov
2010-03-25 17:29 ` Peter Zijlstra
2010-03-25 19:15 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox