* [PATCH 0/6] sched/cpusets fixes, more changes are needed @ 2010-03-15 9:09 Oleg Nesterov 2010-03-24 17:38 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-03-15 9:09 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel Ingo, Peter. Unless I missed something, with or without these patches the TASK_WAKING logic in do_fork() is very broken. - do_fork() clears PF_STARTING and then calls wake_up_new_task() which finally does s/WAKING/RUNNING. But. Nobody can take rq->lock in between. This means a signal from irq (quite possible with CLONE_THREAD) or another rt thread which preempts us can lockup. - the comment in wake_up_new_task says: We still have TASK_WAKING but PF_STARTING is gone now, meaning ->cpus_allowed is stable this is not true. Yes, nobody can take rq->lock _after_ we cleared PF_STARTING, but it is possible that another thread took this lock before and still holds it doing, say, sched_setaffinity(). No? If yes. I can make a patch, but the question is: what is the point to use TASK_WAKING in fork pathes? Can't sched_fork() set TASK_RUNNING instead? Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't these PF_STARTING complications. As for this series. Please review. I don't understand how it is possible to really test these changes. Dear cpuset developers! Please review ;) If you don't like 6/6, please make a better fix. I tried to make as "simple" patch as possible because I hardly understand cpuset.c, last time I quickly read it a long ago. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-15 9:09 [PATCH 0/6] sched/cpusets fixes, more changes are needed Oleg Nesterov @ 2010-03-24 17:38 ` Peter Zijlstra 2010-03-24 18:09 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-03-24 17:38 UTC (permalink / raw) To: Oleg Nesterov Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > Ingo, Peter. > > Unless I missed something, with or without these patches the TASK_WAKING > logic in do_fork() is very broken. > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > which finally does s/WAKING/RUNNING. > > But. Nobody can take rq->lock in between. This means a signal > from irq (quite possible with CLONE_THREAD) or another rt > thread which preempts us can lockup. Hmm, the signal case might indeed be a problem, however I cannot see how the RT thread can be a problem because until we do wake_up_new_task() the child will not be runnable and can thus not be preempted. We could frob it by taking rq->lock over clearing PF_STARTING but that's beyond ugly... > - the comment in wake_up_new_task says: > > We still have TASK_WAKING but PF_STARTING is gone now, meaning > ->cpus_allowed is stable > > this is not true. Yes, nobody can take rq->lock _after_ we cleared > PF_STARTING, but it is possible that another thread took this lock > before and still holds it doing, say, sched_setaffinity(). > > No? > > If yes. I can make a patch, but the question is: what is the point to use > TASK_WAKING in fork pathes? Can't sched_fork() set TASK_RUNNING instead? > Afaics, TASK_RUNNING can equally protect from premature wakeups but doesn't > these PF_STARTING complications. Argh, yes.. that's because PF_STARTING is cleared after we expose the PID, and we needed the PF_STARTING exemption because of that ns_cgroup_clone() trainwreck. The reason we have that TASK_WAKING stuff for fork is because wake_up_new_task() needs p->cpus_allowed to be stable, and we cannot do select_task_rq() with rq->lock held because of the cgroup-sched crap. /me goes read the code after applying your patches and frobs the following patch on top.. So the below patch makes select_task_rq_fair unlock the rq when needed, and then puts all ->select_task_rq() calls under rq->lock. This should allow us to remove the TASK_WAKING thing from fork which in turn allows us to remove the PF_STARTING check in task_is_waking. How does that look? (totally untested, will try and boot after dinner) --- Index: linux-2.6/include/linux/sched.h =================================================================== --- linux-2.6.orig/include/linux/sched.h +++ linux-2.6/include/linux/sched.h @@ -1051,7 +1051,8 @@ struct sched_class { void (*put_prev_task) (struct rq *rq, struct task_struct *p); #ifdef CONFIG_SMP - int (*select_task_rq)(struct task_struct *p, int sd_flag, int flags); + int (*select_task_rq)(struct rq *rq, struct task_struct *p, + int sd_flag, int flags); void (*pre_schedule) (struct rq *this_rq, struct task_struct *task); void (*post_schedule) (struct rq *this_rq); Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -916,14 +916,10 @@ static inline void finish_lock_switch(st /* * Check whether the task is waking, we use this to synchronize against * ttwu() so that task_cpu() reports a stable number. - * - * We need to make an exception for PF_STARTING tasks because the fork - * path might require task_rq_lock() to work, eg. it can call - * set_cpus_allowed_ptr() from the cpuset clone_ns code. */ static inline int task_is_waking(struct task_struct *p) { - return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING)); + return unlikely(p->state == TASK_WAKING); } /* @@ -2319,9 +2315,9 @@ static int select_fallback_rq(int cpu, s * The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable. */ static inline -int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags) +int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags) { - int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags); + int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags); /* * In order not to call set_task_cpu() on a blocking task we need @@ -2392,9 +2388,7 @@ static int try_to_wake_up(struct task_st if (p->sched_class->task_waking) p->sched_class->task_waking(rq, p); - __task_rq_unlock(rq); - - cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags); + cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags); if (cpu != orig_cpu) { /* * Since we migrate the task without holding any rq->lock, @@ -2403,6 +2397,7 @@ static int try_to_wake_up(struct task_st */ set_task_cpu(p, cpu); } + __task_rq_unlock(rq); rq = cpu_rq(cpu); raw_spin_lock(&rq->lock); @@ -2533,7 +2528,7 @@ void sched_fork(struct task_struct *p, i * nobody will actually run it, and a signal or other external * event cannot wake it up and insert it on the runqueue either. */ - p->state = TASK_WAKING; + p->state = TASK_RUNNING; /* * Revert to default priority/policy on fork if requested. @@ -2600,28 +2595,25 @@ void wake_up_new_task(struct task_struct int cpu __maybe_unused = get_cpu(); #ifdef CONFIG_SMP + rq = task_rq_lock(p, &flags); + p->state = TASK_WAKING; + /* * Fork balancing, do it here and not earlier because: * - cpus_allowed can change in the fork path * - any previously selected cpu might disappear through hotplug * - * We still have TASK_WAKING but PF_STARTING is gone now, meaning - * ->cpus_allowed is stable, we have preemption disabled, meaning - * cpu_online_mask is stable. + * We set TASK_WAKING so that select_task_rq() can drop rq->lock + * without people poking at ->cpus_allowed. */ - cpu = select_task_rq(p, SD_BALANCE_FORK, 0); + cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0); set_task_cpu(p, cpu); -#endif - /* - * Since the task is not on the rq and we still have TASK_WAKING set - * nobody else will migrate this task. - */ - rq = cpu_rq(cpu); - raw_spin_lock_irqsave(&rq->lock, flags); - - BUG_ON(p->state != TASK_WAKING); p->state = TASK_RUNNING; + task_rq_unlock(rq, &flags); +#endif + + rq = task_rq_lock(p, &flags); activate_task(rq, p, 0); trace_sched_wakeup_new(rq, p, 1); check_preempt_curr(rq, p, WF_FORK); @@ -3067,19 +3059,15 @@ void sched_exec(void) { struct task_struct *p = current; struct migration_req req; - int dest_cpu, this_cpu; unsigned long flags; struct rq *rq; - - this_cpu = get_cpu(); - dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0); - if (dest_cpu == this_cpu) { - put_cpu(); - return; - } + int dest_cpu; rq = task_rq_lock(p, &flags); - put_cpu(); + dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0); + if (dest_cpu == smp_processor_id()) + goto unlock; + /* * select_task_rq() can race against ->cpus_allowed */ @@ -3097,6 +3085,7 @@ void sched_exec(void) return; } +unlock: task_rq_unlock(rq, &flags); } Index: linux-2.6/kernel/sched_fair.c =================================================================== --- linux-2.6.orig/kernel/sched_fair.c +++ linux-2.6/kernel/sched_fair.c @@ -1414,7 +1414,8 @@ select_idle_sibling(struct task_struct * * * preempt must be disabled. */ -static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) +static int +select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags) { struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL; int cpu = smp_processor_id(); @@ -1512,8 +1513,11 @@ static int select_task_rq_fair(struct ta cpumask_weight(sched_domain_span(sd)))) tmp = affine_sd; - if (tmp) + if (tmp) { + raw_spin_unlock(&rq->lock); update_shares(tmp); + raw_spin_lock(&rq->lock); + } } #endif Index: linux-2.6/kernel/sched_idletask.c =================================================================== --- linux-2.6.orig/kernel/sched_idletask.c +++ linux-2.6/kernel/sched_idletask.c @@ -6,7 +6,8 @@ */ #ifdef CONFIG_SMP -static int select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) +static int +select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } Index: linux-2.6/kernel/sched_rt.c =================================================================== --- linux-2.6.orig/kernel/sched_rt.c +++ linux-2.6/kernel/sched_rt.c @@ -948,10 +948,9 @@ static void yield_task_rt(struct rq *rq) #ifdef CONFIG_SMP static int find_lowest_rq(struct task_struct *task); -static int select_task_rq_rt(struct task_struct *p, int sd_flag, int flags) +static int +select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags) { - struct rq *rq = task_rq(p); - if (sd_flag != SD_BALANCE_WAKE) return smp_processor_id(); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-24 17:38 ` Peter Zijlstra @ 2010-03-24 18:09 ` Oleg Nesterov 2010-03-25 10:22 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-03-24 18:09 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On 03/24, Peter Zijlstra wrote: > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > which finally does s/WAKING/RUNNING. > > > > But. Nobody can take rq->lock in between. This means a signal > > from irq (quite possible with CLONE_THREAD) or another rt > > thread which preempts us can lockup. > > Hmm, the signal case might indeed be a problem, however I cannot see how > the RT thread can be a problem because until we do wake_up_new_task() > the child will not be runnable and can thus not be preempted. Indeed, but I meant the _parent_ can be preempted ;) In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can be called from any context, it should be irq-safe: any owner must disable inerrupts and preemption. > The reason we have that TASK_WAKING stuff for fork is because > wake_up_new_task() needs p->cpus_allowed to be stable Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for a moment, but afaics that is all? > So the below patch makes select_task_rq_fair unlock the rq when needed, > and then puts all ->select_task_rq() calls under rq->lock. This should > allow us to remove the TASK_WAKING thing from fork which in turn allows > us to remove the PF_STARTING check in task_is_waking. > > How does that look? I'll try to read this patch tomorrow. But could you please consider the suggestion above? Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-24 18:09 ` Oleg Nesterov @ 2010-03-25 10:22 ` Peter Zijlstra 2010-03-25 15:46 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-03-25 10:22 UTC (permalink / raw) To: Oleg Nesterov Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote: > On 03/24, Peter Zijlstra wrote: > > > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > > which finally does s/WAKING/RUNNING. > > > > > > But. Nobody can take rq->lock in between. This means a signal > > > from irq (quite possible with CLONE_THREAD) or another rt > > > thread which preempts us can lockup. > > > > Hmm, the signal case might indeed be a problem, however I cannot see how > > the RT thread can be a problem because until we do wake_up_new_task() > > the child will not be runnable and can thus not be preempted. > > Indeed, but I meant the _parent_ can be preempted ;) I still can't see how that would be a problem.. > In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can > be called from any context, it should be irq-safe: any owner must disable > inerrupts and preemption. Agreed, and I think that's corrected with my patch. > > The reason we have that TASK_WAKING stuff for fork is because > > wake_up_new_task() needs p->cpus_allowed to be stable > > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for > a moment, but afaics that is all? My patch does that. > > So the below patch makes select_task_rq_fair unlock the rq when needed, > > and then puts all ->select_task_rq() calls under rq->lock. This should > > allow us to remove the TASK_WAKING thing from fork which in turn allows > > us to remove the PF_STARTING check in task_is_waking. > > > > How does that look? > > I'll try to read this patch tomorrow. But could you please consider > the suggestion above? I think I do all those :-) I was still looking at removing the TASK_WAKING check from task_rq_lock() since now we do set_task_cpu() under rq->lock again so it should be good again. Hmm, except for sched_fork() that still does set_task_cpu() without holding rq->lock, but that is before the child gets exposed so there should not be any concurrency. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-25 10:22 ` Peter Zijlstra @ 2010-03-25 15:46 ` Oleg Nesterov 2010-03-25 16:02 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-03-25 15:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On 03/25, Peter Zijlstra wrote: > > On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote: > > On 03/24, Peter Zijlstra wrote: > > > > > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > > > which finally does s/WAKING/RUNNING. > > > > > > > > But. Nobody can take rq->lock in between. This means a signal > > > > from irq (quite possible with CLONE_THREAD) or another rt > > > > thread which preempts us can lockup. > > > > > > Hmm, the signal case might indeed be a problem, however I cannot see how > > > the RT thread can be a problem because until we do wake_up_new_task() > > > the child will not be runnable and can thus not be preempted. > > > > Indeed, but I meant the _parent_ can be preempted ;) > > I still can't see how that would be a problem.. The parent P does do_fork(), copy_process returns the new child C with TASK_WAKING at PF_STARTING set. do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is already visible to the user-space An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C) ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still set. > > > The reason we have that TASK_WAKING stuff for fork is because > > > wake_up_new_task() needs p->cpus_allowed to be stable > > > > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING > > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for > > a moment, but afaics that is all? > > My patch does that. OK, that is what I meant. Now, why sched_fork() can't just set TASK_RUNNING ? This way the "spurious" wakeup can do nothing with the new child, and we do not have problems with cpuset which needs rq->lock for set_cpus_allowed(). > > > So the below patch makes select_task_rq_fair unlock the rq when needed, > > > and then puts all ->select_task_rq() calls under rq->lock. Yes, I thought about this too. I tried to preserve the current "->select_task_rq() is called without rq->lock" logic. > This should > > > allow us to remove the TASK_WAKING thing from fork Confused. Why can't we simply remove TASK_WAKING from fork without any changes except in wake_up_new_task() ? > which in turn allows > > > us to remove the PF_STARTING check in task_is_waking. Even if I do not think I understand sched.c above, I'd say we must do this in any case ;) > I was still looking at removing the TASK_WAKING check from > task_rq_lock() Peter, I can't apply your patch due to rejects (will try again later) but at first glance, it makes TASK_WAKING unneeded? Since we do not drop the lock after we set TASK_WAKING, why do we need this state at all ? Aha... select_task_rq_fair() can drop the lock, yes? Well, in this case probably select_task_rq_fair() can set TASK_WAKING before unlock? I like the current idea to call select_task_rq() without rq->lock, but of course this is up to you. However, once again, can't we make a simpler patch? - remove PF_STARTING from task_waking() - change sched_fork() to set RUNNING instead of WAKING - change wake_up_new_task() to set WAKING under rq->lock This looks simpler to me, and allows to drop rq->lock in ttwu() right after it sets WAKING. Another change which seems reasonable is to allow ttwu() to take rq->lock even if WAKING is set, it can do nothing but check task->state in this case. What do you think? Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-25 15:46 ` Oleg Nesterov @ 2010-03-25 16:02 ` Oleg Nesterov 2010-03-25 16:10 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-03-25 16:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On 03/25, Oleg Nesterov wrote: > > I like the current idea to call select_task_rq() without rq->lock, but > of course this is up to you. > > However, once again, can't we make a simpler patch? > > - remove PF_STARTING from task_waking() > > - change sched_fork() to set RUNNING instead of WAKING > > - change wake_up_new_task() to set WAKING under rq->lock > > This looks simpler to me, and allows to drop rq->lock in ttwu() right > after it sets WAKING. IOW, something like the (unchecked, uncompiled) patch below. What do you think? Oleg. --- x/kernel/sched.c +++ x/kernel/sched.c @@ -912,7 +912,7 @@ static inline void finish_lock_switch(st */ static inline int task_is_waking(struct task_struct *p) { - return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING)); + return unlikely(p->state == TASK_WAKING); } /* @@ -2568,11 +2568,10 @@ void sched_fork(struct task_struct *p, i __sched_fork(p); /* - * We mark the process as waking here. This guarantees that - * nobody will actually run it, and a signal or other external - * event cannot wake it up and insert it on the runqueue either. + * We mark the process as running here. This guarantees that + * nobody will actually wake it up until wake_up_new_task(). */ - p->state = TASK_WAKING; + p->state = TASK_RUNNING; /* * Revert to default priority/policy on fork if requested. @@ -2638,15 +2637,18 @@ void wake_up_new_task(struct task_struct struct rq *rq; int cpu = get_cpu(); + p->state = TASK_WAKING; + smp_mb(); + raw_spin_unlock_wait(&rq->lock); + #ifdef CONFIG_SMP /* * Fork balancing, do it here and not earlier because: * - cpus_allowed can change in the fork path * - any previously selected cpu might disappear through hotplug * - * We still have TASK_WAKING but PF_STARTING is gone now, meaning - * ->cpus_allowed is stable, we have preemption disabled, meaning - * cpu_online_mask is stable. + * TASK_WAKING means ->cpus_allowed is stable, we have preemption + * disabled, meaning cpu_online_mask is stable. */ cpu = select_task_rq(p, SD_BALANCE_FORK, 0); set_task_cpu(p, cpu); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-25 16:02 ` Oleg Nesterov @ 2010-03-25 16:10 ` Oleg Nesterov 2010-03-25 17:29 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-03-25 16:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel Argh, sorry for noise... On 03/25, Oleg Nesterov wrote: > > On 03/25, Oleg Nesterov wrote: > > > > I like the current idea to call select_task_rq() without rq->lock, but > > of course this is up to you. > > > > However, once again, can't we make a simpler patch? > > > > - remove PF_STARTING from task_waking() > > > > - change sched_fork() to set RUNNING instead of WAKING When I reread this thread, suddenly finally I noticed you mentioned _twice_ your patch does this too ;) Not to mention the patch itself which I misread. Sorry. > IOW, something like the (unchecked, uncompiled) patch below. Still, what do you think? Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-25 16:10 ` Oleg Nesterov @ 2010-03-25 17:29 ` Peter Zijlstra 2010-03-25 19:15 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2010-03-25 17:29 UTC (permalink / raw) To: Oleg Nesterov Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On Thu, 2010-03-25 at 17:10 +0100, Oleg Nesterov wrote: > Argh, sorry for noise... > > On 03/25, Oleg Nesterov wrote: > > > > On 03/25, Oleg Nesterov wrote: > > > > > > I like the current idea to call select_task_rq() without rq->lock, but > > > of course this is up to you. > > > > > > However, once again, can't we make a simpler patch? > > > > > > - remove PF_STARTING from task_waking() > > > > > > - change sched_fork() to set RUNNING instead of WAKING > > When I reread this thread, suddenly finally I noticed you mentioned > _twice_ your patch does this too ;) Not to mention the patch itself > which I misread. Sorry. > > > IOW, something like the (unchecked, uncompiled) patch below. > > Still, what do you think? Yeah, such a smaller patch might work too, but I was trying to remove some more of the complexity we grown. Being able to fully remove that TASK_WAKING check from task_rq_lock() and only have it in set_cpus_allowed_ptr() would reduce some fast-path logic. You patch add a memory barrier and an unlock_wait(), which, while seemingly correct, are harder to parse than the modified locking. (Ideally we'd protect ->cpus_allowed using a per-task lock, but adding more atomics ops to ttwu() is to be avoided) (Now if I could manage to remove that lock-drop for the cgroup muck we'd be able to remove TASK_WAKING... but that looks like a long term goal) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed 2010-03-25 17:29 ` Peter Zijlstra @ 2010-03-25 19:15 ` Oleg Nesterov 0 siblings, 0 replies; 9+ messages in thread From: Oleg Nesterov @ 2010-03-25 19:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel On 03/25, Peter Zijlstra wrote: > > Yeah, such a smaller patch might work too, but I was trying to remove > some more of the complexity we grown. > > Being able to fully remove that TASK_WAKING check from task_rq_lock() > and only have it in set_cpus_allowed_ptr() would reduce some fast-path > logic. OK. Agreed. > You patch add a memory barrier and an unlock_wait(), which, while > seemingly correct, are harder to parse than the modified locking. Yes, lock + set WAKING + unlock is simpler and cleaner, but this doesn't matter. I think your patch should address all problems. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-03-25 19:18 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-15 9:09 [PATCH 0/6] sched/cpusets fixes, more changes are needed Oleg Nesterov 2010-03-24 17:38 ` Peter Zijlstra 2010-03-24 18:09 ` Oleg Nesterov 2010-03-25 10:22 ` Peter Zijlstra 2010-03-25 15:46 ` Oleg Nesterov 2010-03-25 16:02 ` Oleg Nesterov 2010-03-25 16:10 ` Oleg Nesterov 2010-03-25 17:29 ` Peter Zijlstra 2010-03-25 19:15 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox