* [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() @ 2014-02-14 4:54 Michael wang 2014-02-14 12:38 ` Peter Zijlstra 0 siblings, 1 reply; 5+ messages in thread From: Michael wang @ 2014-02-14 4:54 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar; +Cc: LKML Since idle_balance() will release rq-lock for a while, there is a chance that RT/DL tasks will be enqueued and ask for the resched, the func used to be invoked ahead of pick_next_task(), which will make sure we drop into the bottom-half inside pick_next_task(). Now since idle_balance() was done inside pick_next_task_fair(), pick_next_task() can no longer make sure the priority, the worst case is that we will going to pick the pulled fair task while there is RT/DL on rq which actually should be picked up. This patch will prevent this happen by some rechecking after idle_balance(), it utilize the resched-flag for the case when RT/DL task was enqueued but don't ask for resched (will that ever happened?). CC: Ingo Molnar <mingo@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> --- kernel/sched/fair.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 235cfa7..ce67514 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4776,6 +4776,16 @@ simple: idle: #ifdef CONFIG_SMP + /* + * We came here only when there is no more tasks on rq (top-half of + * pick_next_task()), and we are now going to pull some fair entities. + * + * Since prev is still the current on rq, clear it's resched-flag so + * we would be able to know when we got a new resched-request during + * idle_balance(), check below for more details. + */ + clear_tsk_need_resched(prev); + idle_enter_fair(rq); /* * We must set idle_stamp _before_ calling idle_balance(), such that we @@ -4784,7 +4794,18 @@ idle: rq->idle_stamp = rq_clock(rq); if (idle_balance(rq)) { /* drops rq->lock */ rq->idle_stamp = 0; - goto again; + /* + * Before we start to pick one of the pulled fair entities, take + * care if some RT/DL tasks has been enqueued during the time + * we release rq-lock inside idle_balance(). + * + * In such cases, since clear_tsk_need_resched() was done + * already, need_resched() will imply the request to sched-in + * the enqueued RT/DL tasks, so don't 'goto again' to make sure + * the priority. + */ + if (rq->nr_running == rq->cfs.h_nr_running || !need_resched()) + goto again; } #endif -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() 2014-02-14 4:54 [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() Michael wang @ 2014-02-14 12:38 ` Peter Zijlstra 2014-02-17 3:31 ` Michael wang 0 siblings, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2014-02-14 12:38 UTC (permalink / raw) To: Michael wang; +Cc: Ingo Molnar, LKML, Steven Rostedt, Juri Lelli On Fri, Feb 14, 2014 at 12:54:37PM +0800, Michael wang wrote: > Since idle_balance() will release rq-lock for a while, there is a chance that > RT/DL tasks will be enqueued and ask for the resched, the func used to be > invoked ahead of pick_next_task(), which will make sure we drop into the > bottom-half inside pick_next_task(). > > Now since idle_balance() was done inside pick_next_task_fair(), pick_next_task() > can no longer make sure the priority, the worst case is that we will going to > pick the pulled fair task while there is RT/DL on rq which actually should be > picked up. > > This patch will prevent this happen by some rechecking after idle_balance(), it > utilize the resched-flag for the case when RT/DL task was enqueued but don't ask > for resched (will that ever happened?). I'm not sure this is actually working right; the problem is that while we do retry on need_resched() in the main schedule() loop, that last need_resched() is on @next (then current). So clearing/resetting @prev's need_resched() is not going to trigger that loop. Not to mention we explicitly clear @prev's need_resched right after pick_next_task(). So how about something like this? I don't particularly like adding that condition to pick_next_task(); but the alternative is recursively calling pick_next_task() and while recursion is strictly limited to the number of sched_classes, it does feel kinda icky. Anybody got any preferences? --- Subject: sched: Guarantee task priority in pick_next_task() From: Peter Zijlstra <peterz@infradead.org> Date: Fri Feb 14 12:25:08 CET 2014 Michael spotted that the idle_balance() push down created a task priority problem. Previously, when we called idle_balance() before pick_next_task() it wasn't a problem when -- because of the rq->lock droppage -- an rt/dl task slipped in. Similarly for pre_schedule(), rt pre-schedule could have a dl task slip in. But by pulling it into the pick_next_task() loop, we'll not try a higher task priority again. Cure this by creating a re-start condition in pick_next_task(); and triggering this from pick_next_task_{rt,fair}(). Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()") Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-jrdk7auga87duk4lkpo8xusk@git.kernel.org --- kernel/sched/core.c | 19 +++++++++++++++---- kernel/sched/fair.c | 16 +++++++++++++++- kernel/sched/rt.c | 10 +++++++++- 3 files changed, 39 insertions(+), 6 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2574,27 +2574,38 @@ static inline void schedule_debug(struct static inline struct task_struct * pick_next_task(struct rq *rq, struct task_struct *prev) { - const struct sched_class *class; + const struct sched_class *class = &fair_sched_class; struct task_struct *p; /* * Optimization: we know that if all tasks are in * the fair class we can call that function directly: */ - if (likely(prev->sched_class == &fair_sched_class && + if (likely(prev->sched_class == class && rq->nr_running == rq->cfs.h_nr_running)) { p = fair_sched_class.pick_next_task(rq, prev); if (likely(p)) - return p; + goto got_task; } +again: for_each_class(class) { p = class->pick_next_task(rq, prev); if (p) - return p; + goto got_task; } BUG(); /* the idle class will always have a runnable task */ + +got_task: + /* + * See pick_next_task_{fair,rt}(); they return rq->idle in case + * they want to re-start the task selection. + */ + if (unlikely(p->sched_class != class)) + goto again; + + return p; } /* --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4684,6 +4684,7 @@ pick_next_task_fair(struct rq *rq, struc struct cfs_rq *cfs_rq = &rq->cfs; struct sched_entity *se; struct task_struct *p; + int new_tasks; again: #ifdef CONFIG_FAIR_GROUP_SCHED @@ -4782,7 +4783,20 @@ pick_next_task_fair(struct rq *rq, struc return p; idle: - if (idle_balance(rq)) /* drops rq->lock */ + /* + * Because idle_balance() releases (and re-acquires) rq->lock, it is + * possible for any higher priority task to appear. In that case we + * must re-start the pick_next_entity() loop. + */ + new_tasks = idle_balance(rq); + + /* + * See pick_next_task(); we return rq->idle to restart task selection. + */ + if (rq->nr_running != rq->cfs.h_nr_running) + return rq->idle; + + if (new_tasks) goto again; return NULL; --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1360,8 +1360,16 @@ pick_next_task_rt(struct rq *rq, struct struct task_struct *p; struct rt_rq *rt_rq = &rq->rt; - if (need_pull_rt_task(rq, prev)) + if (need_pull_rt_task(rq, prev)) { pull_rt_task(rq); + /* + * pull_rt_task() can drop (and re-acquire) rq->lock; this + * means a dl task can slip in, in which case we need to + * re-start task selection. + */ + if (unlikely(rq->dl.dl_nr_running)) + return rq->idle; + } if (!rt_rq->rt_nr_running) return NULL; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() 2014-02-14 12:38 ` Peter Zijlstra @ 2014-02-17 3:31 ` Michael wang 2014-02-17 11:24 ` Peter Zijlstra 0 siblings, 1 reply; 5+ messages in thread From: Michael wang @ 2014-02-17 3:31 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Steven Rostedt, Juri Lelli On 02/14/2014 08:38 PM, Peter Zijlstra wrote: [snip] >> >> This patch will prevent this happen by some rechecking after idle_balance(), it >> utilize the resched-flag for the case when RT/DL task was enqueued but don't ask >> for resched (will that ever happened?). > > I'm not sure this is actually working right; the problem is that while > we do retry on need_resched() in the main schedule() loop, that last > need_resched() is on @next (then current). So clearing/resetting @prev's > need_resched() is not going to trigger that loop. > > Not to mention we explicitly clear @prev's need_resched right after > pick_next_task(). Actually it's not aim at that timing, but consider about the RT case, it won't work as expected anyway... > > So how about something like this? > > I don't particularly like adding that condition to pick_next_task(); but > the alternative is recursively calling pick_next_task() and while > recursion is strictly limited to the number of sched_classes, it does > feel kinda icky. Yeah...but it works, the RT stuff is inside the loop and really hard to be handled... > > Anybody got any preferences? > > --- > Subject: sched: Guarantee task priority in pick_next_task() [snip] > pick_next_task(struct rq *rq, struct task_struct *prev) > { > - const struct sched_class *class; > + const struct sched_class *class = &fair_sched_class; > struct task_struct *p; > > /* > * Optimization: we know that if all tasks are in > * the fair class we can call that function directly: > */ > - if (likely(prev->sched_class == &fair_sched_class && > + if (likely(prev->sched_class == class && > rq->nr_running == rq->cfs.h_nr_running)) { > p = fair_sched_class.pick_next_task(rq, prev); > if (likely(p)) > - return p; > + goto got_task; Since idle_balance() won't happen in the loop, may be we could use: if p && p->sched_class == class return p in here, let it fall down into the loop if p is idle, since that means we got RT/DL and will do this anyway, could save two jump work may be? (and may could combine some code below if so?) Regards, Michael Wang > } > > +again: > for_each_class(class) { > p = class->pick_next_task(rq, prev); > if (p) > - return p; > + goto got_task; > } > > BUG(); /* the idle class will always have a runnable task */ > + > +got_task: > + /* > + * See pick_next_task_{fair,rt}(); they return rq->idle in case > + * they want to re-start the task selection. > + */ > + if (unlikely(p->sched_class != class)) > + goto again; > + > + return p; > } > > /* > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4684,6 +4684,7 @@ pick_next_task_fair(struct rq *rq, struc > struct cfs_rq *cfs_rq = &rq->cfs; > struct sched_entity *se; > struct task_struct *p; > + int new_tasks; > > again: > #ifdef CONFIG_FAIR_GROUP_SCHED > @@ -4782,7 +4783,20 @@ pick_next_task_fair(struct rq *rq, struc > return p; > > idle: > - if (idle_balance(rq)) /* drops rq->lock */ > + /* > + * Because idle_balance() releases (and re-acquires) rq->lock, it is > + * possible for any higher priority task to appear. In that case we > + * must re-start the pick_next_entity() loop. > + */ > + new_tasks = idle_balance(rq); > + > + /* > + * See pick_next_task(); we return rq->idle to restart task selection. > + */ > + if (rq->nr_running != rq->cfs.h_nr_running) > + return rq->idle; > + > + if (new_tasks) > goto again; > > return NULL; > --- a/kernel/sched/rt.c > +++ b/kernel/sched/rt.c > @@ -1360,8 +1360,16 @@ pick_next_task_rt(struct rq *rq, struct > struct task_struct *p; > struct rt_rq *rt_rq = &rq->rt; > > - if (need_pull_rt_task(rq, prev)) > + if (need_pull_rt_task(rq, prev)) { > pull_rt_task(rq); > + /* > + * pull_rt_task() can drop (and re-acquire) rq->lock; this > + * means a dl task can slip in, in which case we need to > + * re-start task selection. > + */ > + if (unlikely(rq->dl.dl_nr_running)) > + return rq->idle; > + } > > if (!rt_rq->rt_nr_running) > return NULL; > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() 2014-02-17 3:31 ` Michael wang @ 2014-02-17 11:24 ` Peter Zijlstra 2014-02-18 2:42 ` Michael wang 0 siblings, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2014-02-17 11:24 UTC (permalink / raw) To: Michael wang; +Cc: Ingo Molnar, LKML, Steven Rostedt, Juri Lelli On Mon, Feb 17, 2014 at 11:31:16AM +0800, Michael wang wrote: > > pick_next_task(struct rq *rq, struct task_struct *prev) > > { > > - const struct sched_class *class; > > + const struct sched_class *class = &fair_sched_class; > > struct task_struct *p; > > > > /* > > * Optimization: we know that if all tasks are in > > * the fair class we can call that function directly: > > */ > > - if (likely(prev->sched_class == &fair_sched_class && > > + if (likely(prev->sched_class == class && > > rq->nr_running == rq->cfs.h_nr_running)) { > > p = fair_sched_class.pick_next_task(rq, prev); > > if (likely(p)) > > - return p; > > + goto got_task; > > Since idle_balance() won't happen in the loop, may be we could use: > > if p && p->sched_class == class > return p > > in here, let it fall down into the loop if p is idle, since that means > we got RT/DL and will do this anyway, could save two jump work may be? > (and may could combine some code below if so?) Maybe; we'd have to look at whatever GCC does with it. But yes I think I like the code better that way. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() 2014-02-17 11:24 ` Peter Zijlstra @ 2014-02-18 2:42 ` Michael wang 0 siblings, 0 replies; 5+ messages in thread From: Michael wang @ 2014-02-18 2:42 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Steven Rostedt, Juri Lelli On 02/17/2014 07:24 PM, Peter Zijlstra wrote: [snip] >> Since idle_balance() won't happen in the loop, may be we could use: >> >> if p && p->sched_class == class >> return p >> >> in here, let it fall down into the loop if p is idle, since that means >> we got RT/DL and will do this anyway, could save two jump work may be? >> (and may could combine some code below if so?) > > Maybe; we'd have to look at whatever GCC does with it. Exactly, alien code appear when in binary... Regards, Michael Wang > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-02-18 2:43 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-14 4:54 [RFC PATCH] sched: make sure sched-priority after invoke idle_balance() Michael wang 2014-02-14 12:38 ` Peter Zijlstra 2014-02-17 3:31 ` Michael wang 2014-02-17 11:24 ` Peter Zijlstra 2014-02-18 2:42 ` Michael wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox