* rt scheduler may calculate wrong rt_time @ 2011-04-21 12:55 Thomas Giesel 2011-04-22 8:21 ` Mike Galbraith 0 siblings, 1 reply; 6+ messages in thread From: Thomas Giesel @ 2011-04-21 12:55 UTC (permalink / raw) To: linux-kernel Friends of the scheduler, I found that the current (well, at least 2.6.38) scheduler calculates a wrong rt_time for realtime tasks in certain situations. Example scenario: - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other setups, but that's what I did) - a high priority rt task (A) gets packets from Ethernet about every 10 ms - a low priority rt task (B) unfortunately runs for a longer time (here: endlessly :) - no other tasks running (i.e. about 5 ms idle left per period) When the runtime of the realtime tasks is exceeded (e.g. by (B)), they are throttled. During this time idle is scheduled. When in idle, tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes update_rq_clock() _not_ to be called for a while. When a realtime task is woken up during this time (e.g. (A) by network traffic), update_rq_clock() is called from enqueue_task(). The task is not picked yet, because it is still throttled. After a while sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle will call schedule(). schedule() picks (A) which has been woken up a while ago. _pick_next_task_rt() sets exec_start to rq->clock_task. But this has been updated last time when the task was woken up, which could have been up to 5 ms ago in my example. So exec_start contains a time _before_ the task was actually started. As a result of this, rt_time is calculated too large which makes the rt tasks being throttled even earlier in the next period. This error may even increase from interval to interval, because the throttle-window (initially 5 ms) also increases. IMHO the best place to update clock_task would be to call a function from tick_nohz_restart_sched_tick(). But currently I don't see a suitable interface to the scheduler to do this. Currently I call update_rq_clock(rq) just before put_prev_task() in schedule(). This solves the issue and causes rt_runtime to be kept quite accurately. (Well, same result would be to remove "if (...)" in put_prev_task()) What do you think is the best way to solve this issue? Thomas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time 2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel @ 2011-04-22 8:21 ` Mike Galbraith 2011-04-22 20:52 ` Thomas Giesel 2011-04-27 17:51 ` Thomas Giesel 0 siblings, 2 replies; 6+ messages in thread From: Mike Galbraith @ 2011-04-22 8:21 UTC (permalink / raw) To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra On Thu, 2011-04-21 at 14:55 +0200, Thomas Giesel wrote: > Friends of the scheduler, > > I found that the current (well, at least 2.6.38) scheduler calculates a > wrong rt_time for realtime tasks in certain situations. > > Example scenario: > - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other > setups, but that's what I did) > - a high priority rt task (A) gets packets from Ethernet about every 10 > ms > - a low priority rt task (B) unfortunately runs for a longer time > (here: endlessly :) > - no other tasks running (i.e. about 5 ms idle left per period) > > When the runtime of the realtime tasks is exceeded (e.g. by (B)), they > are throttled. During this time idle is scheduled. When in idle, > tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes > update_rq_clock() _not_ to be called for a while. When a realtime task > is woken up during this time (e.g. (A) by network traffic), > update_rq_clock() is called from enqueue_task(). The task is not picked > yet, because it is still throttled. After a while > sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle > will call schedule(). > > schedule() picks (A) which has been woken up a while ago. > _pick_next_task_rt() sets exec_start to rq->clock_task. But this has > been updated last time when the task was woken up, which could have > been up to 5 ms ago in my example. So exec_start contains a time > _before_ the task was actually started. As a result of this, rt_time is > calculated too large which makes the rt tasks being throttled even > earlier in the next period. This error may even increase from interval > to interval, because the throttle-window (initially 5 ms) also > increases. > > IMHO the best place to update clock_task would be to call a function > from tick_nohz_restart_sched_tick(). But currently I don't see a > suitable interface to the scheduler to do this. Currently I call > update_rq_clock(rq) just before put_prev_task() in schedule(). This > solves the issue and causes rt_runtime to be kept quite accurately. > (Well, same result would be to remove "if (...)" in put_prev_task()) Hm. Does forcing a clock update if we're idle when we release the throttle do the trick? diff --git a/kernel/sched.c b/kernel/sched.c index 8cb0a57..97245ef 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -464,7 +464,7 @@ struct rq { u64 nohz_stamp; unsigned char nohz_balance_kick; #endif - unsigned int skip_clock_update; + int skip_clock_update; /* capture load from *all* tasks on this cpu: */ struct load_weight load; @@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *rq) { s64 delta; - if (rq->skip_clock_update) + if (rq->skip_clock_update > 0) return; delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; @@ -4131,7 +4131,7 @@ static inline void schedule_debug(struct task_struct *prev) static void put_prev_task(struct rq *rq, struct task_struct *prev) { - if (prev->on_rq) + if (prev->on_rq || rq->skip_clock_update < 0) update_rq_clock(rq); prev->sched_class->put_prev_task(rq, prev); } diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 19ecb31..3276b94 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -572,8 +572,15 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun) enqueue = 1; } - if (enqueue) + if (enqueue) { + /* + * Tag a forced clock update if we're coming out of idle + * so rq->clock_task will be updated when we schedule(). + */ + if (rq->curr == rq->idle) + rq->skip_clock_update = -1; sched_rt_rq_enqueue(rt_rq); + } raw_spin_unlock(&rq->lock); } ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time 2011-04-22 8:21 ` Mike Galbraith @ 2011-04-22 20:52 ` Thomas Giesel 2011-04-27 17:51 ` Thomas Giesel 1 sibling, 0 replies; 6+ messages in thread From: Thomas Giesel @ 2011-04-22 20:52 UTC (permalink / raw) To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra Mike, > Hm. Does forcing a clock update if we're idle when we release the > throttle do the trick? Thanks. I'm on holidays currently. I'll check it next week when I'm back. But I think it should work in this way. Thomas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time 2011-04-22 8:21 ` Mike Galbraith 2011-04-22 20:52 ` Thomas Giesel @ 2011-04-27 17:51 ` Thomas Giesel 2011-04-29 6:36 ` [patch] " Mike Galbraith 1 sibling, 1 reply; 6+ messages in thread From: Thomas Giesel @ 2011-04-27 17:51 UTC (permalink / raw) To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra > Hm. Does forcing a clock update if we're idle when we release the > throttle do the trick? It does. I tested it today and it works as expected. Even with ftrace I couldn't see any suspicious behaviour anymore. Mike: Can you send the patch to the right people to get it into the kernel or should I do it? Or is Peter the right one already? Thanks for your help. Thomas ^ permalink raw reply [flat|nested] 6+ messages in thread
* [patch] Re: rt scheduler may calculate wrong rt_time 2011-04-27 17:51 ` Thomas Giesel @ 2011-04-29 6:36 ` Mike Galbraith 2011-05-16 10:37 ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith 0 siblings, 1 reply; 6+ messages in thread From: Mike Galbraith @ 2011-04-29 6:36 UTC (permalink / raw) To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra On Wed, 2011-04-27 at 19:51 +0200, Thomas Giesel wrote: > > Hm. Does forcing a clock update if we're idle when we release the > > throttle do the trick? > > It does. I tested it today and it works as expected. Even with ftrace I > couldn't see any suspicious behaviour anymore. > > Mike: Can you send the patch to the right people to get it into the > kernel or should I do it? Or is Peter the right one already? Peter is the right one. Below is an ever so slightly different version. sched, rt: update rq clock when unthrottling of an otherwise idle CPU If an RT task is awakened while it's rt_rq is throttled, the time between wakeup/enqueue and unthrottle/selection may be accounted as rt_time if the CPU is idle. Set rq->skip_clock_update negative upon throttle release to tell put_prev_task() that we need a clock update. Signed-off-by: Mike Galbraith <efault@gmx.de> Reported-by: Thomas Giesel <skoe@directbox.com> --- kernel/sched.c | 6 +++--- kernel/sched_rt.c | 7 +++++++ 2 files changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -464,7 +464,7 @@ struct rq { u64 nohz_stamp; unsigned char nohz_balance_kick; #endif - unsigned int skip_clock_update; + int skip_clock_update; /* capture load from *all* tasks on this cpu: */ struct load_weight load; @@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *r { s64 delta; - if (rq->skip_clock_update) + if (rq->skip_clock_update > 0) return; delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; @@ -4125,7 +4125,7 @@ static inline void schedule_debug(struct static void put_prev_task(struct rq *rq, struct task_struct *prev) { - if (prev->on_rq) + if (prev->on_rq || rq->skip_clock_update < 0) update_rq_clock(rq); prev->sched_class->put_prev_task(rq, prev); } Index: linux-2.6/kernel/sched_rt.c =================================================================== --- linux-2.6.orig/kernel/sched_rt.c +++ linux-2.6/kernel/sched_rt.c @@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(stru if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) { rt_rq->rt_throttled = 0; enqueue = 1; + + /* + * Force a clock update if the CPU was idle, + * lest wakeup -> unthrottle time accumulate. + */ + if (rt_rq->rt_nr_running && rq->curr == rq->idle) + rq->skip_clock_update = -1; } if (rt_rq->rt_time || rt_rq->rt_nr_running) idle = 0; ^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU 2011-04-29 6:36 ` [patch] " Mike Galbraith @ 2011-05-16 10:37 ` tip-bot for Mike Galbraith 0 siblings, 0 replies; 6+ messages in thread From: tip-bot for Mike Galbraith @ 2011-05-16 10:37 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, skoe, hpa, mingo, a.p.zijlstra, efault, tglx, mingo Commit-ID: 61eadef6a9bde9ea62fda724a9cb501ce9bc925a Gitweb: http://git.kernel.org/tip/61eadef6a9bde9ea62fda724a9cb501ce9bc925a Author: Mike Galbraith <efault@gmx.de> AuthorDate: Fri, 29 Apr 2011 08:36:50 +0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Mon, 16 May 2011 11:01:17 +0200 sched, rt: Update rq clock when unthrottling of an otherwise idle CPU If an RT task is awakened while it's rt_rq is throttled, the time between wakeup/enqueue and unthrottle/selection may be accounted as rt_time if the CPU is idle. Set rq->skip_clock_update negative upon throttle release to tell put_prev_task() that we need a clock update. Reported-by: Thomas Giesel <skoe@directbox.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1304059010.7472.1.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 6 +++--- kernel/sched_rt.c | 7 +++++++ 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index f9778c0..b8b9a7d 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -466,7 +466,7 @@ struct rq { u64 nohz_stamp; unsigned char nohz_balance_kick; #endif - unsigned int skip_clock_update; + int skip_clock_update; /* capture load from *all* tasks on this cpu: */ struct load_weight load; @@ -652,7 +652,7 @@ static void update_rq_clock(struct rq *rq) { s64 delta; - if (rq->skip_clock_update) + if (rq->skip_clock_update > 0) return; delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; @@ -4127,7 +4127,7 @@ static inline void schedule_debug(struct task_struct *prev) static void put_prev_task(struct rq *rq, struct task_struct *prev) { - if (prev->on_rq) + if (prev->on_rq || rq->skip_clock_update < 0) update_rq_clock(rq); prev->sched_class->put_prev_task(rq, prev); } diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 19ecb31..0943ed7 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun) if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) { rt_rq->rt_throttled = 0; enqueue = 1; + + /* + * Force a clock update if the CPU was idle, + * lest wakeup -> unthrottle time accumulate. + */ + if (rt_rq->rt_nr_running && rq->curr == rq->idle) + rq->skip_clock_update = -1; } if (rt_rq->rt_time || rt_rq->rt_nr_running) idle = 0; ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-05-16 10:37 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel 2011-04-22 8:21 ` Mike Galbraith 2011-04-22 20:52 ` Thomas Giesel 2011-04-27 17:51 ` Thomas Giesel 2011-04-29 6:36 ` [patch] " Mike Galbraith 2011-05-16 10:37 ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.