* rt scheduler may calculate wrong rt_time
@ 2011-04-21 12:55 Thomas Giesel
  2011-04-22  8:21 ` Mike Galbraith
  0 siblings, 1 reply; 6+ messages in thread

From: Thomas Giesel @ 2011-04-21 12:55 UTC
To: linux-kernel

Friends of the scheduler,

I found that the current (well, at least 2.6.38) scheduler calculates a
wrong rt_time for realtime tasks in certain situations.

Example scenario:
- HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
  setups, but that's what I did)
- a high-priority rt task (A) gets packets from Ethernet about every 10 ms
- a low-priority rt task (B) unfortunately runs for a longer time
  (here: endlessly :)
- no other tasks running (i.e. about 5 ms idle left per period)

When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
are throttled. During this time idle is scheduled. When in idle,
tick_nohz_stop_sched_tick() stops the scheduler tick, which causes
update_rq_clock() _not_ to be called for a while. When a realtime task
is woken up during this time (e.g. (A) by network traffic),
update_rq_clock() is called from enqueue_task(). The task is not picked
yet, because it is still throttled. After a while
sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
calls schedule().

schedule() picks (A), which was woken up a while ago.
_pick_next_task_rt() sets exec_start to rq->clock_task. But that was
last updated when the task was woken up, which could have been up to
5 ms ago in my example. So exec_start contains a time _before_ the task
actually started running. As a result, rt_time comes out too large,
which makes the rt tasks be throttled even earlier in the next period.
This error may even grow from interval to interval, because the
throttle window (initially 5 ms) also grows.

IMHO the best place to update clock_task would be to call a function
from tick_nohz_restart_sched_tick(). But currently I don't see a
suitable interface to the scheduler to do this. Currently I call
update_rq_clock(rq) just before put_prev_task() in schedule(). This
solves the issue and keeps rt_runtime quite accurate. (The same result
can be had by removing the "if (...)" in put_prev_task().)

What do you think is the best way to solve this issue?

Thomas
* Re: rt scheduler may calculate wrong rt_time
  2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
@ 2011-04-22  8:21 ` Mike Galbraith
  2011-04-22 20:52   ` Thomas Giesel
  2011-04-27 17:51   ` Thomas Giesel
  0 siblings, 2 replies; 6+ messages in thread

From: Mike Galbraith @ 2011-04-22 8:21 UTC
To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra

On Thu, 2011-04-21 at 14:55 +0200, Thomas Giesel wrote:
> Friends of the scheduler,
>
> I found that the current (well, at least 2.6.38) scheduler calculates a
> wrong rt_time for realtime tasks in certain situations.
>
> Example scenario:
> - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
>   setups, but that's what I did)
> - a high priority rt task (A) gets packets from Ethernet about every 10 ms
> - a low priority rt task (B) unfortunately runs for a longer time
>   (here: endlessly :)
> - no other tasks running (i.e. about 5 ms idle left per period)
>
> When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
> are throttled. During this time idle is scheduled. When in idle,
> tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
> update_rq_clock() _not_ to be called for a while. When a realtime task
> is woken up during this time (e.g. (A) by network traffic),
> update_rq_clock() is called from enqueue_task(). The task is not picked
> yet, because it is still throttled. After a while
> sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
> will call schedule().
>
> schedule() picks (A) which has been woken up a while ago.
> _pick_next_task_rt() sets exec_start to rq->clock_task. But this has
> been updated last time when the task was woken up, which could have
> been up to 5 ms ago in my example. So exec_start contains a time
> _before_ the task was actually started. As a result of this, rt_time is
> calculated too large which makes the rt tasks being throttled even
> earlier in the next period. This error may even increase from interval
> to interval, because the throttle-window (initially 5 ms) also
> increases.
>
> IMHO the best place to update clock_task would be to call a function
> from tick_nohz_restart_sched_tick(). But currently I don't see a
> suitable interface to the scheduler to do this. Currently I call
> update_rq_clock(rq) just before put_prev_task() in schedule(). This
> solves the issue and causes rt_runtime to be kept quite accurately.
> (Well, same result would be to remove "if (...)" in put_prev_task())

Hm.  Does forcing a clock update if we're idle when we release the
throttle do the trick?

diff --git a/kernel/sched.c b/kernel/sched.c
index 8cb0a57..97245ef 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4131,7 +4131,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..3276b94 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -572,8 +572,15 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 			enqueue = 1;
 		}
 
-		if (enqueue)
+		if (enqueue) {
+			/*
+			 * Tag a forced clock update if we're coming out of idle
+			 * so rq->clock_task will be updated when we schedule().
+			 */
+			if (rq->curr == rq->idle)
+				rq->skip_clock_update = -1;
 			sched_rt_rq_enqueue(rt_rq);
+		}
 		raw_spin_unlock(&rq->lock);
 	}
* Re: rt scheduler may calculate wrong rt_time
  2011-04-22  8:21 ` Mike Galbraith
@ 2011-04-22 20:52   ` Thomas Giesel
  0 siblings, 0 replies; 6+ messages in thread

From: Thomas Giesel @ 2011-04-22 20:52 UTC
To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra

Mike,

> Hm.  Does forcing a clock update if we're idle when we release the
> throttle do the trick?

Thanks. I'm currently on holiday; I'll check it next week when I'm back.
But I think it should work this way.

Thomas
* Re: rt scheduler may calculate wrong rt_time
  2011-04-22  8:21 ` Mike Galbraith
  2011-04-22 20:52   ` Thomas Giesel
@ 2011-04-27 17:51   ` Thomas Giesel
  2011-04-29  6:36     ` [patch] " Mike Galbraith
  1 sibling, 1 reply; 6+ messages in thread

From: Thomas Giesel @ 2011-04-27 17:51 UTC
To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra

> Hm.  Does forcing a clock update if we're idle when we release the
> throttle do the trick?

It does. I tested it today and it works as expected. Even with ftrace I
couldn't see any suspicious behaviour anymore.

Mike: Can you send the patch to the right people to get it into the
kernel, or should I do it? Or is Peter the right one already?

Thanks for your help.

Thomas
* [patch] Re: rt scheduler may calculate wrong rt_time
  2011-04-27 17:51 ` Thomas Giesel
@ 2011-04-29  6:36 ` Mike Galbraith
  2011-05-16 10:37   ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
  0 siblings, 1 reply; 6+ messages in thread

From: Mike Galbraith @ 2011-04-29 6:36 UTC
To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra

On Wed, 2011-04-27 at 19:51 +0200, Thomas Giesel wrote:
> > Hm.  Does forcing a clock update if we're idle when we release the
> > throttle do the trick?
>
> It does. I tested it today and it works as expected. Even with ftrace I
> couldn't see any suspicious behaviour anymore.
>
> Mike: Can you send the patch to the right people to get it into the
> kernel or should I do it? Or is Peter the right one already?

Peter is the right one.  Below is an ever so slightly different version.

sched, rt: update rq clock when unthrottling of an otherwise idle CPU

If an RT task is awakened while its rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle.  Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Reported-by: Thomas Giesel <skoe@directbox.com>
---
 kernel/sched.c    |    6 +++---
 kernel/sched_rt.c |    7 +++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *r
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4125,7 +4125,7 @@ static inline void schedule_debug(struct
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(stru
 		if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
 			rt_rq->rt_throttled = 0;
 			enqueue = 1;
+
+			/*
+			 * Force a clock update if the CPU was idle,
+			 * lest wakeup -> unthrottle time accumulate.
+			 */
+			if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+				rq->skip_clock_update = -1;
 		}
 		if (rt_rq->rt_time || rt_rq->rt_nr_running)
 			idle = 0;
* [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
  2011-04-29  6:36 ` [patch] " Mike Galbraith
@ 2011-05-16 10:37 ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 6+ messages in thread

From: tip-bot for Mike Galbraith @ 2011-05-16 10:37 UTC
To: linux-tip-commits
Cc: linux-kernel, skoe, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Gitweb:     http://git.kernel.org/tip/61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Fri, 29 Apr 2011 08:36:50 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 May 2011 11:01:17 +0200

sched, rt: Update rq clock when unthrottling of an otherwise idle CPU

If an RT task is awakened while its rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle.  Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.

Reported-by: Thomas Giesel <skoe@directbox.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1304059010.7472.1.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c    |    6 +++---
 kernel/sched_rt.c |    7 +++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f9778c0..b8b9a7d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -466,7 +466,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -652,7 +652,7 @@ static void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4127,7 +4127,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..0943ed7 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
 			rt_rq->rt_throttled = 0;
 			enqueue = 1;
+
+			/*
+			 * Force a clock update if the CPU was idle,
+			 * lest wakeup -> unthrottle time accumulate.
+			 */
+			if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+				rq->skip_clock_update = -1;
 		}
 		if (rt_rq->rt_time || rt_rq->rt_nr_running)
 			idle = 0;
end of thread, other threads: [~2011-05-16 10:37 UTC | newest]

Thread overview: 6+ messages:
2011-04-21 12:55 rt scheduler may calculate wrong rt_time  Thomas Giesel
2011-04-22  8:21 ` Mike Galbraith
2011-04-22 20:52   ` Thomas Giesel
2011-04-27 17:51   ` Thomas Giesel
2011-04-29  6:36     ` [patch] " Mike Galbraith
2011-05-16 10:37       ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU  tip-bot for Mike Galbraith