* rt scheduler may calculate wrong rt_time
@ 2011-04-21 12:55 Thomas Giesel
2011-04-22 8:21 ` Mike Galbraith
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Giesel @ 2011-04-21 12:55 UTC (permalink / raw)
To: linux-kernel
Friends of the scheduler,
I found that the current (well, at least 2.6.38) scheduler calculates a
wrong rt_time for realtime tasks in certain situations.
Example scenario:
- HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
setups, but that's what I did)
- a high priority rt task (A) gets packets from Ethernet about every 10
ms
- a low priority rt task (B) unfortunately runs for a longer time
(here: endlessly :)
- no other tasks running (i.e. about 5 ms idle left per period)
When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
are throttled. During this time idle is scheduled. When in idle,
tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
update_rq_clock() _not_ to be called for a while. When a realtime task
is woken up during this time (e.g. (A) by network traffic),
update_rq_clock() is called from enqueue_task(). The task is not picked
yet, because it is still throttled. After a while
sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
will call schedule().
schedule() picks (A) which has been woken up a while ago.
_pick_next_task_rt() sets exec_start to rq->clock_task. But this has
been updated last time when the task was woken up, which could have
been up to 5 ms ago in my example. So exec_start contains a time
_before_ the task was actually started. As a result of this, rt_time is
calculated too large which makes the rt tasks being throttled even
earlier in the next period. This error may even increase from interval
to interval, because the throttle-window (initially 5 ms) also
increases.
IMHO the best place to update clock_task would be to call a function
from tick_nohz_restart_sched_tick(). But currently I don't see a
suitable interface to the scheduler to do this. Currently I call
update_rq_clock(rq) just before put_prev_task() in schedule(). This
solves the issue and causes rt_runtime to be kept quite accurately.
(Well, same result would be to remove "if (...)" in put_prev_task())
What do you think is the best way to solve this issue?
Thomas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time
2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
@ 2011-04-22 8:21 ` Mike Galbraith
2011-04-22 20:52 ` Thomas Giesel
2011-04-27 17:51 ` Thomas Giesel
0 siblings, 2 replies; 6+ messages in thread
From: Mike Galbraith @ 2011-04-22 8:21 UTC (permalink / raw)
To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra
On Thu, 2011-04-21 at 14:55 +0200, Thomas Giesel wrote:
> Friends of the scheduler,
>
> I found that the current (well, at least 2.6.38) scheduler calculates a
> wrong rt_time for realtime tasks in certain situations.
>
> Example scenario:
> - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
> setups, but that's what I did)
> - a high priority rt task (A) gets packets from Ethernet about every 10
> ms
> - a low priority rt task (B) unfortunately runs for a longer time
> (here: endlessly :)
> - no other tasks running (i.e. about 5 ms idle left per period)
>
> When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
> are throttled. During this time idle is scheduled. When in idle,
> tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
> update_rq_clock() _not_ to be called for a while. When a realtime task
> is woken up during this time (e.g. (A) by network traffic),
> update_rq_clock() is called from enqueue_task(). The task is not picked
> yet, because it is still throttled. After a while
> sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
> will call schedule().
>
> schedule() picks (A) which has been woken up a while ago.
> _pick_next_task_rt() sets exec_start to rq->clock_task. But this has
> been updated last time when the task was woken up, which could have
> been up to 5 ms ago in my example. So exec_start contains a time
> _before_ the task was actually started. As a result of this, rt_time is
> calculated too large which makes the rt tasks being throttled even
> earlier in the next period. This error may even increase from interval
> to interval, because the throttle-window (initially 5 ms) also
> increases.
>
> IMHO the best place to update clock_task would be to call a function
> from tick_nohz_restart_sched_tick(). But currently I don't see a
> suitable interface to the scheduler to do this. Currently I call
> update_rq_clock(rq) just before put_prev_task() in schedule(). This
> solves the issue and causes rt_runtime to be kept quite accurately.
> (Well, same result would be to remove "if (...)" in put_prev_task())
Hm. Does forcing a clock update if we're idle when we release the
throttle do the trick?
diff --git a/kernel/sched.c b/kernel/sched.c
index 8cb0a57..97245ef 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
u64 nohz_stamp;
unsigned char nohz_balance_kick;
#endif
- unsigned int skip_clock_update;
+ int skip_clock_update;
/* capture load from *all* tasks on this cpu: */
struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *rq)
{
s64 delta;
- if (rq->skip_clock_update)
+ if (rq->skip_clock_update > 0)
return;
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4131,7 +4131,7 @@ static inline void schedule_debug(struct task_struct *prev)
static void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- if (prev->on_rq)
+ if (prev->on_rq || rq->skip_clock_update < 0)
update_rq_clock(rq);
prev->sched_class->put_prev_task(rq, prev);
}
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..3276b94 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -572,8 +572,15 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
enqueue = 1;
}
- if (enqueue)
+ if (enqueue) {
+ /*
+ * Tag a forced clock update if we're coming out of idle
+ * so rq->clock_task will be updated when we schedule().
+ */
+ if (rq->curr == rq->idle)
+ rq->skip_clock_update = -1;
sched_rt_rq_enqueue(rt_rq);
+ }
raw_spin_unlock(&rq->lock);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time
2011-04-22 8:21 ` Mike Galbraith
@ 2011-04-22 20:52 ` Thomas Giesel
2011-04-27 17:51 ` Thomas Giesel
1 sibling, 0 replies; 6+ messages in thread
From: Thomas Giesel @ 2011-04-22 20:52 UTC (permalink / raw)
To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra
Mike,
> Hm. Does forcing a clock update if we're idle when we release the
> throttle do the trick?
Thanks. I'm on holidays currently. I'll check it next week when I'm
back. But I think it should work in this way.
Thomas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: rt scheduler may calculate wrong rt_time
2011-04-22 8:21 ` Mike Galbraith
2011-04-22 20:52 ` Thomas Giesel
@ 2011-04-27 17:51 ` Thomas Giesel
2011-04-29 6:36 ` [patch] " Mike Galbraith
1 sibling, 1 reply; 6+ messages in thread
From: Thomas Giesel @ 2011-04-27 17:51 UTC (permalink / raw)
To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra
> Hm. Does forcing a clock update if we're idle when we release the
> throttle do the trick?
It does. I tested it today and it works as expected. Even with ftrace I
couldn't see any suspicious behaviour anymore.
Mike: Can you send the patch to the right people to get it into the
kernel or should I do it? Or is Peter the right one already?
Thanks for your help.
Thomas
^ permalink raw reply [flat|nested] 6+ messages in thread
* [patch] Re: rt scheduler may calculate wrong rt_time
2011-04-27 17:51 ` Thomas Giesel
@ 2011-04-29 6:36 ` Mike Galbraith
2011-05-16 10:37 ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
0 siblings, 1 reply; 6+ messages in thread
From: Mike Galbraith @ 2011-04-29 6:36 UTC (permalink / raw)
To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra
On Wed, 2011-04-27 at 19:51 +0200, Thomas Giesel wrote:
> > Hm. Does forcing a clock update if we're idle when we release the
> > throttle do the trick?
>
> It does. I tested it today and it works as expected. Even with ftrace I
> couldn't see any suspicious behaviour anymore.
>
> Mike: Can you send the patch to the right people to get it into the
> kernel or should I do it? Or is Peter the right one already?
Peter is the right one. Below is an ever so slightly different version.
sched, rt: update rq clock when unthrottling of an otherwise idle CPU
If an RT task is awakened while it's rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle. Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reported-by: Thomas Giesel <skoe@directbox.com>
---
kernel/sched.c | 6 +++---
kernel/sched_rt.c | 7 +++++++
2 files changed, 10 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
u64 nohz_stamp;
unsigned char nohz_balance_kick;
#endif
- unsigned int skip_clock_update;
+ int skip_clock_update;
/* capture load from *all* tasks on this cpu: */
struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *r
{
s64 delta;
- if (rq->skip_clock_update)
+ if (rq->skip_clock_update > 0)
return;
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4125,7 +4125,7 @@ static inline void schedule_debug(struct
static void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- if (prev->on_rq)
+ if (prev->on_rq || rq->skip_clock_update < 0)
update_rq_clock(rq);
prev->sched_class->put_prev_task(rq, prev);
}
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(stru
if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
rt_rq->rt_throttled = 0;
enqueue = 1;
+
+ /*
+ * Force a clock update if the CPU was idle,
+ * lest wakeup -> unthrottle time accumulate.
+ */
+ if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+ rq->skip_clock_update = -1;
}
if (rt_rq->rt_time || rt_rq->rt_nr_running)
idle = 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
2011-04-29 6:36 ` [patch] " Mike Galbraith
@ 2011-05-16 10:37 ` tip-bot for Mike Galbraith
0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Mike Galbraith @ 2011-05-16 10:37 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, skoe, hpa, mingo, a.p.zijlstra, efault, tglx, mingo
Commit-ID: 61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Gitweb: http://git.kernel.org/tip/61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Author: Mike Galbraith <efault@gmx.de>
AuthorDate: Fri, 29 Apr 2011 08:36:50 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 May 2011 11:01:17 +0200
sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
If an RT task is awakened while it's rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle. Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.
Reported-by: Thomas Giesel <skoe@directbox.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1304059010.7472.1.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 6 +++---
kernel/sched_rt.c | 7 +++++++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index f9778c0..b8b9a7d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -466,7 +466,7 @@ struct rq {
u64 nohz_stamp;
unsigned char nohz_balance_kick;
#endif
- unsigned int skip_clock_update;
+ int skip_clock_update;
/* capture load from *all* tasks on this cpu: */
struct load_weight load;
@@ -652,7 +652,7 @@ static void update_rq_clock(struct rq *rq)
{
s64 delta;
- if (rq->skip_clock_update)
+ if (rq->skip_clock_update > 0)
return;
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4127,7 +4127,7 @@ static inline void schedule_debug(struct task_struct *prev)
static void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- if (prev->on_rq)
+ if (prev->on_rq || rq->skip_clock_update < 0)
update_rq_clock(rq);
prev->sched_class->put_prev_task(rq, prev);
}
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..0943ed7 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
rt_rq->rt_throttled = 0;
enqueue = 1;
+
+ /*
+ * Force a clock update if the CPU was idle,
+ * lest wakeup -> unthrottle time accumulate.
+ */
+ if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+ rq->skip_clock_update = -1;
}
if (rt_rq->rt_time || rt_rq->rt_nr_running)
idle = 0;
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-05-16 10:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
2011-04-22 8:21 ` Mike Galbraith
2011-04-22 20:52 ` Thomas Giesel
2011-04-27 17:51 ` Thomas Giesel
2011-04-29 6:36 ` [patch] " Mike Galbraith
2011-05-16 10:37 ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.