All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Thomas Giesel <skoe@directbox.com>
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: rt scheduler may calculate wrong rt_time
Date: Fri, 22 Apr 2011 10:21:31 +0200	[thread overview]
Message-ID: <1303460491.28545.12.camel@marge.simson.net> (raw)
In-Reply-To: <20110421145510.28cb7b78@skoe.de>

On Thu, 2011-04-21 at 14:55 +0200, Thomas Giesel wrote:
> Friends of the scheduler,
> 
> I found that the current (well, at least 2.6.38) scheduler calculates a
> wrong rt_time for realtime tasks in certain situations.
> 
> Example scenario:
> - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
>   setups, but that's what I did)
> - a high priority rt task (A) gets packets from Ethernet about every 10
>   ms
> - a low priority rt task (B) unfortunately runs for a longer time
>   (here: endlessly :)
> - no other tasks running (i.e. about 5 ms idle left per period)
> 
> When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
> are throttled. During this time idle is scheduled. When in idle,
> tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
> update_rq_clock() _not_ to be called for a while. When a realtime task
> is woken up during this time (e.g. (A) by network traffic),
> update_rq_clock() is called from enqueue_task(). The task is not picked
> yet, because it is still throttled. After a while
> sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
> will call schedule().
> 
> schedule() picks (A) which has been woken up a while ago.
> _pick_next_task_rt() sets exec_start to rq->clock_task. But this has
> been updated last time when the task was woken up, which could have
> been up to 5 ms ago in my example. So exec_start contains a time
> _before_ the task was actually started. As a result of this, rt_time is
> calculated too large which makes the rt tasks being throttled even
> earlier in the next period. This error may even increase from interval
> to interval, because the throttle-window (initially 5 ms) also
> increases.
> 
> IMHO the best place to update clock_task would be to call a function
> from tick_nohz_restart_sched_tick(). But currently I don't see a
> suitable interface to the scheduler to do this. Currently I call
> update_rq_clock(rq) just before put_prev_task() in schedule(). This
> solves the issue and causes rt_runtime to be kept quite accurately.
> (Well, same result would be to remove "if (...)" in put_prev_task())

Hm.  Does forcing a clock update if we're idle when we release the
throttle do the trick?

diff --git a/kernel/sched.c b/kernel/sched.c
index 8cb0a57..97245ef 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4131,7 +4131,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..3276b94 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -572,8 +572,15 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 				enqueue = 1;
 		}
 
-		if (enqueue)
+		if (enqueue) {
+			/*
+			 * Tag a forced clock update if we're coming out of idle
+			 * so rq->clock_task will be updated when we schedule().
+			 */
+			if (rq->curr == rq->idle)
+				rq->skip_clock_update = -1;
 			sched_rt_rq_enqueue(rt_rq);
+		}
 		raw_spin_unlock(&rq->lock);
 	}
 




  reply	other threads:[~2011-04-22  8:21 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
2011-04-22  8:21 ` Mike Galbraith [this message]
2011-04-22 20:52   ` Thomas Giesel
2011-04-27 17:51   ` Thomas Giesel
2011-04-29  6:36     ` [patch] " Mike Galbraith
2011-05-16 10:37       ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1303460491.28545.12.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=skoe@directbox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.