[PATCH] sched/deadline: Always calculate end of period on sched_yield()

public inbox for linux-kernel@vger.kernel.org

* [PATCH] sched/deadline: Always calculate end of period on sched_yield()
@ 2016-02-12 23:10 Steven Rostedt
  2016-02-15 10:18 ` Juri Lelli
  2016-02-23 12:28 ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Steven Rostedt @ 2016-02-12 23:10 UTC (permalink / raw)
  To: LKML
  Cc: Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

I'm writing a test case for SCHED_DEADLINE, and noticed a strange
anomaly. Every so often, a deadline is missed, and when I looked into
it, it happened because the sched_yield() had no effect (it didn't end
the previous period and let the start of the next runtime happen at the
end of the old period).

deadline-2228    7...1   116.778420: sys_enter_sched_yield: 
deadline-2228    7d..3   116.778421: hrtimer_cancel:       hrtimer=0xffff88011ebd79a0
deadline-2228    7d..2   116.778422: rcu_utilization:      Start context switch
deadline-2228    7d..2   116.778423: rcu_utilization:      End context switch
deadline-2228    7d..4   116.778423: hrtimer_start:        hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 softexpires=116124420428
deadline-2228    7...1   116.778425: sys_exit_sched_yield: 0x0

Schedule was never called. I added some trace_printk()s and discovered
that this happens when sched_yield() is called right after a tick that
updates its current bandwidth.

When the schedule tick happens that updates the current bandwidth,
update_curr_dl() is called, where it updates curr->se.exec_start to
rq_clock_task(rq).

The rq_clock_task(rq) value gets updated by update_rq_clock_task(),
which is called at various points in the scheduler.

Now, if the user task calls sched_yield() just after a bandwidth update
synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
calls into update_curr_dl() we have:

	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
	if (unlikely((s64)delta_exec <= 0))
		return;

Coming in here from a sched_yield() will have delta_exec == 0 if the
sched_yield() was called after a DL tick and before another
update_rq_clock_task() is called.

This means that the task will not release its remaining runtime, and
it will start off in the current period when it expected to be in the
next period.

The fix that appears to work for me is to add a test in
update_curr_dl() to not exit if delta_exec is zero and
dl_se->dl_yielded is true.
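
To make the sequence above concrete, here is a minimal user-space sketch.
It is not kernel code: struct toy_rq and the toy_*() functions are
hypothetical stand-ins that only model the fields discussed here, and the
sketch assumes that throttling happens when runtime <= 0 and that the task
clock does not advance between the tick and the sched_yield().

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the fields involved; NOT the kernel's struct layout. */
struct toy_rq {
	uint64_t clock_task;   /* models rq_clock_task(rq) */
	uint64_t exec_start;   /* models curr->se.exec_start */
	int64_t  runtime;      /* models dl_se->runtime */
	bool     dl_yielded;   /* models dl_se->dl_yielded */
	bool     dl_throttled; /* models dl_se->dl_throttled */
};

/* Models update_curr_dl(); 'patched' selects the proposed extra test. */
static void toy_update_curr_dl(struct toy_rq *rq, bool patched)
{
	uint64_t delta_exec = rq->clock_task - rq->exec_start;

	/* Unpatched: always bail when no time elapsed, even on a yield. */
	if ((int64_t)delta_exec <= 0 && !(patched && rq->dl_yielded))
		return;

	rq->exec_start = rq->clock_task;
	rq->runtime -= rq->dl_yielded ? 0 : (int64_t)delta_exec;
	if (rq->runtime <= 0)          /* assumed throttle condition */
		rq->dl_throttled = true;
}

/* Models the yield path as it looked before any fix. */
static void toy_yield_task_dl(struct toy_rq *rq, bool patched)
{
	if (rq->runtime > 0) {
		rq->dl_yielded = true;
		rq->runtime = 0;
	}
	/* Assumption: the task clock has not moved since the last tick. */
	toy_update_curr_dl(rq, patched);
}

/* Tick, then yield immediately; returns whether the task got throttled. */
static bool toy_yield_after_tick(bool patched)
{
	struct toy_rq rq = { .clock_task = 1000, .exec_start = 0, .runtime = 5000 };

	toy_update_curr_dl(&rq, patched);  /* tick syncs exec_start to the clock */
	toy_yield_task_dl(&rq, patched);   /* delta_exec is now 0 */
	return rq.dl_throttled;
}
```

In this model, toy_yield_after_tick(false) returns false (the yield is
silently ignored), while toy_yield_after_tick(true) returns true.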

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index cd64c979d0e1..1dd180cda574 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
 	 * approach need further study.
 	 */
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
 		return;

 	schedstat_set(curr->se.statistics.exec_max,


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-12 23:10 [PATCH] sched/deadline: Always calculate end of period on sched_yield() Steven Rostedt
@ 2016-02-15 10:18 ` Juri Lelli
  2016-02-15 12:37   ` Daniel Bristot de Oliveira
  2016-02-15 16:22   ` Steven Rostedt
  2016-02-23 12:28 ` Peter Zijlstra
  1 sibling, 2 replies; 8+ messages in thread
From: Juri Lelli @ 2016-02-15 10:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

Hi,

On 12/02/16 18:10, Steven Rostedt wrote:
> I'm writing a test case for SCHED_DEADLINE, and noticed a strange
> anomaly. Every so often, a deadline is missed, and when I looked into
> it, it happened because the sched_yield() had no effect (it didn't end
> the previous period and let the start of the next runtime happen at the
> end of the old period).
> 
> deadline-2228    7...1   116.778420: sys_enter_sched_yield: 
> deadline-2228    7d..3   116.778421: hrtimer_cancel:       hrtimer=0xffff88011ebd79a0
> deadline-2228    7d..2   116.778422: rcu_utilization:      Start context switch
> deadline-2228    7d..2   116.778423: rcu_utilization:      End context switch
> deadline-2228    7d..4   116.778423: hrtimer_start:        hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 softexpires=116124420428
> deadline-2228    7...1   116.778425: sys_exit_sched_yield: 0x0
> 
> 
> Schedule was never called. I added some trace_printk()s and discovered
> that this happens when sched_yield() is called right after a tick that
> updates its current bandwidth.
> 
> When the schedule tick happens that updates the current bandwidth,
> update_curr_dl() is called, where it updates curr->se.exec_start to
> rq_clock_task(rq).
> 
> The rq_clock_task(rq) value gets updated by update_rq_clock_task(),
> which is called at various points in the scheduler.
> 
> Now, if the user task calls sched_yield() just after a bandwidth update
> synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
> calls into update_curr_dl() we have:
> 
> 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> 	if (unlikely((s64)delta_exec <= 0))
> 		return;
> 
> Coming in here from a sched_yield() will have delta_exec == 0 if the
> sched_yield() was called after a DL tick and before another
> update_rq_clock_task() is called.
> 
> This means that the task will not release its remaining runtime, and
> it will start off in the current period when it expected to be in the
> next period.
> 
> The fix that appears to work for me is to add a test in
> update_curr_dl() to not exit if delta_exec is zero and
> dl_se->dl_yielded is true.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cd64c979d0e1..1dd180cda574 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
>  	 * approach need further study.
>  	 */
>  	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> -	if (unlikely((s64)delta_exec <= 0))
> +	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
>  		return;
>

This looks good to me. Do you think we could also skip some of the
following updates/accounting in this case? Not sure we win anything by
doing that, though.

Thanks,

- Juri


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-15 10:18 ` Juri Lelli
@ 2016-02-15 12:37   ` Daniel Bristot de Oliveira
  2016-02-15 16:22   ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Bristot de Oliveira @ 2016-02-15 12:37 UTC (permalink / raw)
  To: Juri Lelli, Steven Rostedt
  Cc: LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams,
	John Kacur



On 02/15/2016 08:18 AM, Juri Lelli wrote:
> Do you think we could also skip some of the
> following updates/accounting in this case? Not sure we win anything by
> doing that, though.

I reviewed rostedt's patch and the updates/accounting operations that
follow it. I agree with rostedt's patch, and I also agree that
if (delta_exec == 0) it is a good idea to skip the += 0 operations and
function calls in the subsequent updates/accounting, before the
if (dl_runtime_exceeded...).


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-15 10:18 ` Juri Lelli
  2016-02-15 12:37   ` Daniel Bristot de Oliveira
@ 2016-02-15 16:22   ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2016-02-15 16:22 UTC (permalink / raw)
  To: Juri Lelli
  Cc: LKML, Juri Lelli, Peter Zijlstra, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

On Mon, 15 Feb 2016 10:18:24 +0000
Juri Lelli <juri.lelli@arm.com> wrote:

> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cd64c979d0e1..1dd180cda574 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> >  	 * approach need further study.
> >  	 */
> >  	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> > -	if (unlikely((s64)delta_exec <= 0))
> > +	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> >  		return;
> >  
> 
> This looks good to me. Do you think we could also skip some of the
> following updates/accounting in this case? Not sure we win anything by
> doing that, though.
>

Well, I would say we get this patch in first and think about other
updates second. This fixes one bug, might as well pull it in.

I'm now looking into a second bug. I'm getting:

 RT throttling activated

and

 DL replenish lagged to much

messages, back to back, when I'm only using 50% of the bandwidth.
It looks like a leak in the accounting of how much is being used. The
big issue here is that these messages kill the test due to the latency
of performing the printk(). After the messages are splatted out (they
only print once per boot), the tests run fine again. IOW, there seems
to be no real issue of something using too much bandwidth.

I get this with or without this current patch.

-- Steve


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-12 23:10 [PATCH] sched/deadline: Always calculate end of period on sched_yield() Steven Rostedt
  2016-02-15 10:18 ` Juri Lelli
@ 2016-02-23 12:28 ` Peter Zijlstra
  2016-02-23 13:12   ` Steven Rostedt
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-02-23 12:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Juri Lelli, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

On Fri, Feb 12, 2016 at 06:10:20PM -0500, Steven Rostedt wrote:
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cd64c979d0e1..1dd180cda574 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
>  	 * approach need further study.
>  	 */
>  	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> -	if (unlikely((s64)delta_exec <= 0))
> +	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
>  		return;
>  
>  	schedstat_set(curr->se.statistics.exec_max,


Would something like this make sense instead?

It also retains the ->runtime while yielded, and would actually 'fix' a
case where, when we call yield, we would have had a negative runtime
after update_curr_dl().

The current code will 'gift' us extra runtime in that case.
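
As a rough illustration of that 'gift', the arithmetic can be sketched in
user space. This is only a model under stated assumptions: the toy_*()
helpers are hypothetical, replenishment is reduced to "add dl_runtime until
the runtime is positive", and deadline handling is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified replenish: add the budget until runtime is positive.
 * Requires dl_runtime > 0, otherwise the loop would never finish. */
static int64_t toy_replenish(int64_t runtime, int64_t dl_runtime)
{
	while (runtime <= 0)
		runtime += dl_runtime;
	return runtime;
}

/* Old behavior: yield zeroes a positive runtime, so the last delta_exec
 * is never charged to the task. */
static int64_t toy_refill_old(int64_t runtime, int64_t delta_exec,
			      int64_t dl_runtime)
{
	bool yielded = false;

	if (runtime > 0) {
		yielded = true;
		runtime = 0;
	}
	runtime -= yielded ? 0 : delta_exec;
	return toy_replenish(runtime, dl_runtime);
}

/* Proposed behavior: always charge delta_exec, and only discard leftover
 * positive runtime at replenish time. */
static int64_t toy_refill_new(int64_t runtime, int64_t delta_exec,
			      int64_t dl_runtime)
{
	runtime -= delta_exec;     /* may go negative: an overrun */
	if (runtime > 0)           /* yielded with budget left: drop it */
		runtime = 0;
	return toy_replenish(runtime, dl_runtime);
}
```

With 100 units left, 300 units actually consumed since the last update and
a budget of 1000, toy_refill_old() hands back the full 1000, whereas
toy_refill_new() hands back 800 because the 200 units of overrun are
carried into the next period. When the task yields with budget to spare
(say 500 left, 300 consumed), both variants return 1000.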

---
 kernel/sched/deadline.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 57b939c81bce..c2bca80d3388 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -399,6 +399,9 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
 		dl_se->runtime = pi_se->dl_runtime;
 	}
 
+	if (dl_se->dl_yielded && dl_se->runtime > 0)
+		dl_se->runtime = 0;
+
 	/*
 	 * We keep moving the deadline away until we get some
 	 * available runtime for the entity. This ensures correct
@@ -735,8 +738,11 @@ static void update_curr_dl(struct rq *rq)
 	 * approach need further study.
 	 */
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	if (unlikely((s64)delta_exec <= 0)) {
+		if (unlikely(dl_se->dl_yielded))
+			goto throttle;
 		return;
+	}
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
@@ -749,8 +755,10 @@ static void update_curr_dl(struct rq *rq)
 
 	sched_rt_avg_update(rq, delta_exec);
 
-	dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;
-	if (dl_runtime_exceeded(dl_se)) {
+	dl_se->runtime -= delta_exec;
+
+throttle:
+	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
 		dl_se->dl_throttled = 1;
 		__dequeue_task_dl(rq, curr, 0);
 		if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
@@ -1002,10 +1010,8 @@ static void yield_task_dl(struct rq *rq)
 	 * it and the bandwidth timer will wake it up and will give it
 	 * new scheduling parameters (thanks to dl_yielded=1).
 	 */
-	if (p->dl.runtime > 0) {
-		rq->curr->dl.dl_yielded = 1;
-		p->dl.runtime = 0;
-	}
+	rq->curr->dl.dl_yielded = 1;
+
 	update_rq_clock(rq);
 	update_curr_dl(rq);
 	/*


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-23 12:28 ` Peter Zijlstra
@ 2016-02-23 13:12   ` Steven Rostedt
  2016-02-23 15:04   ` Steven Rostedt
  2016-02-29 11:14   ` [tip:sched/core] " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2016-02-23 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Juri Lelli, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

On Tue, 23 Feb 2016 13:28:22 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 12, 2016 at 06:10:20PM -0500, Steven Rostedt wrote:
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cd64c979d0e1..1dd180cda574 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> >  	 * approach need further study.
> >  	 */
> >  	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> > -	if (unlikely((s64)delta_exec <= 0))
> > +	if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> >  		return;
> >  
> >  	schedstat_set(curr->se.statistics.exec_max,  
> 
> 
> Would something like this make sense instead?
> 

I'll test it and see if it works.

-- Steve


* Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-23 12:28 ` Peter Zijlstra
  2016-02-23 13:12   ` Steven Rostedt
@ 2016-02-23 15:04   ` Steven Rostedt
  2016-02-29 11:14   ` [tip:sched/core] " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2016-02-23 15:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Juri Lelli, Ingo Molnar, Clark Williams,
	Daniel Bristot de Oliveira, John Kacur

On Tue, 23 Feb 2016 13:28:22 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Would something like this make sense instead?

It works perfectly.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>

Thanks!

-- Steve

> 
> It also retains the ->runtime while yielded, and would actually 'fix' a
> case where, when we call yield, we would have had a negative runtime
> after update_curr_dl().
> 
> The current code will 'gift' us extra runtime in that case.
> 
> ---


* [tip:sched/core] sched/deadline: Always calculate end of period on sched_yield()
  2016-02-23 12:28 ` Peter Zijlstra
  2016-02-23 13:12   ` Steven Rostedt
  2016-02-23 15:04   ` Steven Rostedt
@ 2016-02-29 11:14   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-02-29 11:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rostedt, linux-kernel, williams, juri.lelli, hpa, torvalds, tglx,
	jkacur, mingo, peterz, bristot

Commit-ID:  48be3a67da7413d62e5efbcf2c73a9dddf61fb96
Gitweb:     http://git.kernel.org/tip/48be3a67da7413d62e5efbcf2c73a9dddf61fb96
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 23 Feb 2016 13:28:22 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Feb 2016 09:41:51 +0100

sched/deadline: Always calculate end of period on sched_yield()

Steven noticed that occasionally a sched_yield() call would not result
in a wait for the next period edge as expected.

It turns out that when we call update_curr_dl() and end up with
delta_exec <= 0, we will bail early and fail to throttle.

Further inspection of the yield code revealed that yield_task_dl()
clearing dl.runtime is wrong too: it will not account for the last bit
of runtime, which could have resulted in dl.runtime < 0, which in turn
means that replenish would gift us with too much runtime.

Fix both issues by not relying on the dl.runtime value for yield.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160223122822.GP6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/deadline.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 57b939c..04a569c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -399,6 +399,9 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
 		dl_se->runtime = pi_se->dl_runtime;
 	}
 
+	if (dl_se->dl_yielded && dl_se->runtime > 0)
+		dl_se->runtime = 0;
+
 	/*
 	 * We keep moving the deadline away until we get some
 	 * available runtime for the entity. This ensures correct
@@ -735,8 +738,11 @@ static void update_curr_dl(struct rq *rq)
 	 * approach need further study.
 	 */
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	if (unlikely((s64)delta_exec <= 0)) {
+		if (unlikely(dl_se->dl_yielded))
+			goto throttle;
 		return;
+	}
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
@@ -749,8 +755,10 @@ static void update_curr_dl(struct rq *rq)
 
 	sched_rt_avg_update(rq, delta_exec);
 
-	dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;
-	if (dl_runtime_exceeded(dl_se)) {
+	dl_se->runtime -= delta_exec;
+
+throttle:
+	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
 		dl_se->dl_throttled = 1;
 		__dequeue_task_dl(rq, curr, 0);
 		if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
@@ -994,18 +1002,14 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
  */
 static void yield_task_dl(struct rq *rq)
 {
-	struct task_struct *p = rq->curr;
-
 	/*
 	 * We make the task go to sleep until its current deadline by
 	 * forcing its runtime to zero. This way, update_curr_dl() stops
 	 * it and the bandwidth timer will wake it up and will give it
 	 * new scheduling parameters (thanks to dl_yielded=1).
 	 */
-	if (p->dl.runtime > 0) {
-		rq->curr->dl.dl_yielded = 1;
-		p->dl.runtime = 0;
-	}
+	rq->curr->dl.dl_yielded = 1;
+
 	update_rq_clock(rq);
 	update_curr_dl(rq);
 	/*

