public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yong Zhang <yong.zhang0@gmail.com>,
	"Bjoern B. Brandenburg" <bbb.lst@gmail.com>,
	Ingo Molnar <mingo@elte.hu>,
	Andrea Bastoni <bastoni@sprg.uniroma2.it>,
	"James H. Anderson" <anderson@cs.unc.edu>,
	linux-kernel@vger.kernel.org
Subject: Re: [patchlet] Re: Scheduler bug related to rq->skip_clock_update?
Date: Wed, 08 Dec 2010 11:05:42 +0100	[thread overview]
Message-ID: <1291802742.1417.9.camel@marge.simson.net> (raw)
In-Reply-To: <1291748109.7475.20.camel@marge.simson.net>

On Tue, 2010-12-07 at 19:55 +0100, Mike Galbraith wrote:
> On Tue, 2010-12-07 at 17:41 +0100, Peter Zijlstra wrote:
> > On Mon, 2010-12-06 at 09:32 +0100, Mike Galbraith wrote:
> > 
> > >  kernel/fork.c  |    1 +
> > >  kernel/sched.c |    6 +++---
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux-2.6.37.git/kernel/sched.c
> > > ===================================================================
> > > --- linux-2.6.37.git.orig/kernel/sched.c
> > > +++ linux-2.6.37.git/kernel/sched.c
> > > @@ -660,6 +660,7 @@ inline void update_rq_clock(struct rq *r
> > >  
> > >  		sched_irq_time_avg_update(rq, irq_time);
> > >  	}
> > > +	rq->skip_clock_update = 0;
> > >  }
> > >  
> > >  /*
> > 
> > Shouldn't we do that at the end of schedule()? Since the purpose of
> > ->skip_clock_update is to avoid multiple calls to:
> >   - avoid overhead
> >   - ensure scheduling is accounted at a single point
> > 
> > [ for that latter purpose it might also make sense to put that point
> > somewhere around context_switch() but due to the fact that we need a
> > clock update early that's a bit impractical. ]
> > 
> > Hmm?
> 
> Yeah, could do that instead.  There's no gain in any call that may
> happen in the interval between.  Think I'll measure though, this bug was
> a surprise :)

Did that, all is well.

> > OK.. have we looked if there's more TIF flags that could do with a
> > reset?
> 
> mmm, no.

Crawling over flags had to be left for another day.

Sched: fix skip_clock_update optimization

idle_balance() drops/retakes rq->lock, leaving the previous task
vulnerable to set_tsk_need_resched().  Clear it after we return
from balancing instead, and in setup_thread_stack() as well, so
no successfully descheduled or never scheduled task has it set.
 
Need resched confused the skip_clock_update logic, which assumes
that the next call to update_rq_clock() will come nearly immediately
after being set.  Make the optimization robust against the waking
a sleeper before it sucessfully deschedules case by checking that
the current task has not been dequeued before setting the flag,
since it is that useless clock update we're trying to save, and
clear unconditionally in schedule() proper instead of conditionally
in put_prev_task().

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bjoern B. Brandenburg <bbb.lst@gmail.com>
Reported-by: Bjoern B. Brandenburg <bbb.lst@gmail.com>
Tested-by: Yong Zhang <yong.zhang0@gmail.com>

---
 kernel/fork.c  |    1 +
 kernel/sched.c |   28 +++++++++++++++-------------
 2 files changed, 16 insertions(+), 13 deletions(-)

Index: linux-2.6.37.git/kernel/sched.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched.c
+++ linux-2.6.37.git/kernel/sched.c
@@ -649,17 +649,18 @@ static void sched_irq_time_avg_update(st
 
 inline void update_rq_clock(struct rq *rq)
 {
-	if (!rq->skip_clock_update) {
-		int cpu = cpu_of(rq);
-		u64 irq_time;
-
-		rq->clock = sched_clock_cpu(cpu);
-		irq_time = irq_time_cpu(cpu);
-		if (rq->clock - irq_time > rq->clock_task)
-			rq->clock_task = rq->clock - irq_time;
+	int cpu = cpu_of(rq);
+	u64 irq_time;
 
-		sched_irq_time_avg_update(rq, irq_time);
-	}
+	if (rq->skip_clock_update)
+		return;
+
+	rq->clock = sched_clock_cpu(cpu);
+	irq_time = irq_time_cpu(cpu);
+	if (rq->clock - irq_time > rq->clock_task)
+		rq->clock_task = rq->clock - irq_time;
+
+	sched_irq_time_avg_update(rq, irq_time);
 }
 
 /*
@@ -2138,7 +2139,7 @@ static void check_preempt_curr(struct rq
 	 * A queue event has occurred, and we're going to schedule.  In
 	 * this case, we can save a useless back to back clock update.
 	 */
-	if (test_tsk_need_resched(rq->curr))
+	if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
 		rq->skip_clock_update = 1;
 }
 
@@ -3854,7 +3855,6 @@ static void put_prev_task(struct rq *rq,
 {
 	if (prev->se.on_rq)
 		update_rq_clock(rq);
-	rq->skip_clock_update = 0;
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
@@ -3912,7 +3912,6 @@ need_resched_nonpreemptible:
 		hrtick_clear(rq);
 
 	raw_spin_lock_irq(&rq->lock);
-	clear_tsk_need_resched(prev);
 
 	switch_count = &prev->nivcsw;
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
@@ -3944,6 +3943,8 @@ need_resched_nonpreemptible:
 
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
+	clear_tsk_need_resched(prev);
+	rq->skip_clock_update = 0;
 
 	if (likely(prev != next)) {
 		sched_info_switch(prev, next);
@@ -3952,6 +3953,7 @@ need_resched_nonpreemptible:
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;
+		WARN_ON_ONCE(test_tsk_need_resched(next));
 
 		context_switch(rq, prev, next); /* unlocks the rq */
 		/*
Index: linux-2.6.37.git/kernel/fork.c
===================================================================
--- linux-2.6.37.git.orig/kernel/fork.c
+++ linux-2.6.37.git/kernel/fork.c
@@ -275,6 +275,7 @@ static struct task_struct *dup_task_stru
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
+	clear_tsk_need_resched(tsk);
 	stackend = end_of_stack(tsk);
 	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 



  reply	other threads:[~2010-12-08 10:05 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-21  4:22 Scheduler bug related to rq->skip_clock_update? Bjoern B. Brandenburg
2010-11-21 17:14 ` Mike Galbraith
2010-11-22  4:29   ` Bjoern B. Brandenburg
2010-11-22 16:19     ` Mike Galbraith
2010-11-22 18:14       ` Bjoern B. Brandenburg
2010-12-04  7:42         ` Yong Zhang
2010-12-04 14:05           ` Yong Zhang
2010-12-04 14:08         ` Yong Zhang
2010-12-04 14:33           ` Yong Zhang
2010-12-05  5:28         ` Yong Zhang
2010-12-06  5:33           ` Mike Galbraith
2010-12-06  7:59             ` Yong Zhang
2010-12-06  8:32               ` [patchlet] " Mike Galbraith
2010-12-07 16:41                 ` Peter Zijlstra
2010-12-07 18:55                   ` Mike Galbraith
2010-12-08 10:05                     ` Mike Galbraith [this message]
2010-12-08 11:12                       ` Peter Zijlstra
2010-12-08 20:40                       ` [tip:sched/urgent] Sched: fix skip_clock_update optimization tip-bot for Mike Galbraith
2010-12-09 15:32                         ` Peter Zijlstra
2010-12-10  2:33                           ` Yong Zhang
2010-12-10 16:17                             ` Peter Zijlstra
2010-12-06 15:40             ` Scheduler bug related to rq->skip_clock_update? Bjoern B. Brandenburg
2010-12-03 12:41 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1291802742.1417.9.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=anderson@cs.unc.edu \
    --cc=bastoni@sprg.uniroma2.it \
    --cc=bbb.lst@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=yong.zhang0@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox