public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Brian Rogers <brian@xyzw.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [BUG] How to get real-time priority using idle priority
Date: Thu, 15 Jan 2009 10:28:43 +0100	[thread overview]
Message-ID: <1232011723.26761.36.camel@marge.simson.net> (raw)
In-Reply-To: <496BE8F6.1040308@xyzw.org>

On Mon, 2009-01-12 at 17:05 -0800, Brian Rogers wrote:
> Ingo Molnar wrote:
> > i've applied your fix to tip/sched/urgent and merged it into tip/master.
> > Brian, you might want to test tip/master, as per:
> >
> >  http://people.redhat.com/mingo/tip.git/README
> >
> > which now has this fix included. Can you make it break?
> >   
> Yeah, I was able to trigger the same freeze again once on my desktop, 
> but it appears to be harder to trigger now. I couldn't get my program to 
> freeze the system, but one of the times I suspended then resumed BOINC, 
> it happened.

OK, the below seems to cure all of the problems I've encountered, please
take it out for a spin.

The rate of advance thing in set_task_cpu() seems to have been a case of
fixing the symptom instead of the problem.  Perhaps this needs more
thought, but my box says "Red-Herring" ;-)

The real problem (excluding the SCHED_IDLE specific problems), is that
update_min_vruntime() doesn't work quite as intended, and will slam
min_vruntime far right if load balancing etc places a task which is far
right of the currently running task on the runqueue.  If the currently
running, and up to this point the min_vruntime pace setter, is a hog,
any task waking to this runqueue after min_vruntime leaps forward has to
wait for the hog to consume the gap.  In the case of SCHED_IDLE tasks,
that gap can be huge, but even with a nice 19 tasks it can be quite
large and painful.

Removing the if (vruntime == cfs_rq->min_vruntime) test, which will be
true if the currently running task is the pace setter, cured it for me.

If this cures your woes (and Peter acks it), I'll split and submit.

diff --git a/kernel/sched.c b/kernel/sched.c
index deb5ac8..e9f0762 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1320,8 +1320,8 @@ static inline void update_load_sub(struct load_weight *lw, unsigned long dec)
  * slice expiry etc.
  */
 
-#define WEIGHT_IDLEPRIO		2
-#define WMULT_IDLEPRIO		(1 << 31)
+#define WEIGHT_IDLEPRIO		3
+#define WMULT_IDLEPRIO		1431655765
 
 /*
  * Nice levels are multiplicative, with a gentle 10% change for every
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8e1352c..761071d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,10 +283,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 						   struct sched_entity,
 						   run_node);
 
-		if (vruntime == cfs_rq->min_vruntime)
-			vruntime = se->vruntime;
-		else
-			vruntime = min_vruntime(vruntime, se->vruntime);
+		vruntime = min_vruntime(vruntime, se->vruntime);
 	}
 
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
@@ -677,9 +674,13 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 			unsigned long thresh = sysctl_sched_latency;
 
 			/*
-			 * convert the sleeper threshold into virtual time
+			 * Convert the sleeper threshold into virtual time.
+			 * SCHED_IDLE is a special sub-class.  We care about
+			 * fairness only relative to other SCHED_IDLE tasks,
+			 * all of which have the same weight.
 			 */
-			if (sched_feat(NORMALIZED_SLEEPER))
+			if (sched_feat(NORMALIZED_SLEEPER) &&
+					task_of(se)->policy != SCHED_IDLE)
 				thresh = calc_delta_fair(thresh, se);
 
 			vruntime -= thresh;
@@ -1340,14 +1341,18 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
 
 static void set_last_buddy(struct sched_entity *se)
 {
-	for_each_sched_entity(se)
-		cfs_rq_of(se)->last = se;
+	for_each_sched_entity(se) {
+		if (likely(task_of(se)->policy != SCHED_IDLE))
+			cfs_rq_of(se)->last = se;
+	}
 }
 
 static void set_next_buddy(struct sched_entity *se)
 {
-	for_each_sched_entity(se)
-		cfs_rq_of(se)->next = se;
+	for_each_sched_entity(se) {
+		if (likely(task_of(se)->policy != SCHED_IDLE))
+			cfs_rq_of(se)->next = se;
+	}
 }
 
 /*
@@ -1393,12 +1398,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
 		return;
 
 	/*
-	 * Batch tasks do not preempt (their preemption is driven by
+	 * Batch and idle tasks do not preempt (their preemption is driven by
 	 * the tick):
 	 */
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (unlikely(p->policy != SCHED_NORMAL))
 		return;
 
+	/* Idle tasks are by definition preempted by everybody. */
+	if (unlikely(curr->policy == SCHED_IDLE)) {
+		resched_task(curr);
+		return;
+	}
+
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 



  parent reply	other threads:[~2009-01-15  9:29 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-11 10:58 [BUG] How to get real-time priority using idle priority Brian Rogers
2009-01-12  5:09 ` Mike Galbraith
2009-01-12 13:03   ` Mike Galbraith
2009-01-12 13:14     ` Ingo Molnar
2009-01-12 15:23       ` Mike Galbraith
2009-01-12 15:24         ` Ingo Molnar
2009-01-13  1:05       ` Brian Rogers
2009-01-13  2:58         ` Mike Galbraith
2009-01-14  5:13           ` Mike Galbraith
2009-01-14  5:31             ` Ingo Molnar
2009-01-14  6:02               ` Mike Galbraith
2009-01-14  7:35                 ` Ingo Molnar
2009-01-15  9:28         ` Mike Galbraith [this message]
2009-01-15 10:14           ` Peter Zijlstra
2009-01-15 10:30             ` Mike Galbraith
2009-01-15 11:37               ` Mike Galbraith
2009-01-15 11:41                 ` Peter Zijlstra
2009-01-15 12:54                   ` Ingo Molnar
2009-01-15 13:05                     ` Peter Zijlstra
2009-01-15 13:15                       ` Ingo Molnar
2009-01-15 13:16                     ` Peter Zijlstra
2009-01-15 12:07           ` Brian Rogers
2009-01-12 20:46     ` [patch take 2] " Mike Galbraith
2009-01-12 20:50       ` Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1232011723.26761.36.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=brian@xyzw.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox