Re: fair group scheduler not so fair?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Chris Friesen" <cfriesen@nortel.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	a.p.zijlstra@chello.nl, pj@sgi.com,
	Balbir Singh <balbir@in.ibm.com>,
	aneesh.kumar@linux.vnet.ibm.com, dhaval@linux.vnet.ibm.com
Subject: Re: fair group scheduler not so fair?
Date: Wed, 28 May 2008 22:03:18 +0530	[thread overview]
Message-ID: <20080528163318.GG30285@linux.vnet.ibm.com> (raw)
In-Reply-To: <483C4F5A.2010104@nortel.com>

On Tue, May 27, 2008 at 12:13:46PM -0600, Chris Friesen wrote:
>> Can you check if this makes a difference for you as well?
>
> Initially it looked promising.  I put pid 2498 in group A, and pids 2499 
> and 2500 in group B.  2498 got basically a full cpu, and the other two got 
> 50% each.
>
> However, I then moved pid 2499 from group B to group A, and the system got 
> stuck in the following behaviour:
>
> 2498 cfriesen  20   0  3800  392  336 R 99.7  0.0   3:00.22 cat
> 2500 cfriesen  20   0  3800  392  336 R 66.7  0.0   1:39.10 cat
> 2499 cfriesen  20   0  3800  392  336 R 33.0  0.0   1:24.31 cat
>
> I reproduced this a number of times.

Thanks for trying this combination. I discovered a task-leak in this loop
(__load_balance_iterator):

        /* Skip over entities that are not tasks */
        do {
                se = list_entry(next, struct sched_entity, group_node);
                next = next->next;
        } while (next != &cfs_rq->tasks && !entity_is_task(se));

        if (next == &cfs_rq->tasks)
                return NULL;

We seem to be skipping the last element in the task list always. In your
case, the lone task in Group a/b is always skipped because of this.

The following hunk seems to fix this:

@@ -1386,9 +1386,6 @@ __load_balance_iterator(struct cfs_rq *c
 		next = next->next;
 	} while (next != &cfs_rq->tasks && !entity_is_task(se));
 
-	if (next == &cfs_rq->tasks)
-		return NULL;
-
 	cfs_rq->balance_iterator = next;
 
 	if (entity_is_task(se))


Updated patch (on top of 2.6.26-rc3 +
http://programming.kicks-ass.net/kernel-patches/sched-smp-group-fixes/)
below.  Pls let me know how it fares!


---
 include/linux/sched.h |    4 ++++
 init/Kconfig          |    2 +-
 kernel/sched.c        |    5 ++++-
 kernel/sched_debug.c  |    3 ++-
 kernel/sched_fair.c   |    3 ---
 5 files changed, 11 insertions(+), 6 deletions(-)

Index: current/include/linux/sched.h
===================================================================
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -698,7 +698,11 @@ enum cpu_idle_type {
 #define SCHED_LOAD_SHIFT	10
 #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+#define SCHED_LOAD_SCALE_FUZZ	0
+#else
 #define SCHED_LOAD_SCALE_FUZZ	SCHED_LOAD_SCALE
+#endif
 
 #ifdef CONFIG_SMP
 #define SD_LOAD_BALANCE		1	/* Do load balancing on this domain. */
Index: current/init/Kconfig
===================================================================
--- current.orig/init/Kconfig
+++ current/init/Kconfig
@@ -349,7 +349,7 @@ config RT_GROUP_SCHED
 	  See Documentation/sched-rt-group.txt for more information.
 
 choice
-	depends on GROUP_SCHED
+	depends on GROUP_SCHED && (FAIR_GROUP_SCHED || RT_GROUP_SCHED)
 	prompt "Basis for grouping tasks"
 	default USER_SCHED
 
Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -1534,6 +1534,9 @@ tg_shares_up(struct task_group *tg, int 
 	unsigned long shares = 0;
 	int i;
 
+	if (!tg->parent)
+		return;
+
 	for_each_cpu_mask(i, sd->span) {
 		rq_weight += tg->cfs_rq[i]->load.weight;
 		shares += tg->cfs_rq[i]->shares;
@@ -2919,7 +2922,7 @@ next:
 	 * skip a task if it will be the highest priority task (i.e. smallest
 	 * prio value) on its new queue regardless of its load weight
 	 */
-	skip_for_load = (p->se.load.weight >> 1) > rem_load_move +
+	skip_for_load = (p->se.load.weight >> 1) >= rem_load_move +
 							 SCHED_LOAD_SCALE_FUZZ;
 	if ((skip_for_load && p->prio >= *this_best_prio) ||
 	    !can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) {
Index: current/kernel/sched_debug.c
===================================================================
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -119,7 +119,7 @@ void print_cfs_rq(struct seq_file *m, in
 	struct sched_entity *last;
 	unsigned long flags;
 
-#if !defined(CONFIG_CGROUP_SCHED) || !defined(CONFIG_USER_SCHED)
+#ifndef CONFIG_CGROUP_SCHED
 	SEQ_printf(m, "\ncfs_rq[%d]:\n", cpu);
 #else
 	char path[128] = "";
@@ -170,6 +170,7 @@ void print_cfs_rq(struct seq_file *m, in
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
 	SEQ_printf(m, "  .%-30s: %lu\n", "shares", cfs_rq->shares);
+	SEQ_printf(m, "  .%-30s: %lu\n", "h_load", cfs_rq->h_load);
 #endif
 #endif
 }
Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -1386,9 +1386,6 @@ __load_balance_iterator(struct cfs_rq *c
 		next = next->next;
 	} while (next != &cfs_rq->tasks && !entity_is_task(se));
 
-	if (next == &cfs_rq->tasks)
-		return NULL;
-
 	cfs_rq->balance_iterator = next;
 
 	if (entity_is_task(se))


-- 
Regards,
vatsa

next prev parent reply	other threads:[~2008-05-28 16:24 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21 23:59 fair group scheduler not so fair? Chris Friesen
2008-05-22  6:56 ` Peter Zijlstra
2008-05-22 20:02   ` Chris Friesen
2008-05-22 20:07     ` Peter Zijlstra
2008-05-22 20:18       ` Li, Tong N
2008-05-22 21:13         ` Peter Zijlstra
2008-05-23  0:17           ` Chris Friesen
2008-05-23  7:44             ` Srivatsa Vaddagiri
2008-05-23  9:42         ` Srivatsa Vaddagiri
2008-05-23  9:39           ` Peter Zijlstra
2008-05-23 10:19             ` Srivatsa Vaddagiri
2008-05-23 10:16               ` Peter Zijlstra
2008-05-27 17:15 ` Srivatsa Vaddagiri
2008-05-27 18:13   ` Chris Friesen
2008-05-28 16:33     ` Srivatsa Vaddagiri [this message]
2008-05-28 18:35       ` Chris Friesen
2008-05-28 18:47         ` Dhaval Giani
2008-05-29  2:50         ` Srivatsa Vaddagiri
2008-05-29 16:46         ` Srivatsa Vaddagiri
2008-05-29 16:47           ` Srivatsa Vaddagiri
2008-05-29 21:30           ` Chris Friesen
2008-05-30  6:43             ` Dhaval Giani
2008-05-30 10:21               ` Srivatsa Vaddagiri
2008-05-30 11:36             ` Srivatsa Vaddagiri
2008-06-02 20:03               ` Chris Friesen
2008-05-27 17:28 ` Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080528163318.GG30285@linux.vnet.ibm.com \
    --to=vatsa@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=balbir@in.ibm.com \
    --cc=cfriesen@nortel.com \
    --cc=dhaval@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=pj@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.