From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: "Chris Friesen" <cfriesen@nortel.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
a.p.zijlstra@chello.nl, pj@sgi.com,
Balbir Singh <balbir@in.ibm.com>,
aneesh.kumar@linux.vnet.ibm.com, dhaval@linux.vnet.ibm.com
Subject: Re: fair group scheduler not so fair?
Date: Wed, 28 May 2008 22:03:18 +0530 [thread overview]
Message-ID: <20080528163318.GG30285@linux.vnet.ibm.com> (raw)
In-Reply-To: <483C4F5A.2010104@nortel.com>
On Tue, May 27, 2008 at 12:13:46PM -0600, Chris Friesen wrote:
>> Can you check if this makes a difference for you as well?
>
> Initially it looked promising. I put pid 2498 in group A, and pids 2499
> and 2500 in group B. 2498 got basically a full cpu, and the other two got
> 50% each.
>
> However, I then moved pid 2499 from group B to group A, and the system got
> stuck in the following behaviour:
>
> 2498 cfriesen 20 0 3800 392 336 R 99.7 0.0 3:00.22 cat
> 2500 cfriesen 20 0 3800 392 336 R 66.7 0.0 1:39.10 cat
> 2499 cfriesen 20 0 3800 392 336 R 33.0 0.0 1:24.31 cat
>
> I reproduced this a number of times.
Thanks for trying this combination. I discovered a task-leak in this loop
(__load_balance_iterator):
/* Skip over entities that are not tasks */
do {
se = list_entry(next, struct sched_entity, group_node);
next = next->next;
} while (next != &cfs_rq->tasks && !entity_is_task(se));
if (next == &cfs_rq->tasks)
return NULL;
We seem to be skipping the last element in the task list always. In your
case, the lone task in Group a/b is always skipped because of this.
The following hunk seems to fix this:
@@ -1386,9 +1386,6 @@ __load_balance_iterator(struct cfs_rq *c
next = next->next;
} while (next != &cfs_rq->tasks && !entity_is_task(se));
- if (next == &cfs_rq->tasks)
- return NULL;
-
cfs_rq->balance_iterator = next;
if (entity_is_task(se))
Updated patch (on top of 2.6.26-rc3 +
http://programming.kicks-ass.net/kernel-patches/sched-smp-group-fixes/)
below. Pls let me know how it fares!
---
include/linux/sched.h | 4 ++++
init/Kconfig | 2 +-
kernel/sched.c | 5 ++++-
kernel/sched_debug.c | 3 ++-
kernel/sched_fair.c | 3 ---
5 files changed, 11 insertions(+), 6 deletions(-)
Index: current/include/linux/sched.h
===================================================================
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -698,7 +698,11 @@ enum cpu_idle_type {
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
+#ifdef CONFIG_FAIR_GROUP_SCHED
+#define SCHED_LOAD_SCALE_FUZZ 0
+#else
#define SCHED_LOAD_SCALE_FUZZ SCHED_LOAD_SCALE
+#endif
#ifdef CONFIG_SMP
#define SD_LOAD_BALANCE 1 /* Do load balancing on this domain. */
Index: current/init/Kconfig
===================================================================
--- current.orig/init/Kconfig
+++ current/init/Kconfig
@@ -349,7 +349,7 @@ config RT_GROUP_SCHED
See Documentation/sched-rt-group.txt for more information.
choice
- depends on GROUP_SCHED
+ depends on GROUP_SCHED && (FAIR_GROUP_SCHED || RT_GROUP_SCHED)
prompt "Basis for grouping tasks"
default USER_SCHED
Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -1534,6 +1534,9 @@ tg_shares_up(struct task_group *tg, int
unsigned long shares = 0;
int i;
+ if (!tg->parent)
+ return;
+
for_each_cpu_mask(i, sd->span) {
rq_weight += tg->cfs_rq[i]->load.weight;
shares += tg->cfs_rq[i]->shares;
@@ -2919,7 +2922,7 @@ next:
* skip a task if it will be the highest priority task (i.e. smallest
* prio value) on its new queue regardless of its load weight
*/
- skip_for_load = (p->se.load.weight >> 1) > rem_load_move +
+ skip_for_load = (p->se.load.weight >> 1) >= rem_load_move +
SCHED_LOAD_SCALE_FUZZ;
if ((skip_for_load && p->prio >= *this_best_prio) ||
!can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) {
Index: current/kernel/sched_debug.c
===================================================================
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -119,7 +119,7 @@ void print_cfs_rq(struct seq_file *m, in
struct sched_entity *last;
unsigned long flags;
-#if !defined(CONFIG_CGROUP_SCHED) || !defined(CONFIG_USER_SCHED)
+#ifndef CONFIG_CGROUP_SCHED
SEQ_printf(m, "\ncfs_rq[%d]:\n", cpu);
#else
char path[128] = "";
@@ -170,6 +170,7 @@ void print_cfs_rq(struct seq_file *m, in
#ifdef CONFIG_FAIR_GROUP_SCHED
#ifdef CONFIG_SMP
SEQ_printf(m, " .%-30s: %lu\n", "shares", cfs_rq->shares);
+ SEQ_printf(m, " .%-30s: %lu\n", "h_load", cfs_rq->h_load);
#endif
#endif
}
Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -1386,9 +1386,6 @@ __load_balance_iterator(struct cfs_rq *c
next = next->next;
} while (next != &cfs_rq->tasks && !entity_is_task(se));
- if (next == &cfs_rq->tasks)
- return NULL;
-
cfs_rq->balance_iterator = next;
if (entity_is_task(se))
--
Regards,
vatsa
next prev parent reply other threads:[~2008-05-28 16:24 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-21 23:59 fair group scheduler not so fair? Chris Friesen
2008-05-22 6:56 ` Peter Zijlstra
2008-05-22 20:02 ` Chris Friesen
2008-05-22 20:07 ` Peter Zijlstra
2008-05-22 20:18 ` Li, Tong N
2008-05-22 21:13 ` Peter Zijlstra
2008-05-23 0:17 ` Chris Friesen
2008-05-23 7:44 ` Srivatsa Vaddagiri
2008-05-23 9:42 ` Srivatsa Vaddagiri
2008-05-23 9:39 ` Peter Zijlstra
2008-05-23 10:19 ` Srivatsa Vaddagiri
2008-05-23 10:16 ` Peter Zijlstra
2008-05-27 17:15 ` Srivatsa Vaddagiri
2008-05-27 18:13 ` Chris Friesen
2008-05-28 16:33 ` Srivatsa Vaddagiri [this message]
2008-05-28 18:35 ` Chris Friesen
2008-05-28 18:47 ` Dhaval Giani
2008-05-29 2:50 ` Srivatsa Vaddagiri
2008-05-29 16:46 ` Srivatsa Vaddagiri
2008-05-29 16:47 ` Srivatsa Vaddagiri
2008-05-29 21:30 ` Chris Friesen
2008-05-30 6:43 ` Dhaval Giani
2008-05-30 10:21 ` Srivatsa Vaddagiri
2008-05-30 11:36 ` Srivatsa Vaddagiri
2008-06-02 20:03 ` Chris Friesen
2008-05-27 17:28 ` Srivatsa Vaddagiri
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080528163318.GG30285@linux.vnet.ibm.com \
--to=vatsa@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=balbir@in.ibm.com \
--cc=cfriesen@nortel.com \
--cc=dhaval@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=pj@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.