From: Peter Zijlstra <peterz@infradead.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>,
	Alex Shi <alex.shi@intel.com>, Ying <ying.huang@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: load balancing regression since commit 367456c7
Date: Wed, 25 Apr 2012 16:56:03 +0200
Message-ID: <1335365763.28150.267.camel@twins>
In-Reply-To: <1334943202.2463.71.camel@laptop>

On Fri, 2012-04-20 at 19:33 +0200, Peter Zijlstra wrote:
> 
> OK, I'll go stare at the cgroup part then.. Thanks!
> 
OK, I could reproduce it when using cgroups. The patch below fixes it
for me; can you confirm?

---
Subject: sched: Fix more load-balance fallout
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Apr 17 13:38:40 CEST 2012

Commits 367456c756a6 ("sched: Ditch per cgroup task lists for
load-balancing") and 5d6523ebd ("sched: Fix load-balance wreckage")
left some more wreckage.

By setting loop_max unconditionally to ->nr_running, load-balancing
could take a lot of time on very long runqueues (hackbench!). So keep
the sysctl as the max limit on the number of tasks we'll iterate.
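
To illustrate the bound, here is a minimal user-space sketch, not the
kernel code: scan_runqueue() and its parameters are hypothetical names,
and the break handling is simplified compared to move_tasks(). It caps
the number of tasks visited at min(nr_migrate, nr_running) and counts a
breather every break_every tasks.

```c
#include <assert.h>

/* Hypothetical helper: smaller of two unsigned values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Simplified model of a bounded runqueue scan (names are illustrative).
 * Visits at most min(nr_migrate, nr_running) tasks and records one
 * breather in *breaks each time another break_every tasks were visited.
 * Returns the number of tasks visited.
 */
static unsigned int scan_runqueue(unsigned int nr_running,
				  unsigned int nr_migrate,
				  unsigned int break_every,
				  unsigned int *breaks)
{
	unsigned int loop = 0;
	unsigned int loop_max = min_uint(nr_migrate, nr_running);
	unsigned int loop_break = break_every;

	assert(break_every > 0);	/* a zero interval would never advance */
	*breaks = 0;

	while (loop < loop_max) {
		if (loop >= loop_break) {
			/* take a breather, then resume where we left off */
			loop_break += break_every;
			(*breaks)++;
			continue;
		}
		loop++;	/* "visit" one task */
	}
	return loop;
}
```

With a 1000-task runqueue, a limit of 32 and a break interval of 8, the
sketch visits 32 tasks and takes 3 breathers instead of walking all 1000.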

Furthermore, the min load filter for migration completely fails with
cgroups since inequality in per-cpu state can easily lead to such
small loads :/
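
To see how such small loads can arise, here is a rough user-space
sketch. It assumes the hierarchical load of a task is its weight scaled
by the group's share of the load on this CPU, which is a simplification
of task_h_load(); h_load_sketch() and skip_small_load() are hypothetical
names and the numbers are made up.

```c
/*
 * Simplified hierarchical load: the task's weight scaled by the ratio of
 * the group's load contribution on this CPU (group_h_load) to the total
 * weight queued on the group's runqueue (group_rq_weight). The +1 avoids
 * dividing by zero.
 */
static unsigned long h_load_sketch(unsigned long weight,
				   unsigned long group_h_load,
				   unsigned long group_rq_weight)
{
	return weight * group_h_load / (group_rq_weight + 1);
}

/*
 * Same shape as the filter in the patch: skip tiny loads only when the
 * filter is enabled and balancing has not already failed.
 */
static int skip_small_load(unsigned long load, int lb_min_enabled,
			   unsigned int nr_balance_failed)
{
	return lb_min_enabled && load < 16 && !nr_balance_failed;
}
```

For example, 40 tasks of weight 1024 in a group whose contribution on
this CPU is 10 each get a load of 0, so with the filter enabled none of
them would be considered for migration.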

Furthermore, the change to add new tasks to the tail of the queue
instead of the head seems to have some effect, so add them at the head
again. Not quite sure I understand why.
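
For reference, the difference in ordering can be shown with a small
user-space list. This is only a sketch with hypothetical names (struct
node, add_head(), add_tail()), modelled on the semantics of list_add()
and list_add_tail(); it does not explain the performance effect.

```c
/* Minimal circular doubly-linked list; the head is a sentinel node. */
struct node {
	struct node *prev, *next;
	int id;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Link n between two adjacent nodes. */
static void insert_between(struct node *n, struct node *prev,
			   struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): the new node becomes the first entry. */
static void add_head(struct node *n, struct node *head)
{
	insert_between(n, head, head->next);
}

/* Like list_add_tail(): the new node becomes the last entry. */
static void add_tail(struct node *n, struct node *head)
{
	insert_between(n, head->prev, head);
}

/* Id of the first entry, or -1 if the list is empty. */
static int first_id(const struct node *head)
{
	return head->next == head ? -1 : head->next->id;
}
```

In this sketch, a scan that starts at the first entry sees the most
recently queued node first with add_head(), and the oldest one first
with add_tail().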

Combined, these fixes solve the huge hackbench regression reported by
Tim when hackbench is run in a cgroup.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched/fair.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -784,7 +784,7 @@ account_entity_enqueue(struct cfs_rq *cf
 		update_load_add(&rq_of(cfs_rq)->load, se->load.weight);
 #ifdef CONFIG_SMP
 	if (entity_is_task(se))
-		list_add_tail(&se->group_node, &rq_of(cfs_rq)->cfs_tasks);
+		list_add(&se->group_node, &rq_of(cfs_rq)->cfs_tasks);
 #endif
 	cfs_rq->nr_running++;
 }
@@ -3215,6 +3215,14 @@ static int move_one_task(struct lb_env *
 
 static unsigned long task_h_load(struct task_struct *p);
 
+static const unsigned int sched_nr_migrate_break =
+#ifdef CONFIG_PREEMPT
+	8
+#else
+	32
+#endif
+	;
+
 /*
  * move_tasks tries to move up to load_move weighted load from busiest to
  * this_rq, as part of a balancing operation within domain "sd".
@@ -3242,7 +3250,7 @@ static int move_tasks(struct lb_env *env
 
 		/* take a breather every nr_migrate tasks */
 		if (env->loop > env->loop_break) {
-			env->loop_break += sysctl_sched_nr_migrate;
+			env->loop_break += sched_nr_migrate_break;
 			env->flags |= LBF_NEED_BREAK;
 			break;
 		}
@@ -3252,7 +3260,7 @@ static int move_tasks(struct lb_env *env
 
 		load = task_h_load(p);
 
-		if (load < 16 && !env->sd->nr_balance_failed)
+		if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
 			goto next;
 
 		if ((load / 2) > env->load_move)
@@ -4407,7 +4415,7 @@ static int load_balance(int this_cpu, st
 		.dst_cpu	= this_cpu,
 		.dst_rq		= this_rq,
 		.idle		= idle,
-		.loop_break	= sysctl_sched_nr_migrate,
+		.loop_break	= sched_nr_migrate_break,
 	};
 
 	cpumask_copy(cpus, cpu_active_mask);
@@ -4448,7 +4456,8 @@ static int load_balance(int this_cpu, st
 		env.load_move = imbalance;
 		env.src_cpu = busiest->cpu;
 		env.src_rq = busiest;
-		env.loop_max = busiest->nr_running;
+		env.loop_max = min_t(unsigned long,
+				sysctl_sched_nr_migrate, busiest->nr_running);
 
 more_balance:
 		local_irq_save(flags);


Thread overview: 14+ messages
2012-04-11  1:06 load balancing regression since commit 367456c7 Tim Chen
2012-04-17 11:43 ` Peter Zijlstra
2012-04-17 12:09 ` Peter Zijlstra
2012-04-17 16:44   ` Tim Chen
2012-04-20 14:00     ` Peter Zijlstra
2012-04-20 16:40       ` Tim Chen
2012-04-20 16:53         ` Peter Zijlstra
2012-04-20 17:13           ` Tim Chen
2012-04-20 17:33             ` Peter Zijlstra
2012-04-25 14:56               ` Peter Zijlstra [this message]
2012-04-25 17:38                 ` Tim Chen
2012-04-25 17:43                   ` Peter Zijlstra
2012-04-25 17:58                     ` Tim Chen
2012-04-26 11:56                 ` [tip:sched/urgent] sched: Fix more load-balancing fallout tip-bot for Peter Zijlstra
