* [PATCH v3 0/3] optimization, clean-up about fair.c
@ 2013-08-06  8:36 Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Joonsoo Kim @ 2013-08-06  8:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Patch 1 removes one division operation by replacing it with a multiplication.
Patches 2 and 3 are clean-ups related to load_balance(); they improve
code size, and readability may also be improved.

Feel free to comment on this patchset.

It has been tested on v3.10.
On v3.11-rc3, it compiles properly.

* Change from v2
[2/3]: set the initial value of should_balance in rebalance_domains()
	to 1, as Vincent commented
[3/3]: don't overwrite sum_weighted_load to represent load_per_task.
	Instead, use the new load_per_task field, as Preeti commented

* Change from v1
Removed 2 patches, because I'm not sure they are correct.

Joonsoo Kim (3):
  sched: remove one division operation in find_busiest_queue()
  sched: factor out code to should_we_balance()
  sched: clean-up struct sd_lb_stat

 kernel/sched/fair.c |  326 +++++++++++++++++++++++++--------------------------
 1 file changed, 162 insertions(+), 164 deletions(-)

-- 
1.7.9.5



* [PATCH v3 1/3] sched: remove one division operation in find_busiest_queue()
  2013-08-06  8:36 [PATCH v3 0/3] optimization, clean-up about fair.c Joonsoo Kim
@ 2013-08-06  8:36 ` Joonsoo Kim
  2013-09-02  7:40   ` [tip:sched/core] sched: Remove one division operation in find_busiest_queue() tip-bot for Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2 siblings, 1 reply; 16+ messages in thread
From: Joonsoo Kim @ 2013-08-06  8:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Remove one division operation in find_busiest_queue() by using crosswise
multiplication: wl_i / power_i > wl_j / power_j holds exactly when
wl_i * power_j > wl_j * power_i (all values positive).
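
A tiny standalone demo of the equivalence (hypothetical numbers, not part
of the patch; with wl and power in their usual ranges, both products fit
easily in an unsigned long):

	#include <stdio.h>

	int main(void)
	{
		/* candidate rq vs. current busiest, made-up values */
		unsigned long wl = 300, power = 2;
		unsigned long busiest_load = 500, busiest_power = 4;

		/* old form: scale by SCHED_POWER_SCALE (1024), then divide */
		int div_form = (wl * 1024) / power >
			       (busiest_load * 1024) / busiest_power;
		/* new form: crosswise multiplication, no division */
		int mult_form = wl * busiest_power > busiest_load * power;

		/* both comparisons agree: prints div=1 mult=1 */
		printf("div=%d mult=%d\n", div_form, mult_form);
		return 0;
	}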

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9565645..52898dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4968,7 +4968,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 				     struct sched_group *group)
 {
 	struct rq *busiest = NULL, *rq;
-	unsigned long max_load = 0;
+	unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
 	int i;
 
 	for_each_cpu(i, sched_group_cpus(group)) {
@@ -4999,10 +4999,9 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * the load can be moved away from the cpu that is potentially
 		 * running at a lower capacity.
 		 */
-		wl = (wl * SCHED_POWER_SCALE) / power;
-
-		if (wl > max_load) {
-			max_load = wl;
+		if (wl * busiest_power > busiest_load * power) {
+			busiest_load = wl;
+			busiest_power = power;
 			busiest = rq;
 		}
 	}
-- 
1.7.9.5



* [PATCH v3 2/3] sched: factor out code to should_we_balance()
  2013-08-06  8:36 [PATCH v3 0/3] optimization, clean-up about fair.c Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
@ 2013-08-06  8:36 ` Joonsoo Kim
  2013-08-15 11:19   ` Peter Zijlstra
                     ` (2 more replies)
  2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2 siblings, 3 replies; 16+ messages in thread
From: Joonsoo Kim @ 2013-08-06  8:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Currently, checking whether this cpu is appropriate to balance is
embedded in update_sg_lb_stats(), although this check has no direct
relationship to that function. There is not enough reason to place this
check in update_sg_lb_stats(), except that it saves one iteration over
sched_group_cpus.

In this patch, I factor this check out into a new function,
should_we_balance(). Before doing the actual work for load balancing,
we check via should_we_balance() whether this cpu is appropriate to
balance. If this cpu is not a candidate for balancing, we quit the work
immediately.

With this change, we can save the cost of two memsets (the sd_lb_stats
and per-group sg_lb_stats zeroing that would otherwise be done before
discovering that this cpu should not balance) and can expect better
compiler optimization.

Below is the result of this patch.

* Vanilla *
   text	   data	    bss	    dec	    hex	filename
  34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o

* Patched *
   text	   data	    bss	    dec	    hex	filename
  34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o

In addition, rename @balance to @should_balance in order to represent
its purpose more clearly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 52898dc..c6732d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4463,22 +4463,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
-			int local_group, int *balance, struct sg_lb_stats *sgs)
+			int local_group, struct sg_lb_stats *sgs)
 {
 	unsigned long nr_running, max_nr_running, min_nr_running;
 	unsigned long load, max_cpu_load, min_cpu_load;
-	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
 	int i;
 
-	if (local_group)
-		balance_cpu = group_balance_cpu(group);
-
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
 	min_cpu_load = ~0UL;
@@ -4491,15 +4486,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		nr_running = rq->nr_running;
 
 		/* Bias balancing toward cpus of our domain */
-		if (local_group) {
-			if (idle_cpu(i) && !first_idle_cpu &&
-					cpumask_test_cpu(i, sched_group_mask(group))) {
-				first_idle_cpu = 1;
-				balance_cpu = i;
-			}
-
+		if (local_group)
 			load = target_load(i, load_idx);
-		} else {
+		else {
 			load = source_load(i, load_idx);
 			if (load > max_cpu_load)
 				max_cpu_load = load;
@@ -4519,22 +4508,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
-	/*
-	 * First idle cpu or the first cpu(busiest) in this sched group
-	 * is eligible for doing load balancing at this and above
-	 * domains. In the newly idle case, we will allow all the cpu's
-	 * to do the newly idle load balance.
-	 */
-	if (local_group) {
-		if (env->idle != CPU_NEWLY_IDLE) {
-			if (balance_cpu != env->dst_cpu) {
-				*balance = 0;
-				return;
-			}
-			update_group_power(env->sd, env->dst_cpu);
-		} else if (time_after_eq(jiffies, group->sgp->next_update))
-			update_group_power(env->sd, env->dst_cpu);
-	}
+	if (local_group && (env->idle != CPU_NEWLY_IDLE ||
+			time_after_eq(jiffies, group->sgp->next_update)))
+		update_group_power(env->sd, env->dst_cpu);
 
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
@@ -4613,7 +4589,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
  * @sds: variable to hold the statistics for this sched_domain.
  */
 static inline void update_sd_lb_stats(struct lb_env *env,
-					int *balance, struct sd_lb_stats *sds)
+					struct sd_lb_stats *sds)
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
@@ -4630,10 +4606,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
 		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
-
-		if (local_group && !(*balance))
-			return;
+		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
 
 		sds->total_load += sgs.group_load;
 		sds->total_pwr += sg->sgp->power;
@@ -4866,8 +4839,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  * to restore balance.
  *
  * @env: The load balancing environment.
- * @balance: Pointer to a variable indicating if this_cpu
- *	is the appropriate cpu to perform load balancing at this_level.
  *
  * Returns:	- the busiest group if imbalance exists.
  *		- If no imbalance and user has opted for power-savings balance,
@@ -4875,7 +4846,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  *		   put to idle by rebalancing its tasks onto our group.
  */
 static struct sched_group *
-find_busiest_group(struct lb_env *env, int *balance)
+find_busiest_group(struct lb_env *env)
 {
 	struct sd_lb_stats sds;
 
@@ -4885,14 +4856,7 @@ find_busiest_group(struct lb_env *env, int *balance)
 	 * Compute the various statistics relavent for load balancing at
 	 * this level.
 	 */
-	update_sd_lb_stats(env, balance, &sds);
-
-	/*
-	 * this_cpu is not the appropriate cpu to perform load balancing at
-	 * this level.
-	 */
-	if (!(*balance))
-		goto ret;
+	update_sd_lb_stats(env, &sds);
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
@@ -4956,7 +4920,6 @@ force_balance:
 	return sds.busiest;
 
 out_balanced:
-ret:
 	env->imbalance = 0;
 	return NULL;
 }
@@ -5038,13 +5001,47 @@ static int need_active_balance(struct lb_env *env)
 
 static int active_load_balance_cpu_stop(void *data);
 
+static int should_we_balance(struct lb_env *env)
+{
+	struct sched_group *sg = env->sd->groups;
+	struct cpumask *sg_cpus, *sg_mask;
+	int cpu, balance_cpu = -1;
+
+	/*
+	 * In the newly idle case, we will allow all the cpu's
+	 * to do the newly idle load balance.
+	 */
+	if (env->idle == CPU_NEWLY_IDLE)
+		return 1;
+
+	sg_cpus = sched_group_cpus(sg);
+	sg_mask = sched_group_mask(sg);
+	/* Try to find first idle cpu */
+	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
+		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
+			continue;
+
+		balance_cpu = cpu;
+		break;
+	}
+
+	if (balance_cpu == -1)
+		balance_cpu = group_balance_cpu(sg);
+
+	/*
+	 * First idle cpu or the first cpu(busiest) in this sched group
+	 * is eligible for doing load balancing at this and above domains.
+	 */
+	return balance_cpu == env->dst_cpu;
+}
+
 /*
  * Check this_cpu to ensure it is balanced within domain. Attempt to move
  * tasks if there is an imbalance.
  */
 static int load_balance(int this_cpu, struct rq *this_rq,
 			struct sched_domain *sd, enum cpu_idle_type idle,
-			int *balance)
+			int *should_balance)
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
 	struct sched_group *group;
@@ -5073,12 +5070,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	schedstat_inc(sd, lb_count[idle]);
 
-redo:
-	group = find_busiest_group(&env, balance);
-
-	if (*balance == 0)
+	if (!should_we_balance(&env)) {
+		*should_balance = 0;
 		goto out_balanced;
+	} else
+		*should_balance = 1;
 
+redo:
+	group = find_busiest_group(&env);
 	if (!group) {
 		schedstat_inc(sd, lb_nobusyg[idle]);
 		goto out_balanced;
@@ -5291,7 +5290,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	rcu_read_lock();
 	for_each_domain(this_cpu, sd) {
 		unsigned long interval;
-		int balance = 1;
+		int should_balance;
 
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			continue;
@@ -5299,7 +5298,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance(this_cpu, this_rq,
-						   sd, CPU_NEWLY_IDLE, &balance);
+						   sd, CPU_NEWLY_IDLE,
+						   &should_balance);
 		}
 
 		interval = msecs_to_jiffies(sd->balance_interval);
@@ -5537,7 +5537,7 @@ void update_max_interval(void)
  */
 static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 {
-	int balance = 1;
+	int should_balance = 1;
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long interval;
 	struct sched_domain *sd;
@@ -5569,7 +5569,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 		}
 
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
-			if (load_balance(cpu, rq, sd, idle, &balance)) {
+			if (load_balance(cpu, rq, sd, idle, &should_balance)) {
 				/*
 				 * The LBF_SOME_PINNED logic could have changed
 				 * env->dst_cpu, so we can't know our idle
@@ -5592,7 +5592,7 @@ out:
 		 * CPU in our sched group which is doing load balancing more
 		 * actively.
 		 */
-		if (!balance)
+		if (!should_balance)
 			break;
 	}
 	rcu_read_unlock();
-- 
1.7.9.5



* [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-06  8:36 [PATCH v3 0/3] optimization, clean-up about fair.c Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
  2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
@ 2013-08-06  8:36 ` Joonsoo Kim
  2013-08-15 11:09   ` Peter Zijlstra
                     ` (2 more replies)
  2 siblings, 3 replies; 16+ messages in thread
From: Joonsoo Kim @ 2013-08-06  8:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

There is no reason to maintain separate variables for this_group
and busiest_group in sd_lb_stat, except to save some space.
But this structure is always allocated on the stack, so this saving
isn't really beneficial.

This patch unifies these variables, so IMO readability may be improved.
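
For orientation, the old flattened fields correspond to the new embedded
statistics roughly as follows (a sketch derived from the diff below, not
part of the patch):

	/*
	 *   sds->this_load             -> sds->this_stat.avg_load
	 *   sds->this_load_per_task    -> sds->this_stat.load_per_task
	 *   sds->max_load              -> sds->busiest_stat.avg_load
	 *   sds->busiest_load_per_task -> sds->busiest_stat.load_per_task
	 *   sds->busiest_nr_running    -> sds->busiest_stat.sum_nr_running
	 *   sds->group_imb             -> sds->busiest_stat.group_imb
	 *   sds->avg_load              -> sds->sd_avg_load (renamed)
	 */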

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6732d2..f8a9660 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4232,36 +4232,6 @@ static unsigned long task_h_load(struct task_struct *p)
 
 /********** Helpers for find_busiest_group ************************/
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- * 		during load balancing.
- */
-struct sd_lb_stats {
-	struct sched_group *busiest; /* Busiest group in this sd */
-	struct sched_group *this;  /* Local group in this sd */
-	unsigned long total_load;  /* Total load of all groups in sd */
-	unsigned long total_pwr;   /*	Total power of all groups in sd */
-	unsigned long avg_load;	   /* Average load across all groups in sd */
-
-	/** Statistics of this group */
-	unsigned long this_load;
-	unsigned long this_load_per_task;
-	unsigned long this_nr_running;
-	unsigned long this_has_capacity;
-	unsigned int  this_idle_cpus;
-
-	/* Statistics of the busiest group */
-	unsigned int  busiest_idle_cpus;
-	unsigned long max_load;
-	unsigned long busiest_load_per_task;
-	unsigned long busiest_nr_running;
-	unsigned long busiest_group_capacity;
-	unsigned long busiest_has_capacity;
-	unsigned int  busiest_group_weight;
-
-	int group_imb; /* Is there imbalance in this sd */
-};
-
-/*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
@@ -4269,6 +4239,7 @@ struct sg_lb_stats {
 	unsigned long group_load; /* Total load over the CPUs of the group */
 	unsigned long sum_nr_running; /* Nr tasks running in the group */
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
+	unsigned long load_per_task;
 	unsigned long group_capacity;
 	unsigned long idle_cpus;
 	unsigned long group_weight;
@@ -4276,6 +4247,24 @@ struct sg_lb_stats {
 	int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
+/*
+ * sd_lb_stats - Structure to store the statistics of a sched_domain
+ *		during load balancing.
+ */
+struct sd_lb_stats {
+	struct sched_group *busiest; /* Busiest group in this sd */
+	struct sched_group *this;  /* Local group in this sd */
+	unsigned long total_load;  /* Total load of all groups in sd */
+	unsigned long total_pwr;   /*	Total power of all groups in sd */
+	unsigned long sd_avg_load; /* Average load across all groups in sd */
+
+	/* Statistics of this group */
+	struct sg_lb_stats this_stat;
+
+	/* Statistics of the busiest group */
+	struct sg_lb_stats busiest_stat;
+};
+
 /**
  * get_sd_load_idx - Obtain the load index for a given sched domain.
  * @sd: The sched_domain whose load_idx is to be obtained.
@@ -4556,7 +4545,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
+	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
@@ -4593,7 +4582,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
-	struct sg_lb_stats sgs;
+	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
@@ -4603,13 +4592,16 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 	do {
 		int local_group;
+		struct sg_lb_stats *sgs;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
-		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
-
-		sds->total_load += sgs.group_load;
-		sds->total_pwr += sg->sgp->power;
+		if (local_group)
+			sgs = &sds->this_stat;
+		else {
+			sgs = &tmp_sgs;
+			memset(sgs, 0, sizeof(*sgs));
+		}
+		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
@@ -4621,26 +4613,19 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
-		if (prefer_sibling && !local_group && sds->this_has_capacity)
-			sgs.group_capacity = min(sgs.group_capacity, 1UL);
+		if (prefer_sibling && !local_group &&
+				sds->this && sds->this_stat.group_has_capacity)
+			sgs->group_capacity = min(sgs->group_capacity, 1UL);
 
-		if (local_group) {
-			sds->this_load = sgs.avg_load;
+		/* Now, start updating sd_lb_stats */
+		sds->total_load += sgs->group_load;
+		sds->total_pwr += sg->sgp->power;
+
+		if (local_group)
 			sds->this = sg;
-			sds->this_nr_running = sgs.sum_nr_running;
-			sds->this_load_per_task = sgs.sum_weighted_load;
-			sds->this_has_capacity = sgs.group_has_capacity;
-			sds->this_idle_cpus = sgs.idle_cpus;
-		} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
-			sds->max_load = sgs.avg_load;
+		else if (update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
-			sds->busiest_nr_running = sgs.sum_nr_running;
-			sds->busiest_idle_cpus = sgs.idle_cpus;
-			sds->busiest_group_capacity = sgs.group_capacity;
-			sds->busiest_load_per_task = sgs.sum_weighted_load;
-			sds->busiest_has_capacity = sgs.group_has_capacity;
-			sds->busiest_group_weight = sgs.group_weight;
-			sds->group_imb = sgs.group_imb;
+			sds->busiest_stat = *sgs;
 		}
 
 		sg = sg->next;
@@ -4684,8 +4669,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (env->dst_cpu > busiest_cpu)
 		return 0;
 
-	env->imbalance = DIV_ROUND_CLOSEST(
-		sds->max_load * sds->busiest->sgp->power, SCHED_POWER_SCALE);
+	env->imbalance = DIV_ROUND_CLOSEST(sds->busiest_stat.avg_load *
+				sds->busiest->sgp->power, SCHED_POWER_SCALE);
 
 	return 1;
 }
@@ -4703,24 +4688,22 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	unsigned long tmp, pwr_now = 0, pwr_move = 0;
 	unsigned int imbn = 2;
 	unsigned long scaled_busy_load_per_task;
+	struct sg_lb_stats *this, *busiest;
 
-	if (sds->this_nr_running) {
-		sds->this_load_per_task /= sds->this_nr_running;
-		if (sds->busiest_load_per_task >
-				sds->this_load_per_task)
+	this = &sds->this_stat;
+	busiest = &sds->busiest_stat;
+
+	if (!this->sum_nr_running)
+		this->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
+	else if (busiest->load_per_task > this->load_per_task)
 			imbn = 1;
-	} else {
-		sds->this_load_per_task =
-			cpu_avg_load_per_task(env->dst_cpu);
-	}
 
-	scaled_busy_load_per_task = sds->busiest_load_per_task
-					 * SCHED_POWER_SCALE;
-	scaled_busy_load_per_task /= sds->busiest->sgp->power;
+	scaled_busy_load_per_task = (busiest->load_per_task *
+			SCHED_POWER_SCALE) / sds->busiest->sgp->power;
 
-	if (sds->max_load - sds->this_load + scaled_busy_load_per_task >=
-			(scaled_busy_load_per_task * imbn)) {
-		env->imbalance = sds->busiest_load_per_task;
+	if (busiest->avg_load - this->avg_load + scaled_busy_load_per_task >=
+		(scaled_busy_load_per_task * imbn)) {
+		env->imbalance = busiest->load_per_task;
 		return;
 	}
 
@@ -4731,33 +4714,34 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	 */
 
 	pwr_now += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load);
+			min(busiest->load_per_task, busiest->avg_load);
 	pwr_now += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load);
+			min(this->load_per_task, this->avg_load);
 	pwr_now /= SCHED_POWER_SCALE;
 
 	/* Amount of load we'd subtract */
-	tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
+	tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
 		sds->busiest->sgp->power;
-	if (sds->max_load > tmp)
+	if (busiest->avg_load > tmp)
 		pwr_move += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load - tmp);
+			min(busiest->load_per_task,
+				busiest->avg_load - tmp);
 
 	/* Amount of load we'd add */
-	if (sds->max_load * sds->busiest->sgp->power <
-		sds->busiest_load_per_task * SCHED_POWER_SCALE)
-		tmp = (sds->max_load * sds->busiest->sgp->power) /
+	if (busiest->avg_load * sds->busiest->sgp->power <
+		busiest->load_per_task * SCHED_POWER_SCALE)
+		tmp = (busiest->avg_load * sds->busiest->sgp->power) /
 			sds->this->sgp->power;
 	else
-		tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
+		tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
 			sds->this->sgp->power;
 	pwr_move += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load + tmp);
+			min(this->load_per_task, this->avg_load + tmp);
 	pwr_move /= SCHED_POWER_SCALE;
 
 	/* Move if we gain throughput */
 	if (pwr_move > pwr_now)
-		env->imbalance = sds->busiest_load_per_task;
+		env->imbalance = busiest->load_per_task;
 }
 
 /**
@@ -4769,11 +4753,21 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 {
 	unsigned long max_pull, load_above_capacity = ~0UL;
+	struct sg_lb_stats *this, *busiest;
+
+	this = &sds->this_stat;
+	if (this->sum_nr_running) {
+		this->load_per_task = this->sum_weighted_load /
+						this->sum_nr_running;
+	}
 
-	sds->busiest_load_per_task /= sds->busiest_nr_running;
-	if (sds->group_imb) {
-		sds->busiest_load_per_task =
-			min(sds->busiest_load_per_task, sds->avg_load);
+	busiest = &sds->busiest_stat;
+	/* busiest must have some tasks */
+	busiest->load_per_task = busiest->sum_weighted_load /
+						busiest->sum_nr_running;
+	if (busiest->group_imb) {
+		busiest->load_per_task =
+			min(busiest->load_per_task, sds->sd_avg_load);
 	}
 
 	/*
@@ -4781,17 +4775,17 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * max load less than avg load(as we skip the groups at or below
 	 * its cpu_power, while calculating max_load..)
 	 */
-	if (sds->max_load < sds->avg_load) {
+	if (busiest->avg_load < this->avg_load) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}
 
-	if (!sds->group_imb) {
+	if (!busiest->group_imb) {
 		/*
 		 * Don't want to pull so many tasks that a group would go idle.
 		 */
-		load_above_capacity = (sds->busiest_nr_running -
-						sds->busiest_group_capacity);
+		load_above_capacity = (busiest->sum_nr_running -
+						busiest->group_capacity);
 
 		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
 
@@ -4808,12 +4802,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * Be careful of negative numbers as they'll appear as very large values
 	 * with unsigned longs.
 	 */
-	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
+	max_pull = min(busiest->avg_load - sds->sd_avg_load,
+						load_above_capacity);
 
 	/* How much load to actually move to equalise the imbalance */
 	env->imbalance = min(max_pull * sds->busiest->sgp->power,
-		(sds->avg_load - sds->this_load) * sds->this->sgp->power)
-			/ SCHED_POWER_SCALE;
+		(sds->sd_avg_load - this->avg_load) *
+			sds->this->sgp->power) / SCHED_POWER_SCALE;
 
 	/*
 	 * if *imbalance is less than the average load per runnable task
@@ -4821,7 +4816,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * a think about bumping its value to force at least one task to be
 	 * moved
 	 */
-	if (env->imbalance < sds->busiest_load_per_task)
+	if (env->imbalance < busiest->load_per_task)
 		return fix_small_imbalance(env, sds);
 
 }
@@ -4849,6 +4844,7 @@ static struct sched_group *
 find_busiest_group(struct lb_env *env)
 {
 	struct sd_lb_stats sds;
+	struct sg_lb_stats *this, *busiest;
 
 	memset(&sds, 0, sizeof(sds));
 
@@ -4857,42 +4853,44 @@ find_busiest_group(struct lb_env *env)
 	 * this level.
 	 */
 	update_sd_lb_stats(env, &sds);
+	this = &sds.this_stat;
+	busiest = &sds.busiest_stat;
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
 
 	/* There is no busy sibling group to pull tasks from */
-	if (!sds.busiest || sds.busiest_nr_running == 0)
+	if (!sds.busiest || busiest->sum_nr_running == 0)
 		goto out_balanced;
 
-	sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
+	sds.sd_avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
 
 	/*
 	 * If the busiest group is imbalanced the below checks don't
 	 * work because they assumes all things are equal, which typically
 	 * isn't true due to cpus_allowed constraints and the like.
 	 */
-	if (sds.group_imb)
+	if (busiest->group_imb)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
-			!sds.busiest_has_capacity)
+	if (env->idle == CPU_NEWLY_IDLE &&
+		this->group_has_capacity && !busiest->group_has_capacity)
 		goto force_balance;
 
 	/*
 	 * If the local group is more busy than the selected busiest group
 	 * don't try and pull any tasks.
 	 */
-	if (sds.this_load >= sds.max_load)
+	if (this->avg_load >= busiest->avg_load)
 		goto out_balanced;
 
 	/*
 	 * Don't pull any tasks if this group is already above the domain
 	 * average load.
 	 */
-	if (sds.this_load >= sds.avg_load)
+	if (this->avg_load >= sds.sd_avg_load)
 		goto out_balanced;
 
 	if (env->idle == CPU_IDLE) {
@@ -4902,15 +4900,16 @@ find_busiest_group(struct lb_env *env)
 		 * there is no imbalance between this and busiest group
 		 * wrt to idle cpu's, it is balanced.
 		 */
-		if ((sds.this_idle_cpus <= sds.busiest_idle_cpus + 1) &&
-		    sds.busiest_nr_running <= sds.busiest_group_weight)
+		if ((this->idle_cpus <= busiest->idle_cpus + 1) &&
+			busiest->sum_nr_running <= busiest->group_weight)
 			goto out_balanced;
 	} else {
 		/*
 		 * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use
 		 * imbalance_pct to be conservative.
 		 */
-		if (100 * sds.max_load <= env->sd->imbalance_pct * sds.this_load)
+		if (100 * busiest->avg_load <=
+				env->sd->imbalance_pct * this->avg_load)
 			goto out_balanced;
 	}
 
-- 
1.7.9.5



* Re: [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
@ 2013-08-15 11:09   ` Peter Zijlstra
  2013-08-15 11:14     ` Peter Zijlstra
  2013-08-16 17:07     ` JoonSoo Kim
  2013-08-15 18:05   ` Peter Zijlstra
  2013-09-02  7:40   ` [tip:sched/core] sched: Clean-up " tip-bot for Joonsoo Kim
  2 siblings, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2013-08-15 11:09 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim

On Tue, Aug 06, 2013 at 05:36:43PM +0900, Joonsoo Kim wrote:
> There is no reason to maintain separate variables for this_group
> and busiest_group in sd_lb_stat, except to save some space.
> But this structure is always allocated on the stack, so this saving
> isn't really beneficial.
> 
> This patch unifies these variables, so IMO readability may be improved.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

So I like the idea. I had to reformat some of the code, and I think we
can do fewer memsets. See how the below reduces the sds memset by the
two sgs structures. If we never set busiest we'll never look at
sds->busiest_stat, so we don't need to clear that. And if we make the
sgs memset in update_sd_lb_stats() unconditional, we'll cover
sds->this_stat while reducing complexity.

Still need to stare at your patches in a little more detail... it's far
too easy to mess this up :/

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4567,10 +4567,12 @@ static inline void update_sg_lb_stats(st
 	    (max_nr_running - min_nr_running) > 1)
 		sgs->group_imb = 1;
 
-	sgs->group_capacity = DIV_ROUND_CLOSEST(group->sgp->power,
-						SCHED_POWER_SCALE);
+	sgs->group_capacity = 
+		DIV_ROUND_CLOSEST(group->sgp->power, SCHED_POWER_SCALE);
+
 	if (!sgs->group_capacity)
 		sgs->group_capacity = fix_small_capacity(env->sd, group);
+
 	sgs->group_weight = group->group_weight;
 
 	if (sgs->group_capacity > sgs->sum_nr_running)
@@ -4641,16 +4643,14 @@ static inline void update_sd_lb_stats(st
 	load_idx = get_sd_load_idx(env->sd, env->idle);
 
 	do {
+		struct sg_lb_stats *sgs = &tmp_sgs;
 		int local_group;
-		struct sg_lb_stats *sgs;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
 		if (local_group)
 			sgs = &sds->this_stat;
-		else {
-			sgs = &tmp_sgs;
-			memset(sgs, 0, sizeof(*sgs));
-		}
+
+		memset(sgs, 0, sizeof(*sgs));
 		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
 		/*
@@ -4746,13 +4746,14 @@ void fix_small_imbalance(struct lb_env *
 	if (!this->sum_nr_running)
 		this->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
 	else if (busiest->load_per_task > this->load_per_task)
-			imbn = 1;
+		imbn = 1;
 
-	scaled_busy_load_per_task = (busiest->load_per_task *
-			SCHED_POWER_SCALE) / sds->busiest->sgp->power;
+	scaled_busy_load_per_task = 
+		(busiest->load_per_task * SCHED_POWER_SCALE) / 
+		sds->busiest->sgp->power;
 
 	if (busiest->avg_load - this->avg_load + scaled_busy_load_per_task >=
-		(scaled_busy_load_per_task * imbn)) {
+	    (scaled_busy_load_per_task * imbn)) {
 		env->imbalance = busiest->load_per_task;
 		return;
 	}
@@ -4772,19 +4773,21 @@ void fix_small_imbalance(struct lb_env *
 	/* Amount of load we'd subtract */
 	tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
 		sds->busiest->sgp->power;
-	if (busiest->avg_load > tmp)
+	if (busiest->avg_load > tmp) {
 		pwr_move += sds->busiest->sgp->power *
-			min(busiest->load_per_task,
+			    min(busiest->load_per_task,
 				busiest->avg_load - tmp);
+	}
 
 	/* Amount of load we'd add */
 	if (busiest->avg_load * sds->busiest->sgp->power <
-		busiest->load_per_task * SCHED_POWER_SCALE)
+	    busiest->load_per_task * SCHED_POWER_SCALE) {
 		tmp = (busiest->avg_load * sds->busiest->sgp->power) /
 			sds->this->sgp->power;
-	else
+	} else {
 		tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
 			sds->this->sgp->power;
+	}
 	pwr_move += sds->this->sgp->power *
 			min(this->load_per_task, this->avg_load + tmp);
 	pwr_move /= SCHED_POWER_SCALE;
@@ -4807,14 +4810,15 @@ static inline void calculate_imbalance(s
 
 	this = &sds->this_stat;
 	if (this->sum_nr_running) {
-		this->load_per_task = this->sum_weighted_load /
-						this->sum_nr_running;
+		this->load_per_task = 
+			this->sum_weighted_load / this->sum_nr_running;
 	}
 
 	busiest = &sds->busiest_stat;
 	/* busiest must have some tasks */
-	busiest->load_per_task = busiest->sum_weighted_load /
-						busiest->sum_nr_running;
+	busiest->load_per_task = 
+		busiest->sum_weighted_load / busiest->sum_nr_running;
+
 	if (busiest->group_imb) {
 		busiest->load_per_task =
 			min(busiest->load_per_task, sds->sd_avg_load);
@@ -4834,11 +4838,10 @@ static inline void calculate_imbalance(s
 		/*
 		 * Don't want to pull so many tasks that a group would go idle.
 		 */
-		load_above_capacity = (busiest->sum_nr_running -
-						busiest->group_capacity);
+		load_above_capacity = 
+			(busiest->sum_nr_running - busiest->group_capacity);
 
 		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
-
 		load_above_capacity /= sds->busiest->sgp->power;
 	}
 
@@ -4853,12 +4856,13 @@ static inline void calculate_imbalance(s
 	 * with unsigned longs.
 	 */
 	max_pull = min(busiest->avg_load - sds->sd_avg_load,
-						load_above_capacity);
+		       load_above_capacity);
 
 	/* How much load to actually move to equalise the imbalance */
-	env->imbalance = min(max_pull * sds->busiest->sgp->power,
-		(sds->sd_avg_load - this->avg_load) *
-			sds->this->sgp->power) / SCHED_POWER_SCALE;
+	env->imbalance = min(
+		max_pull * sds->busiest->sgp->power,
+		(sds->sd_avg_load - this->avg_load) * sds->this->sgp->power
+	) / SCHED_POWER_SCALE;
 
 	/*
 	 * if *imbalance is less than the average load per runnable task
@@ -4868,7 +4872,6 @@ static inline void calculate_imbalance(s
 	 */
 	if (env->imbalance < busiest->load_per_task)
 		return fix_small_imbalance(env, sds);
-
 }
 
 /******* find_busiest_group() helpers end here *********************/
@@ -4890,13 +4893,12 @@ static inline void calculate_imbalance(s
  *		   return the least loaded group whose CPUs can be
  *		   put to idle by rebalancing its tasks onto our group.
  */
-static struct sched_group *
-find_busiest_group(struct lb_env *env)
+static struct sched_group *find_busiest_group(struct lb_env *env)
 {
-	struct sd_lb_stats sds;
 	struct sg_lb_stats *this, *busiest;
+	struct sd_lb_stats sds;
 
-	memset(&sds, 0, sizeof(sds));
+	memset(&sds, 0, sizeof(sds) - 2*sizeof(struct sg_lb_stats));
 
 	/*
 	 * Compute the various statistics relavent for load balancing at
@@ -4925,8 +4927,8 @@ find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE &&
-		this->group_has_capacity && !busiest->group_has_capacity)
+	if (env->idle == CPU_NEWLY_IDLE && this->group_has_capacity && 
+	    !busiest->group_has_capacity)
 		goto force_balance;
 
 	/*
@@ -4951,7 +4953,7 @@ find_busiest_group(struct lb_env *env)
 		 * wrt to idle cpu's, it is balanced.
 		 */
 		if ((this->idle_cpus <= busiest->idle_cpus + 1) &&
-			busiest->sum_nr_running <= busiest->group_weight)
+		    busiest->sum_nr_running <= busiest->group_weight)
 			goto out_balanced;
 	} else {
 		/*


* Re: [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-15 11:09   ` Peter Zijlstra
@ 2013-08-15 11:14     ` Peter Zijlstra
  2013-08-16 17:07     ` JoonSoo Kim
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2013-08-15 11:14 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim


Another little diff.


--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4647,8 +4647,10 @@ static inline void update_sd_lb_stats(st
 		int local_group;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
-		if (local_group)
+		if (local_group) {
+			sds->this = sg;
 			sgs = &sds->this_stat;
+		}
 
 		memset(sgs, 0, sizeof(*sgs));
 		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
@@ -4671,9 +4673,7 @@ static inline void update_sd_lb_stats(st
 		sds->total_load += sgs->group_load;
 		sds->total_pwr += sg->sgp->power;
 
-		if (local_group)
-			sds->this = sg;
-		else if (update_sd_pick_busiest(env, sds, sg, sgs)) {
+		if (!local_group && update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
 			sds->busiest_stat = *sgs;
 		}


* Re: [PATCH v3 2/3] sched: factor out code to should_we_balance()
  2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
@ 2013-08-15 11:19   ` Peter Zijlstra
  2013-08-16 16:52     ` JoonSoo Kim
  2013-08-15 12:34   ` Peter Zijlstra
  2013-09-02  7:40   ` [tip:sched/core] sched: Factor " tip-bot for Joonsoo Kim
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2013-08-15 11:19 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim

On Tue, Aug 06, 2013 at 05:36:42PM +0900, Joonsoo Kim wrote:

Another one of these patches I should stare at in more detail...

>  static int load_balance(int this_cpu, struct rq *this_rq,
>  			struct sched_domain *sd, enum cpu_idle_type idle,
> -			int *balance)
> +			int *should_balance)
>  {
>  	int ld_moved, cur_ld_moved, active_balance = 0;
>  	struct sched_group *group;
> @@ -5073,12 +5070,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  
>  	schedstat_inc(sd, lb_count[idle]);
>  
> -redo:
> -	group = find_busiest_group(&env, balance);
> -
> -	if (*balance == 0)
> +	if (!should_we_balance(&env)) {
> +		*should_balance = 0;
>  		goto out_balanced;
> +	} else
> +		*should_balance = 1;

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5120,11 +5120,8 @@ static int load_balance(int this_cpu, st
 
 	schedstat_inc(sd, lb_count[idle]);
 
-	if (!should_we_balance(&env)) {
-		*should_balance = 0;
+	if (!(*should_balance = should_we_balance(&env)))
 		goto out_balanced;
-	} else
-		*should_balance = 1;
 
 redo:
 	group = find_busiest_group(&env);



* Re: [PATCH v3 2/3] sched: factor out code to should_we_balance()
  2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
  2013-08-15 11:19   ` Peter Zijlstra
@ 2013-08-15 12:34   ` Peter Zijlstra
  2013-08-16 16:57     ` JoonSoo Kim
  2013-09-02  7:40   ` [tip:sched/core] sched: Factor " tip-bot for Joonsoo Kim
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2013-08-15 12:34 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim

On Tue, Aug 06, 2013 at 05:36:42PM +0900, Joonsoo Kim wrote:
> -	if (local_group)
> -		balance_cpu = group_balance_cpu(group);
> -

>  		/* Bias balancing toward cpus of our domain */
> -		if (local_group) {
> -			if (idle_cpu(i) && !first_idle_cpu &&
> -					cpumask_test_cpu(i, sched_group_mask(group))) {
> -				first_idle_cpu = 1;
> -				balance_cpu = i;
> -			}
> -
>  			load = target_load(i, load_idx);
> -		} else {

>  
> -	/*
> -	 * First idle cpu or the first cpu(busiest) in this sched group
> -	 * is eligible for doing load balancing at this and above
> -	 * domains. In the newly idle case, we will allow all the cpu's
> -	 * to do the newly idle load balance.
> -	 */
> -	if (local_group) {
> -		if (env->idle != CPU_NEWLY_IDLE) {
> -			if (balance_cpu != env->dst_cpu) {
> -				*balance = 0;
> -				return;
> -			}
> -			update_group_power(env->sd, env->dst_cpu);
> -		} else if (time_after_eq(jiffies, group->sgp->next_update))
> -			update_group_power(env->sd, env->dst_cpu);
> -	}

>  		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));

> +static int should_we_balance(struct lb_env *env)
> +{
> +	struct sched_group *sg = env->sd->groups;
> +	struct cpumask *sg_cpus, *sg_mask;
> +	int cpu, balance_cpu = -1;
> +
> +	/*
> +	 * In the newly idle case, we will allow all the cpu's
> +	 * to do the newly idle load balance.
> +	 */
> +	if (env->idle == CPU_NEWLY_IDLE)
> +		return 1;
> +
> +	sg_cpus = sched_group_cpus(sg);
> +	sg_mask = sched_group_mask(sg);
> +	/* Try to find first idle cpu */
> +	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
> +		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))

Did you want to write !idle_cpu() ?

> +			continue;
> +
> +		balance_cpu = cpu;
> +		break;
> +	}
> +
> +	if (balance_cpu == -1)
> +		balance_cpu = group_balance_cpu(sg);
> +
> +	/*
> +	 * First idle cpu or the first cpu(busiest) in this sched group
> +	 * is eligible for doing load balancing at this and above domains.
> +	 */
> +	return balance_cpu == env->dst_cpu;
> +}




* Re: [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2013-08-15 11:09   ` Peter Zijlstra
@ 2013-08-15 18:05   ` Peter Zijlstra
  2013-08-16 17:09     ` JoonSoo Kim
  2013-09-02  7:40   ` [tip:sched/core] sched: Clean-up " tip-bot for Joonsoo Kim
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2013-08-15 18:05 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim

On Tue, Aug 06, 2013 at 05:36:43PM +0900, Joonsoo Kim wrote:
> -	if (sds->group_imb) {
> -		sds->busiest_load_per_task =
> -			min(sds->busiest_load_per_task, sds->avg_load);

> +	if (busiest->group_imb) {
> +		busiest->load_per_task =
> +			min(busiest->load_per_task, sds->sd_avg_load);
>  	}

> -	if (sds->max_load < sds->avg_load) {
> +	if (busiest->avg_load < this->avg_load) {

Tsk, inconsistency there.

> -	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
> +	max_pull = min(busiest->avg_load - sds->sd_avg_load,
> +			load_above_capacity);

you just made it possible for the above subtraction to become negative.
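
A tiny demo of the hazard (made-up numbers; both operands are unsigned
long, so the subtraction wraps instead of going negative):

	#include <stdio.h>
	#define min(a, b) ((a) < (b) ? (a) : (b))

	int main(void)
	{
		/* busiest group below the domain average */
		unsigned long busiest_avg = 900, sd_avg = 1000;
		unsigned long load_above_capacity = 50;

		/* 900 - 1000 wraps to ULONG_MAX - 99, so min() happily
		 * returns load_above_capacity and a bogus imbalance is
		 * computed instead of bailing out */
		printf("max_pull=%lu\n",
		       min(busiest_avg - sd_avg, load_above_capacity));
		return 0;
	}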

Also not entirely sure why you renamed sds->avg_load; renaming it back.


* Re: [PATCH v3 2/3] sched: factor out code to should_we_balance()
  2013-08-15 11:19   ` Peter Zijlstra
@ 2013-08-16 16:52     ` JoonSoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: JoonSoo Kim @ 2013-08-16 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joonsoo Kim, Ingo Molnar, LKML, Mike Galbraith, Paul Turner,
	Alex Shi, Preeti U Murthy, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

Hello, Peter.

2013/8/15 Peter Zijlstra <peterz@infradead.org>:
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5120,11 +5120,8 @@ static int load_balance(int this_cpu, st
>
>         schedstat_inc(sd, lb_count[idle]);
>
> -       if (!should_we_balance(&env)) {
> -               *should_balance = 0;
> +       if (!(*should_balance = should_we_balance(&env)))
>                 goto out_balanced;
> -       } else
> -               *should_balance = 1;
>
>  redo:
>         group = find_busiest_group(&env);
>

Looks better!

Thanks!


* Re: [PATCH v3 2/3] sched: factor out code to should_we_balance()
  2013-08-15 12:34   ` Peter Zijlstra
@ 2013-08-16 16:57     ` JoonSoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: JoonSoo Kim @ 2013-08-16 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joonsoo Kim, Ingo Molnar, LKML, Mike Galbraith, Paul Turner,
	Alex Shi, Preeti U Murthy, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

>> +static int should_we_balance(struct lb_env *env)
>> +{
>> +     struct sched_group *sg = env->sd->groups;
>> +     struct cpumask *sg_cpus, *sg_mask;
>> +     int cpu, balance_cpu = -1;
>> +
>> +     /*
>> +      * In the newly idle case, we will allow all the cpu's
>> +      * to do the newly idle load balance.
>> +      */
>> +     if (env->idle == CPU_NEWLY_IDLE)
>> +             return 1;
>> +
>> +     sg_cpus = sched_group_cpus(sg);
>> +     sg_mask = sched_group_mask(sg);
>> +     /* Try to find first idle cpu */
>> +     for_each_cpu_and(cpu, sg_cpus, env->cpus) {
>> +             if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
>
> Did you want to write !idle_cpu() ?
>

You are right. I made this mistake while making style changes for v2.

Thanks for spotting this.
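
For reference, the corrected loop (as it later appears in the tip-bot
commit below) skips cpus that are *not* idle:

	/* Try to find first idle cpu */
	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
		if (!cpumask_test_cpu(cpu, sg_mask) || !idle_cpu(cpu))
			continue;

		balance_cpu = cpu;
		break;
	}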


* Re: [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-15 11:09   ` Peter Zijlstra
  2013-08-15 11:14     ` Peter Zijlstra
@ 2013-08-16 17:07     ` JoonSoo Kim
  1 sibling, 0 replies; 16+ messages in thread
From: JoonSoo Kim @ 2013-08-16 17:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joonsoo Kim, Ingo Molnar, LKML, Mike Galbraith, Paul Turner,
	Alex Shi, Preeti U Murthy, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

2013/8/15 Peter Zijlstra <peterz@infradead.org>:
> On Tue, Aug 06, 2013 at 05:36:43PM +0900, Joonsoo Kim wrote:
>> There is no reason to maintain separate variables for this_group
>> and busiest_group in sd_lb_stat, except to save some space.
>> But this structure is always allocated on the stack, so this saving
>> isn't really beneficial.
>>
>> This patch unifies these variables, so IMO readability may be improved.
>>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> So I like the idea. I had to reformat some of the code, and I think we
> can do fewer memsets. See how the below reduces the sds memset by the
> two sgs structures. If we never set busiest we'll never look at
> sds->busiest_stat, so we don't need to clear that. And if we make the
> sgs memset in update_sd_lb_stats() unconditional, we'll cover
> sds->this_stat while reducing complexity.

At first glance, the changes below look good.
When I return to the office next Monday, I will look at them in detail.
Just one comment below.

> @@ -4890,13 +4893,12 @@ static inline void calculate_imbalance(s
>   *                return the least loaded group whose CPUs can be
>   *                put to idle by rebalancing its tasks onto our group.
>   */
> -static struct sched_group *
> -find_busiest_group(struct lb_env *env)
> +static struct sched_group *find_busiest_group(struct lb_env *env)
>  {
> -       struct sd_lb_stats sds;
>         struct sg_lb_stats *this, *busiest;
> +       struct sd_lb_stats sds;
>
> -       memset(&sds, 0, sizeof(sds));
> +       memset(&sds, 0, sizeof(sds) - 2*sizeof(struct sg_lb_stats));

How about using the offsetof() macro, instead of subtraction, to
calculate the size?
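
Something like this, assuming this_stat and busiest_stat remain the two
trailing members of struct sd_lb_stats (offsetof() comes from <stddef.h>):

	/* zero everything up to the two embedded sg_lb_stats;
	 * they are initialized before use */
	memset(&sds, 0, offsetof(struct sd_lb_stats, this_stat));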

Thanks.


* Re: [PATCH v3 3/3] sched: clean-up struct sd_lb_stat
  2013-08-15 18:05   ` Peter Zijlstra
@ 2013-08-16 17:09     ` JoonSoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: JoonSoo Kim @ 2013-08-16 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joonsoo Kim, Ingo Molnar, LKML, Mike Galbraith, Paul Turner,
	Alex Shi, Preeti U Murthy, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

>> -     if (sds->max_load < sds->avg_load) {
>> +     if (busiest->avg_load < this->avg_load) {
>
> Tsk, inconsistency there.

Okay, this is my mistake.

>
>> -     max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
>> +     max_pull = min(busiest->avg_load - sds->sd_avg_load,
>> +                     load_above_capacity);
>
> you just made it possible for the above subtraction to become negative.
>
> Also not entirely sure why you renamed sds->avg_load; renaming it back.

Okay, sds->avg_load seems to be better :)

Thanks.


* [tip:sched/core] sched: Remove one division operation in find_busiest_queue()
  2013-08-06  8:36 ` [PATCH v3 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
@ 2013-09-02  7:40   ` tip-bot for Joonsoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Joonsoo Kim @ 2013-09-02  7:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, iamjoonsoo.kim, tglx

Commit-ID:  95a79b805b935f4a7b685aa8a117d916c638323e
Gitweb:     http://git.kernel.org/tip/95a79b805b935f4a7b685aa8a117d916c638323e
Author:     Joonsoo Kim <iamjoonsoo.kim@lge.com>
AuthorDate: Tue, 6 Aug 2013 17:36:41 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 2 Sep 2013 08:26:59 +0200

sched: Remove one division operation in find_busiest_queue()

Remove one division operation in find_busiest_queue() by using
crosswise multiplication:

	wl_i / power_i > wl_j / power_j :=
	wl_i * power_j > wl_j * power_i

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[ Expanded the changelog. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375778203-31343-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f918635..8aa217f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4968,7 +4968,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 				     struct sched_group *group)
 {
 	struct rq *busiest = NULL, *rq;
-	unsigned long max_load = 0;
+	unsigned long busiest_load = 0, busiest_power = 1;
 	int i;
 
 	for_each_cpu(i, sched_group_cpus(group)) {
@@ -4998,11 +4998,15 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * the weighted_cpuload() scaled with the cpu power, so that
 		 * the load can be moved away from the cpu that is potentially
 		 * running at a lower capacity.
+		 *
+		 * Thus we're looking for max(wl_i / power_i), crosswise
+		 * multiplication to rid ourselves of the division works out
+		 * to: wl_i * power_j > wl_j * power_i;  where j is our
+		 * previous maximum.
 		 */
-		wl = (wl * SCHED_POWER_SCALE) / power;
-
-		if (wl > max_load) {
-			max_load = wl;
+		if (wl * busiest_power > busiest_load * power) {
+			busiest_load = wl;
+			busiest_power = power;
 			busiest = rq;
 		}
 	}


* [tip:sched/core] sched: Factor out code to should_we_balance()
  2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
  2013-08-15 11:19   ` Peter Zijlstra
  2013-08-15 12:34   ` Peter Zijlstra
@ 2013-09-02  7:40   ` tip-bot for Joonsoo Kim
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Joonsoo Kim @ 2013-09-02  7:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, pjt, iamjoonsoo.kim, tglx

Commit-ID:  23f0d2093c789e612185180c468fa09063834e87
Gitweb:     http://git.kernel.org/tip/23f0d2093c789e612185180c468fa09063834e87
Author:     Joonsoo Kim <iamjoonsoo.kim@lge.com>
AuthorDate: Tue, 6 Aug 2013 17:36:42 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 2 Sep 2013 08:27:34 +0200

sched: Factor out code to should_we_balance()

Currently, checking whether this cpu is appropriate to balance is
embedded in update_sg_lb_stats(), although this check has no direct
relationship to that function. There is not enough reason to place this
check in update_sg_lb_stats(), except that it saves one iteration over
sched_group_cpus.

In this patch, I factor this check out into a new function,
should_we_balance(). Before doing the actual work for load balancing,
we check via should_we_balance() whether this cpu is appropriate to
balance. If this cpu is not a candidate for balancing, we quit the work
immediately.

With this change, we can save the cost of two memsets and can expect
better compiler optimization.

Below is the result of this patch.

 * Vanilla *
   text	   data	    bss	    dec	    hex	filename
  34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o

 * Patched *
   text	   data	    bss	    dec	    hex	filename
  34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o

In addition, rename @balance to @continue_balancing in order to represent
its purpose more clearly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[ s/should_balance/continue_balancing/g ]
Reviewed-by: Paul Turner <pjt@google.com>
[ Made style changes and a fix in should_we_balance(). ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375778203-31343-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 107 ++++++++++++++++++++++++++--------------------------
 1 file changed, 53 insertions(+), 54 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8aa217f..9a6daf8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4463,22 +4463,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
-			int local_group, int *balance, struct sg_lb_stats *sgs)
+			int local_group, struct sg_lb_stats *sgs)
 {
 	unsigned long nr_running, max_nr_running, min_nr_running;
 	unsigned long load, max_cpu_load, min_cpu_load;
-	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
 	int i;
 
-	if (local_group)
-		balance_cpu = group_balance_cpu(group);
-
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
 	min_cpu_load = ~0UL;
@@ -4492,12 +4487,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 		/* Bias balancing toward cpus of our domain */
 		if (local_group) {
-			if (idle_cpu(i) && !first_idle_cpu &&
-					cpumask_test_cpu(i, sched_group_mask(group))) {
-				first_idle_cpu = 1;
-				balance_cpu = i;
-			}
-
 			load = target_load(i, load_idx);
 		} else {
 			load = source_load(i, load_idx);
@@ -4519,22 +4508,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
-	/*
-	 * First idle cpu or the first cpu(busiest) in this sched group
-	 * is eligible for doing load balancing at this and above
-	 * domains. In the newly idle case, we will allow all the cpu's
-	 * to do the newly idle load balance.
-	 */
-	if (local_group) {
-		if (env->idle != CPU_NEWLY_IDLE) {
-			if (balance_cpu != env->dst_cpu) {
-				*balance = 0;
-				return;
-			}
-			update_group_power(env->sd, env->dst_cpu);
-		} else if (time_after_eq(jiffies, group->sgp->next_update))
-			update_group_power(env->sd, env->dst_cpu);
-	}
+	if (local_group && (env->idle != CPU_NEWLY_IDLE ||
+			time_after_eq(jiffies, group->sgp->next_update)))
+		update_group_power(env->sd, env->dst_cpu);
 
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
@@ -4613,7 +4589,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
  * @sds: variable to hold the statistics for this sched_domain.
  */
 static inline void update_sd_lb_stats(struct lb_env *env,
-					int *balance, struct sd_lb_stats *sds)
+					struct sd_lb_stats *sds)
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
@@ -4630,10 +4606,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
 		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
-
-		if (local_group && !(*balance))
-			return;
+		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
 
 		sds->total_load += sgs.group_load;
 		sds->total_pwr += sg->sgp->power;
@@ -4866,8 +4839,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  * to restore balance.
  *
  * @env: The load balancing environment.
- * @balance: Pointer to a variable indicating if this_cpu
- *	is the appropriate cpu to perform load balancing at this_level.
  *
  * Returns:	- the busiest group if imbalance exists.
  *		- If no imbalance and user has opted for power-savings balance,
@@ -4875,7 +4846,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  *		   put to idle by rebalancing its tasks onto our group.
  */
 static struct sched_group *
-find_busiest_group(struct lb_env *env, int *balance)
+find_busiest_group(struct lb_env *env)
 {
 	struct sd_lb_stats sds;
 
@@ -4885,14 +4856,7 @@ find_busiest_group(struct lb_env *env, int *balance)
 	 * Compute the various statistics relevant for load balancing at
 	 * this level.
 	 */
-	update_sd_lb_stats(env, balance, &sds);
-
-	/*
-	 * this_cpu is not the appropriate cpu to perform load balancing at
-	 * this level.
-	 */
-	if (!(*balance))
-		goto ret;
+	update_sd_lb_stats(env, &sds);
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
@@ -4956,7 +4920,6 @@ force_balance:
 	return sds.busiest;
 
 out_balanced:
-ret:
 	env->imbalance = 0;
 	return NULL;
 }
@@ -5043,13 +5006,47 @@ static int need_active_balance(struct lb_env *env)
 
 static int active_load_balance_cpu_stop(void *data);
 
+static int should_we_balance(struct lb_env *env)
+{
+	struct sched_group *sg = env->sd->groups;
+	struct cpumask *sg_cpus, *sg_mask;
+	int cpu, balance_cpu = -1;
+
+	/*
+	 * In the newly idle case, we will allow all the cpus
+	 * to do the newly idle load balance.
+	 */
+	if (env->idle == CPU_NEWLY_IDLE)
+		return 1;
+
+	sg_cpus = sched_group_cpus(sg);
+	sg_mask = sched_group_mask(sg);
+	/* Try to find first idle cpu */
+	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
+		if (!cpumask_test_cpu(cpu, sg_mask) || !idle_cpu(cpu))
+			continue;
+
+		balance_cpu = cpu;
+		break;
+	}
+
+	if (balance_cpu == -1)
+		balance_cpu = group_balance_cpu(sg);
+
+	/*
+	 * First idle cpu or the first cpu (busiest) in this sched group
+	 * is eligible for doing load balancing at this and above domains.
+	 */
+	return balance_cpu == env->dst_cpu;
+}
+
 /*
  * Check this_cpu to ensure it is balanced within domain. Attempt to move
  * tasks if there is an imbalance.
  */
 static int load_balance(int this_cpu, struct rq *this_rq,
 			struct sched_domain *sd, enum cpu_idle_type idle,
-			int *balance)
+			int *continue_balancing)
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
 	struct sched_group *group;
@@ -5079,11 +5076,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	schedstat_inc(sd, lb_count[idle]);
 
 redo:
-	group = find_busiest_group(&env, balance);
-
-	if (*balance == 0)
+	if (!should_we_balance(&env)) {
+		*continue_balancing = 0;
 		goto out_balanced;
+	}
 
+	group = find_busiest_group(&env);
 	if (!group) {
 		schedstat_inc(sd, lb_nobusyg[idle]);
 		goto out_balanced;
@@ -5296,7 +5294,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	rcu_read_lock();
 	for_each_domain(this_cpu, sd) {
 		unsigned long interval;
-		int balance = 1;
+		int continue_balancing = 1;
 
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			continue;
@@ -5304,7 +5302,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance(this_cpu, this_rq,
-						   sd, CPU_NEWLY_IDLE, &balance);
+						   sd, CPU_NEWLY_IDLE,
+						   &continue_balancing);
 		}
 
 		interval = msecs_to_jiffies(sd->balance_interval);
@@ -5542,7 +5541,7 @@ void update_max_interval(void)
  */
 static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 {
-	int balance = 1;
+	int continue_balancing = 1;
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long interval;
 	struct sched_domain *sd;
@@ -5574,7 +5573,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 		}
 
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
-			if (load_balance(cpu, rq, sd, idle, &balance)) {
+			if (load_balance(cpu, rq, sd, idle, &continue_balancing)) {
 				/*
 				 * The LBF_SOME_PINNED logic could have changed
 				 * env->dst_cpu, so we can't know our idle
@@ -5597,7 +5596,7 @@ out:
 		 * CPU in our sched group which is doing load balancing more
 		 * actively.
 		 */
-		if (!balance)
+		if (!continue_balancing)
 			break;
 	}
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 16+ messages in thread
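
For readers skimming the diff: a minimal standalone model of the
decision the new should_we_balance() helper encodes, with a plain
bool array standing in for the kernel's cpumask/sched_group
machinery. Every name and type below is hypothetical, a sketch for
illustration only, not the kernel implementation:

/*
 * Model: in the newly idle case every cpu may balance; otherwise
 * only the first idle cpu in the group, or failing that the
 * group's first cpu, proceeds.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_GROUP_CPUS 4

static bool cpu_is_idle[NR_GROUP_CPUS] = { false, true, false, true };

static bool model_should_we_balance(int dst_cpu, bool newly_idle)
{
	int cpu, balance_cpu = -1;

	/* Newly idle cpus may always attempt the balance. */
	if (newly_idle)
		return true;

	/* Prefer the first idle cpu in the group... */
	for (cpu = 0; cpu < NR_GROUP_CPUS; cpu++) {
		if (cpu_is_idle[cpu]) {
			balance_cpu = cpu;
			break;
		}
	}

	/* ...falling back to the group's designated first cpu. */
	if (balance_cpu == -1)
		balance_cpu = 0;

	/* Only the chosen cpu goes on to do the real balancing work. */
	return balance_cpu == dst_cpu;
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_GROUP_CPUS; cpu++)
		printf("cpu%d: %s\n", cpu,
		       model_should_we_balance(cpu, false) ?
				"balances" : "skips");
	return 0;
}

With the array above only cpu1 proceeds; the other cpus return from
load_balance() before any group statistics are computed, which is
exactly the work the early should_we_balance() check now skips.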

* [tip:sched/core] sched: Clean-up struct sd_lb_stat
  2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2013-08-15 11:09   ` Peter Zijlstra
  2013-08-15 18:05   ` Peter Zijlstra
@ 2013-09-02  7:40   ` tip-bot for Joonsoo Kim
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Joonsoo Kim @ 2013-09-02  7:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, pjt, iamjoonsoo.kim, tglx

Commit-ID:  56cf515b4b1567c4e8fa9926175b40c66b9ec472
Gitweb:     http://git.kernel.org/tip/56cf515b4b1567c4e8fa9926175b40c66b9ec472
Author:     Joonsoo Kim <iamjoonsoo.kim@lge.com>
AuthorDate: Tue, 6 Aug 2013 17:36:43 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 2 Sep 2013 08:27:35 +0200

sched: Clean-up struct sd_lb_stat

There is no reason to maintain separate variables for this_group
and busiest_group in sd_lb_stat, other than saving some space.
But this structure is always allocated on the stack, so this saving
isn't really beneficial [peterz: reducing stack space is good; in this
case readability increases enough that I think its still beneficial]

This patch unifies these variables, which should improve readability.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[ Rename this to local -- avoids confusion between this_cpu and the C++ this pointer. ]
Reviewed-by: Paul Turner <pjt@google.com>
[ Lots of style edits, a few fixes and a rename. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375778203-31343-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 229 ++++++++++++++++++++++++++--------------------------
 1 file changed, 114 insertions(+), 115 deletions(-)
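
Before the diff, a compressed before/after sketch of the structures
involved (field lists abbreviated; the _old/_new suffixes are added
here purely for side-by-side comparison, so this is an illustration,
not the kernel's actual declarations):

struct sched_group;	/* opaque stand-in for the kernel type */

/* Before: one flat struct with parallel this_* / busiest_* scalars. */
struct sd_lb_stats_old {
	struct sched_group *busiest, *this;
	unsigned long this_load, this_load_per_task, this_nr_running;
	unsigned long max_load, busiest_load_per_task, busiest_nr_running;
	/* ... several more duplicated pairs ... */
};

/* After: the per-group stats live in one struct, embedded twice. */
struct sg_lb_stats_new {
	unsigned long avg_load, load_per_task, sum_nr_running;
	/* ... */
};

struct sd_lb_stats_new {
	struct sched_group *busiest, *local;
	struct sg_lb_stats_new busiest_stat, local_stat;
};

Callers then read, say, sds->busiest_stat.avg_load where they used to
read sds->max_load, and update_sd_lb_stats() can fill the local
group's stats in place instead of copying them over field by field.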

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9a6daf8..2da80a5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4232,36 +4232,6 @@ static unsigned long task_h_load(struct task_struct *p)
 
 /********** Helpers for find_busiest_group ************************/
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- * 		during load balancing.
- */
-struct sd_lb_stats {
-	struct sched_group *busiest; /* Busiest group in this sd */
-	struct sched_group *this;  /* Local group in this sd */
-	unsigned long total_load;  /* Total load of all groups in sd */
-	unsigned long total_pwr;   /*	Total power of all groups in sd */
-	unsigned long avg_load;	   /* Average load across all groups in sd */
-
-	/** Statistics of this group */
-	unsigned long this_load;
-	unsigned long this_load_per_task;
-	unsigned long this_nr_running;
-	unsigned long this_has_capacity;
-	unsigned int  this_idle_cpus;
-
-	/* Statistics of the busiest group */
-	unsigned int  busiest_idle_cpus;
-	unsigned long max_load;
-	unsigned long busiest_load_per_task;
-	unsigned long busiest_nr_running;
-	unsigned long busiest_group_capacity;
-	unsigned long busiest_has_capacity;
-	unsigned int  busiest_group_weight;
-
-	int group_imb; /* Is there imbalance in this sd */
-};
-
-/*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
@@ -4269,6 +4239,7 @@ struct sg_lb_stats {
 	unsigned long group_load; /* Total load over the CPUs of the group */
 	unsigned long sum_nr_running; /* Nr tasks running in the group */
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
+	unsigned long load_per_task;
 	unsigned long group_capacity;
 	unsigned long idle_cpus;
 	unsigned long group_weight;
@@ -4276,6 +4247,21 @@ struct sg_lb_stats {
 	int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
+/*
+ * sd_lb_stats - Structure to store the statistics of a sched_domain
+ *		 during load balancing.
+ */
+struct sd_lb_stats {
+	struct sched_group *busiest;	/* Busiest group in this sd */
+	struct sched_group *local;	/* Local group in this sd */
+	unsigned long total_load;	/* Total load of all groups in sd */
+	unsigned long total_pwr;	/* Total power of all groups in sd */
+	unsigned long avg_load;	/* Average load across all groups in sd */
+
+	struct sg_lb_stats local_stat;	/* Statistics of the local group */
+	struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
+};
+
 /**
  * get_sd_load_idx - Obtain the load index for a given sched domain.
  * @sd: The sched_domain whose load_idx is to be obtained.
@@ -4490,6 +4476,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = target_load(i, load_idx);
 		} else {
 			load = source_load(i, load_idx);
+
 			if (load > max_cpu_load)
 				max_cpu_load = load;
 			if (min_cpu_load > load)
@@ -4531,10 +4518,12 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	    (max_nr_running - min_nr_running) > 1)
 		sgs->group_imb = 1;
 
-	sgs->group_capacity = DIV_ROUND_CLOSEST(group->sgp->power,
-						SCHED_POWER_SCALE);
+	sgs->group_capacity =
+		DIV_ROUND_CLOSEST(group->sgp->power, SCHED_POWER_SCALE);
+
 	if (!sgs->group_capacity)
 		sgs->group_capacity = fix_small_capacity(env->sd, group);
+
 	sgs->group_weight = group->group_weight;
 
 	if (sgs->group_capacity > sgs->sum_nr_running)
@@ -4556,7 +4545,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
+	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
@@ -4593,7 +4582,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
-	struct sg_lb_stats sgs;
+	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
@@ -4602,14 +4591,17 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 	load_idx = get_sd_load_idx(env->sd, env->idle);
 
 	do {
+		struct sg_lb_stats *sgs = &tmp_sgs;
 		int local_group;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
-		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
+		if (local_group) {
+			sds->local = sg;
+			sgs = &sds->local_stat;
+		}
 
-		sds->total_load += sgs.group_load;
-		sds->total_pwr += sg->sgp->power;
+		memset(sgs, 0, sizeof(*sgs));
+		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
@@ -4621,26 +4613,17 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
-		if (prefer_sibling && !local_group && sds->this_has_capacity)
-			sgs.group_capacity = min(sgs.group_capacity, 1UL);
+		if (prefer_sibling && !local_group &&
+				sds->local && sds->local_stat.group_has_capacity)
+			sgs->group_capacity = min(sgs->group_capacity, 1UL);
 
-		if (local_group) {
-			sds->this_load = sgs.avg_load;
-			sds->this = sg;
-			sds->this_nr_running = sgs.sum_nr_running;
-			sds->this_load_per_task = sgs.sum_weighted_load;
-			sds->this_has_capacity = sgs.group_has_capacity;
-			sds->this_idle_cpus = sgs.idle_cpus;
-		} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
-			sds->max_load = sgs.avg_load;
+		/* Now, start updating sd_lb_stats */
+		sds->total_load += sgs->group_load;
+		sds->total_pwr += sg->sgp->power;
+
+		if (!local_group && update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
-			sds->busiest_nr_running = sgs.sum_nr_running;
-			sds->busiest_idle_cpus = sgs.idle_cpus;
-			sds->busiest_group_capacity = sgs.group_capacity;
-			sds->busiest_load_per_task = sgs.sum_weighted_load;
-			sds->busiest_has_capacity = sgs.group_has_capacity;
-			sds->busiest_group_weight = sgs.group_weight;
-			sds->group_imb = sgs.group_imb;
+			sds->busiest_stat = *sgs;
 		}
 
 		sg = sg->next;
@@ -4684,8 +4667,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (env->dst_cpu > busiest_cpu)
 		return 0;
 
-	env->imbalance = DIV_ROUND_CLOSEST(
-		sds->max_load * sds->busiest->sgp->power, SCHED_POWER_SCALE);
+	env->imbalance = DIV_ROUND_CLOSEST(sds->busiest_stat.avg_load *
+				sds->busiest->sgp->power, SCHED_POWER_SCALE);
 
 	return 1;
 }
@@ -4703,24 +4686,23 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	unsigned long tmp, pwr_now = 0, pwr_move = 0;
 	unsigned int imbn = 2;
 	unsigned long scaled_busy_load_per_task;
+	struct sg_lb_stats *local, *busiest;
 
-	if (sds->this_nr_running) {
-		sds->this_load_per_task /= sds->this_nr_running;
-		if (sds->busiest_load_per_task >
-				sds->this_load_per_task)
-			imbn = 1;
-	} else {
-		sds->this_load_per_task =
-			cpu_avg_load_per_task(env->dst_cpu);
-	}
+	local = &sds->local_stat;
+	busiest = &sds->busiest_stat;
 
-	scaled_busy_load_per_task = sds->busiest_load_per_task
-					 * SCHED_POWER_SCALE;
-	scaled_busy_load_per_task /= sds->busiest->sgp->power;
+	if (!local->sum_nr_running)
+		local->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
+	else if (busiest->load_per_task > local->load_per_task)
+		imbn = 1;
 
-	if (sds->max_load - sds->this_load + scaled_busy_load_per_task >=
-			(scaled_busy_load_per_task * imbn)) {
-		env->imbalance = sds->busiest_load_per_task;
+	scaled_busy_load_per_task =
+		(busiest->load_per_task * SCHED_POWER_SCALE) /
+		sds->busiest->sgp->power;
+
+	if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
+	    (scaled_busy_load_per_task * imbn)) {
+		env->imbalance = busiest->load_per_task;
 		return;
 	}
 
@@ -4731,33 +4713,36 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	 */
 
 	pwr_now += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load);
-	pwr_now += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load);
+			min(busiest->load_per_task, busiest->avg_load);
+	pwr_now += sds->local->sgp->power *
+			min(local->load_per_task, local->avg_load);
 	pwr_now /= SCHED_POWER_SCALE;
 
 	/* Amount of load we'd subtract */
-	tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
+	tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
 		sds->busiest->sgp->power;
-	if (sds->max_load > tmp)
+	if (busiest->avg_load > tmp) {
 		pwr_move += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load - tmp);
+			    min(busiest->load_per_task,
+				busiest->avg_load - tmp);
+	}
 
 	/* Amount of load we'd add */
-	if (sds->max_load * sds->busiest->sgp->power <
-		sds->busiest_load_per_task * SCHED_POWER_SCALE)
-		tmp = (sds->max_load * sds->busiest->sgp->power) /
-			sds->this->sgp->power;
-	else
-		tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
-			sds->this->sgp->power;
-	pwr_move += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load + tmp);
+	if (busiest->avg_load * sds->busiest->sgp->power <
+	    busiest->load_per_task * SCHED_POWER_SCALE) {
+		tmp = (busiest->avg_load * sds->busiest->sgp->power) /
+			sds->local->sgp->power;
+	} else {
+		tmp = (busiest->load_per_task * SCHED_POWER_SCALE) /
+			sds->local->sgp->power;
+	}
+	pwr_move += sds->local->sgp->power *
+			min(local->load_per_task, local->avg_load + tmp);
 	pwr_move /= SCHED_POWER_SCALE;
 
 	/* Move if we gain throughput */
 	if (pwr_move > pwr_now)
-		env->imbalance = sds->busiest_load_per_task;
+		env->imbalance = busiest->load_per_task;
 }
 
 /**
@@ -4769,11 +4754,22 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 {
 	unsigned long max_pull, load_above_capacity = ~0UL;
+	struct sg_lb_stats *local, *busiest;
+
+	local = &sds->local_stat;
+	if (local->sum_nr_running) {
+		local->load_per_task =
+			local->sum_weighted_load / local->sum_nr_running;
+	}
 
-	sds->busiest_load_per_task /= sds->busiest_nr_running;
-	if (sds->group_imb) {
-		sds->busiest_load_per_task =
-			min(sds->busiest_load_per_task, sds->avg_load);
+	busiest = &sds->busiest_stat;
+	/* busiest must have some tasks */
+	busiest->load_per_task =
+		busiest->sum_weighted_load / busiest->sum_nr_running;
+
+	if (busiest->group_imb) {
+		busiest->load_per_task =
+			min(busiest->load_per_task, sds->avg_load);
 	}
 
 	/*
@@ -4781,20 +4777,19 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * max load less than avg load (as we skip the groups at or below
 	 * its cpu_power, while calculating max_load..)
 	 */
-	if (sds->max_load < sds->avg_load) {
+	if (busiest->avg_load < sds->avg_load) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}
 
-	if (!sds->group_imb) {
+	if (!busiest->group_imb) {
 		/*
 		 * Don't want to pull so many tasks that a group would go idle.
 		 */
-		load_above_capacity = (sds->busiest_nr_running -
-						sds->busiest_group_capacity);
+		load_above_capacity =
+			(busiest->sum_nr_running - busiest->group_capacity);
 
 		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
-
 		load_above_capacity /= sds->busiest->sgp->power;
 	}
 
@@ -4808,12 +4803,14 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * Be careful of negative numbers as they'll appear as very large values
 	 * with unsigned longs.
 	 */
-	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
+	max_pull = min(busiest->avg_load - sds->avg_load,
+		       load_above_capacity);
 
 	/* How much load to actually move to equalise the imbalance */
-	env->imbalance = min(max_pull * sds->busiest->sgp->power,
-		(sds->avg_load - sds->this_load) * sds->this->sgp->power)
-			/ SCHED_POWER_SCALE;
+	env->imbalance = min(
+		max_pull * sds->busiest->sgp->power,
+		(sds->avg_load - local->avg_load) * sds->local->sgp->power
+	) / SCHED_POWER_SCALE;
 
 	/*
 	 * if *imbalance is less than the average load per runnable task
@@ -4821,9 +4818,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * a think about bumping its value to force at least one task to be
 	 * moved
 	 */
-	if (env->imbalance < sds->busiest_load_per_task)
+	if (env->imbalance < busiest->load_per_task)
 		return fix_small_imbalance(env, sds);
-
 }
 
 /******* find_busiest_group() helpers end here *********************/
@@ -4845,9 +4841,9 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  *		   return the least loaded group whose CPUs can be
  *		   put to idle by rebalancing its tasks onto our group.
  */
-static struct sched_group *
-find_busiest_group(struct lb_env *env)
+static struct sched_group *find_busiest_group(struct lb_env *env)
 {
+	struct sg_lb_stats *local, *busiest;
 	struct sd_lb_stats sds;
 
 	memset(&sds, 0, sizeof(sds));
@@ -4857,13 +4853,15 @@ find_busiest_group(struct lb_env *env)
 	 * this level.
 	 */
 	update_sd_lb_stats(env, &sds);
+	local = &sds.local_stat;
+	busiest = &sds.busiest_stat;
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
 
 	/* There is no busy sibling group to pull tasks from */
-	if (!sds.busiest || sds.busiest_nr_running == 0)
+	if (!sds.busiest || busiest->sum_nr_running == 0)
 		goto out_balanced;
 
 	sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
@@ -4873,26 +4871,26 @@ find_busiest_group(struct lb_env *env)
 	 * work because they assume all things are equal, which typically
 	 * isn't true due to cpus_allowed constraints and the like.
 	 */
-	if (sds.group_imb)
+	if (busiest->group_imb)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
-			!sds.busiest_has_capacity)
+	if (env->idle == CPU_NEWLY_IDLE && local->group_has_capacity &&
+	    !busiest->group_has_capacity)
 		goto force_balance;
 
 	/*
 	 * If the local group is more busy than the selected busiest group
 	 * don't try and pull any tasks.
 	 */
-	if (sds.this_load >= sds.max_load)
+	if (local->avg_load >= busiest->avg_load)
 		goto out_balanced;
 
 	/*
 	 * Don't pull any tasks if this group is already above the domain
 	 * average load.
 	 */
-	if (sds.this_load >= sds.avg_load)
+	if (local->avg_load >= sds.avg_load)
 		goto out_balanced;
 
 	if (env->idle == CPU_IDLE) {
@@ -4902,15 +4900,16 @@ find_busiest_group(struct lb_env *env)
 		 * there is no imbalance between this and busiest group
 		 * wrt idle cpus, it is balanced.
 		 */
-		if ((sds.this_idle_cpus <= sds.busiest_idle_cpus + 1) &&
-		    sds.busiest_nr_running <= sds.busiest_group_weight)
+		if ((local->idle_cpus < busiest->idle_cpus) &&
+		    busiest->sum_nr_running <= busiest->group_weight)
 			goto out_balanced;
 	} else {
 		/*
 		 * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use
 		 * imbalance_pct to be conservative.
 		 */
-		if (100 * sds.max_load <= env->sd->imbalance_pct * sds.this_load)
+		if (100 * busiest->avg_load <=
+				env->sd->imbalance_pct * local->avg_load)
 			goto out_balanced;
 	}
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread
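
As a quick numeric illustration of the imbalance formula the patch
restyles in calculate_imbalance() (same arithmetic, new field names;
the figures below are made up for the example):

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	/* Hypothetical two-group domain, both groups at unit power. */
	unsigned long busiest_avg_load = 1536, busiest_power = 1024;
	unsigned long local_avg_load = 512, local_power = 1024;
	unsigned long sds_avg_load = 1024;	/* domain-wide average */
	unsigned long load_above_capacity = ~0UL;	/* no cap here */
	unsigned long max_pull, give, take, imbalance;

	/* Never pull more than the busiest group's excess over the mean. */
	max_pull = busiest_avg_load - sds_avg_load;
	if (load_above_capacity < max_pull)
		max_pull = load_above_capacity;

	/* Move the lesser of what busiest can give and local can take. */
	give = max_pull * busiest_power;
	take = (sds_avg_load - local_avg_load) * local_power;
	imbalance = (give < take ? give : take) / SCHED_POWER_SCALE;

	printf("imbalance = %lu\n", imbalance);	/* 512 with these figures */
	return 0;
}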

end of thread

Thread overview: 16+ messages
2013-08-06  8:36 [PATCH v3 0/3] optimization, clean-up about fair.c Joonsoo Kim
2013-08-06  8:36 ` [PATCH v3 1/3] sched: remove one division operation in find_buiest_queue() Joonsoo Kim
2013-09-02  7:40   ` [tip:sched/core] sched: Remove one division operation in find_busiest_queue() tip-bot for Joonsoo Kim
2013-08-06  8:36 ` [PATCH v3 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
2013-08-15 11:19   ` Peter Zijlstra
2013-08-16 16:52     ` JoonSoo Kim
2013-08-15 12:34   ` Peter Zijlstra
2013-08-16 16:57     ` JoonSoo Kim
2013-09-02  7:40   ` [tip:sched/core] sched: Factor " tip-bot for Joonsoo Kim
2013-08-06  8:36 ` [PATCH v3 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
2013-08-15 11:09   ` Peter Zijlstra
2013-08-15 11:14     ` Peter Zijlstra
2013-08-16 17:07     ` JoonSoo Kim
2013-08-15 18:05   ` Peter Zijlstra
2013-08-16 17:09     ` JoonSoo Kim
2013-09-02  7:40   ` [tip:sched/core] sched: Clean-up " tip-bot for Joonsoo Kim
