All of lore.kernel.org
 help / color / mirror / Atom feed
From: santosh.shilimkar@ti.com (Santosh Shilimkar)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC 3/6] sched: pack small tasks
Date: Wed, 24 Oct 2012 20:50:48 +0530	[thread overview]
Message-ID: <50880750.4060809@ti.com> (raw)
In-Reply-To: <1349595838-31274-4-git-send-email-vincent.guittot@linaro.org>

Vincent,

Few comments/questions.

On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote:
> During sched_domain creation, we define a pack buddy CPU if available.
>
> On a system that share the powerline at all level, the buddy is set to -1
>
> On a dual clusters / dual cores system which can powergate each core and
> cluster independantly, the buddy configuration will be :
>        | CPU0 | CPU1 | CPU2 | CPU3 |
> -----------------------------------
> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
			^
Is that a typo ? Should it be CPU2 instead of
CPU0 ?

> Small tasks tend to slip out of the periodic load balance.
> The best place to choose to migrate them is at their wake up.
>
I have tried this series since I was looking at some of these packing
bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallary,
I did see some additional filtering of threads with this series
but its not making much difference in power. More on this below.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   kernel/sched/core.c  |    1 +
>   kernel/sched/fair.c  |  109 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/sched/sched.h |    1 +
>   3 files changed, 111 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dab7908..70cadbe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>   	rcu_assign_pointer(rq->sd, sd);
>   	destroy_sched_domains(tmp, cpu);
>
> +	update_packing_domain(cpu);
>   	update_top_cache_domain(cpu);
>   }
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f4a4f6..8c9d3ed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -157,6 +157,63 @@ void sched_init_granularity(void)
>   	update_sysctl();
>   }
>
> +
> +/*
> + * Save the id of the optimal CPU that should be used to pack small tasks
> + * The value -1 is used when no buddy has been found
> + */
> +DEFINE_PER_CPU(int, sd_pack_buddy);
> +
> +/* Look for the best buddy CPU that can be used to pack small tasks
> + * We make the assumption that it doesn't wort to pack on CPU that share the
s/wort/worth
> + * same powerline. We looks for the 1st sched_domain without the
> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the lowest
> + * power per core based on the assumption that their power efficiency is
> + * better */
Commenting style..
/*
  *
  */

Can you please expand the why the assumption is right ?
"it doesn't wort to pack on CPU that share the same powerline"

Think about a scenario where you have quad core, ducal cluster system

	|Cluster1|			|cluster 2|
| CPU0 | CPU1 | CPU2 | CPU3 |	| CPU0 | CPU1 | CPU2 | CPU3 |


Both clusters run from same voltage rail and have same PLL
clocking them. But the cluster have their own power domain
and all CPU's can power gate them-self to low power states.
Clusters also have their own level2 caches.

In this case, you will still save power if you try to pack
load on one cluster. No ?

> +void update_packing_domain(int cpu)
> +{
> +	struct sched_domain *sd;
> +	int id = -1;
> +
> +	sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE);
> +	if (!sd)
> +		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
> +	else
> +		sd = sd->parent;
> +
> +	while (sd) {
> +		struct sched_group *sg = sd->groups;
> +		struct sched_group *pack = sg;
> +		struct sched_group *tmp = sg->next;
> +
> +		/* 1st CPU of the sched domain is a good candidate */
> +		if (id == -1)
> +			id = cpumask_first(sched_domain_span(sd));
> +
> +		/* loop the sched groups to find the best one */
> +		while (tmp != sg) {
> +			if (tmp->sgp->power * sg->group_weight <
> +					sg->sgp->power * tmp->group_weight)
> +				pack = tmp;
> +			tmp = tmp->next;
> +		}
> +
> +		/* we have found a better group */
> +		if (pack != sg)
> +			id = cpumask_first(sched_group_cpus(pack));
> +
> +		/* Look for another CPU than itself */
> +		if ((id != cpu)
> +		 || ((sd->parent) && !(sd->parent->flags && SD_LOAD_BALANCE)))
Is the condition "!(sd->parent->flags && SD_LOAD_BALANCE)" for
big.LITTLE kind of system where SD_LOAD_BALANCE may not be used ?

> +			break;
> +
> +		sd = sd->parent;
> +	}
> +
> +	pr_info(KERN_INFO "CPU%d packing on CPU%d\n", cpu, id);
> +	per_cpu(sd_pack_buddy, cpu) = id;
> +}
> +
>   #if BITS_PER_LONG == 32
>   # define WMULT_CONST	(~0UL)
>   #else
> @@ -3073,6 +3130,55 @@ static int select_idle_sibling(struct task_struct *p, int target)
>   	return target;
>   }
>
> +static inline bool is_buddy_busy(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	/*
> +	 * A busy buddy is a CPU with a high load or a small load with a lot of
> +	 * running tasks.
> +	 */
> +	return ((rq->avg.usage_avg_sum << rq->nr_running) >
> +			rq->avg.runnable_avg_period);
I agree busy CPU is the one with high load, but many small threads may
not make CPU fully busy, right ? Should we just stick to the load
parameter alone here ?

> +}
> +
> +static inline bool is_light_task(struct task_struct *p)
> +{
> +	/* A light task runs less than 25% in average */
> +	return ((p->se.avg.usage_avg_sum << 2) < p->se.avg.runnable_avg_period);
> +}
Since the whole packing logic relies on the light threads only, the
overall effectiveness is not significant. Infact with multiple tries on
Dual core system, I didn't see any major improvement in power. I think
we need to be more aggressive here. From the cover letter, I noticed
that, you were concerned about any performance drop due to packing and
may be that is the reason you chose the conservative threshold. But the
fact is, if we want to save meaningful power, there will be slight
performance drop which is expected.

> +static int check_pack_buddy(int cpu, struct task_struct *p)
> +{
> +	int buddy = per_cpu(sd_pack_buddy, cpu);
> +
> +	/* No pack buddy for this CPU */
> +	if (buddy == -1)
> +		return false;
> +
> +	/*
> +	 * If a task is waiting for running on the CPU which is its own buddy,
> +	 * let the default behavior to look for a better CPU if available
> +	 * The threshold has been set to 37.5%
> +	 */
> +	if ((buddy == cpu)
> +	 && ((p->se.avg.usage_avg_sum << 3) < (p->se.avg.runnable_avg_sum * 5)))
> +		return false;
I lost you here on better CPU , 37.5 % and last two conditions.
Isn't the first condition 'buddy==cpu' enough to return since nothing 
really needs to be done in that case. Can you please expand this a bit?

> +
> +	/* buddy is not an allowed CPU */
> +	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
> +		return false;
> +
> +	/*
> +	 * If the task is a small one and the buddy is not overloaded,
> +	 * we use buddy cpu
> +	 */
> +	 if (!is_light_task(p) || is_buddy_busy(buddy))
> +		return false;
This is right but both the evaluation needs update to be effective.

Regards
Santosh

WARNING: multiple messages have this Message-ID (diff)
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linaro-dev@lists.linaro.org>, <peterz@infradead.org>,
	<mingo@redhat.com>, <pjt@google.com>, <linux@arm.linux.org.uk>
Subject: Re: [RFC 3/6] sched: pack small tasks
Date: Wed, 24 Oct 2012 20:50:48 +0530	[thread overview]
Message-ID: <50880750.4060809@ti.com> (raw)
In-Reply-To: <1349595838-31274-4-git-send-email-vincent.guittot@linaro.org>

Vincent,

Few comments/questions.

On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote:
> During sched_domain creation, we define a pack buddy CPU if available.
>
> On a system that share the powerline at all level, the buddy is set to -1
>
> On a dual clusters / dual cores system which can powergate each core and
> cluster independantly, the buddy configuration will be :
>        | CPU0 | CPU1 | CPU2 | CPU3 |
> -----------------------------------
> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
			^
Is that a typo ? Should it be CPU2 instead of
CPU0 ?

> Small tasks tend to slip out of the periodic load balance.
> The best place to choose to migrate them is at their wake up.
>
I have tried this series since I was looking at some of these packing
bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallary,
I did see some additional filtering of threads with this series
but its not making much difference in power. More on this below.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   kernel/sched/core.c  |    1 +
>   kernel/sched/fair.c  |  109 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/sched/sched.h |    1 +
>   3 files changed, 111 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dab7908..70cadbe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>   	rcu_assign_pointer(rq->sd, sd);
>   	destroy_sched_domains(tmp, cpu);
>
> +	update_packing_domain(cpu);
>   	update_top_cache_domain(cpu);
>   }
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f4a4f6..8c9d3ed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -157,6 +157,63 @@ void sched_init_granularity(void)
>   	update_sysctl();
>   }
>
> +
> +/*
> + * Save the id of the optimal CPU that should be used to pack small tasks
> + * The value -1 is used when no buddy has been found
> + */
> +DEFINE_PER_CPU(int, sd_pack_buddy);
> +
> +/* Look for the best buddy CPU that can be used to pack small tasks
> + * We make the assumption that it doesn't wort to pack on CPU that share the
s/wort/worth
> + * same powerline. We looks for the 1st sched_domain without the
> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the lowest
> + * power per core based on the assumption that their power efficiency is
> + * better */
Commenting style..
/*
  *
  */

Can you please expand the why the assumption is right ?
"it doesn't wort to pack on CPU that share the same powerline"

Think about a scenario where you have quad core, ducal cluster system

	|Cluster1|			|cluster 2|
| CPU0 | CPU1 | CPU2 | CPU3 |	| CPU0 | CPU1 | CPU2 | CPU3 |


Both clusters run from same voltage rail and have same PLL
clocking them. But the cluster have their own power domain
and all CPU's can power gate them-self to low power states.
Clusters also have their own level2 caches.

In this case, you will still save power if you try to pack
load on one cluster. No ?

> +void update_packing_domain(int cpu)
> +{
> +	struct sched_domain *sd;
> +	int id = -1;
> +
> +	sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE);
> +	if (!sd)
> +		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
> +	else
> +		sd = sd->parent;
> +
> +	while (sd) {
> +		struct sched_group *sg = sd->groups;
> +		struct sched_group *pack = sg;
> +		struct sched_group *tmp = sg->next;
> +
> +		/* 1st CPU of the sched domain is a good candidate */
> +		if (id == -1)
> +			id = cpumask_first(sched_domain_span(sd));
> +
> +		/* loop the sched groups to find the best one */
> +		while (tmp != sg) {
> +			if (tmp->sgp->power * sg->group_weight <
> +					sg->sgp->power * tmp->group_weight)
> +				pack = tmp;
> +			tmp = tmp->next;
> +		}
> +
> +		/* we have found a better group */
> +		if (pack != sg)
> +			id = cpumask_first(sched_group_cpus(pack));
> +
> +		/* Look for another CPU than itself */
> +		if ((id != cpu)
> +		 || ((sd->parent) && !(sd->parent->flags && SD_LOAD_BALANCE)))
Is the condition "!(sd->parent->flags && SD_LOAD_BALANCE)" for
big.LITTLE kind of system where SD_LOAD_BALANCE may not be used ?

> +			break;
> +
> +		sd = sd->parent;
> +	}
> +
> +	pr_info(KERN_INFO "CPU%d packing on CPU%d\n", cpu, id);
> +	per_cpu(sd_pack_buddy, cpu) = id;
> +}
> +
>   #if BITS_PER_LONG == 32
>   # define WMULT_CONST	(~0UL)
>   #else
> @@ -3073,6 +3130,55 @@ static int select_idle_sibling(struct task_struct *p, int target)
>   	return target;
>   }
>
> +static inline bool is_buddy_busy(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	/*
> +	 * A busy buddy is a CPU with a high load or a small load with a lot of
> +	 * running tasks.
> +	 */
> +	return ((rq->avg.usage_avg_sum << rq->nr_running) >
> +			rq->avg.runnable_avg_period);
I agree busy CPU is the one with high load, but many small threads may
not make CPU fully busy, right ? Should we just stick to the load
parameter alone here ?

> +}
> +
> +static inline bool is_light_task(struct task_struct *p)
> +{
> +	/* A light task runs less than 25% in average */
> +	return ((p->se.avg.usage_avg_sum << 2) < p->se.avg.runnable_avg_period);
> +}
Since the whole packing logic relies on the light threads only, the
overall effectiveness is not significant. Infact with multiple tries on
Dual core system, I didn't see any major improvement in power. I think
we need to be more aggressive here. From the cover letter, I noticed
that, you were concerned about any performance drop due to packing and
may be that is the reason you chose the conservative threshold. But the
fact is, if we want to save meaningful power, there will be slight
performance drop which is expected.

> +static int check_pack_buddy(int cpu, struct task_struct *p)
> +{
> +	int buddy = per_cpu(sd_pack_buddy, cpu);
> +
> +	/* No pack buddy for this CPU */
> +	if (buddy == -1)
> +		return false;
> +
> +	/*
> +	 * If a task is waiting for running on the CPU which is its own buddy,
> +	 * let the default behavior to look for a better CPU if available
> +	 * The threshold has been set to 37.5%
> +	 */
> +	if ((buddy == cpu)
> +	 && ((p->se.avg.usage_avg_sum << 3) < (p->se.avg.runnable_avg_sum * 5)))
> +		return false;
I lost you here on better CPU , 37.5 % and last two conditions.
Isn't the first condition 'buddy==cpu' enough to return since nothing 
really needs to be done in that case. Can you please expand this a bit?

> +
> +	/* buddy is not an allowed CPU */
> +	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
> +		return false;
> +
> +	/*
> +	 * If the task is a small one and the buddy is not overloaded,
> +	 * we use buddy cpu
> +	 */
> +	 if (!is_light_task(p) || is_buddy_busy(buddy))
> +		return false;
This is right but both the evaluation needs update to be effective.

Regards
Santosh

  reply	other threads:[~2012-10-24 15:20 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-07  7:43 [RFC 0/6] sched: packing small tasks Vincent Guittot
2012-10-07  7:43 ` Vincent Guittot
2012-10-07  7:43 ` [RFC 1/6] Revert "sched: introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-07  7:43 ` [RFC 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-24 15:17   ` Santosh Shilimkar
2012-10-24 15:17     ` Santosh Shilimkar
2012-10-29  9:40     ` Vincent Guittot
2012-10-29  9:40       ` Vincent Guittot
2012-10-29  9:50       ` Vincent Guittot
2012-10-29  9:50         ` Vincent Guittot
2012-11-02 10:27         ` Santosh Shilimkar
2012-11-02 10:27           ` Santosh Shilimkar
2012-10-07  7:43 ` [RFC 3/6] sched: pack small tasks Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-24 15:20   ` Santosh Shilimkar [this message]
2012-10-24 15:20     ` Santosh Shilimkar
2012-10-29 13:12     ` Vincent Guittot
2012-10-29 13:12       ` Vincent Guittot
2012-11-02 10:53       ` Santosh Shilimkar
2012-11-02 10:53         ` Santosh Shilimkar
2012-11-09 16:46         ` Morten Rasmussen
2012-11-09 16:46           ` Morten Rasmussen
2012-11-12 13:13           ` Vincent Guittot
2012-11-12 13:13             ` Vincent Guittot
2012-11-12  9:30         ` Vincent Guittot
2012-11-12  9:30           ` Vincent Guittot
2012-11-09 17:13   ` Morten Rasmussen
2012-11-09 17:13     ` Morten Rasmussen
2012-11-12 13:51     ` Vincent Guittot
2012-11-12 13:51       ` Vincent Guittot
2012-11-20 14:28       ` Morten Rasmussen
2012-11-20 14:28         ` Morten Rasmussen
2012-11-20 16:59         ` Vincent Guittot
2012-11-20 16:59           ` Vincent Guittot
2012-10-07  7:43 ` [RFC 4/6] sched: secure access to other CPU statistics Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-24 15:21   ` Santosh Shilimkar
2012-10-24 15:21     ` Santosh Shilimkar
2012-10-29 13:18     ` Vincent Guittot
2012-10-29 13:18       ` Vincent Guittot
2012-10-07  7:43 ` [RFC 5/6] sched: pack the idle load balance Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-24 15:21   ` Santosh Shilimkar
2012-10-24 15:21     ` Santosh Shilimkar
2012-10-29 13:27     ` Vincent Guittot
2012-10-29 13:27       ` Vincent Guittot
2012-11-02 10:59       ` Santosh Shilimkar
2012-11-02 10:59         ` Santosh Shilimkar
2012-10-07  7:43 ` [RFC 6/6] ARM: sched: clear SD_SHARE_POWERLINE Vincent Guittot
2012-10-07  7:43   ` Vincent Guittot
2012-10-24 15:21   ` Santosh Shilimkar
2012-10-24 15:21     ` Santosh Shilimkar
2012-10-29 13:28     ` Vincent Guittot
2012-10-29 13:28       ` Vincent Guittot
2012-11-02 11:00       ` Santosh Shilimkar
2012-11-02 11:00         ` Santosh Shilimkar
2012-11-12  8:23         ` Vincent Guittot
2012-11-12  8:23           ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50880750.4060809@ti.com \
    --to=santosh.shilimkar@ti.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.