[PATCH v6 5/6] sched: replace capacity_factor by usage

All of lore.kernel.org
 help / color / mirror / Atom feed

From: kernellwp@gmail.com (Wanpeng Li)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 5/6] sched: replace capacity_factor by usage
Date: Sun, 23 Nov 2014 08:22:48 +0800	[thread overview]
Message-ID: <547128D8.6040500@gmail.com> (raw)
In-Reply-To: <CAKfTPtAA1thOXR7tRzMQrnyzuoEysqwmBP_mQ+RSh1yOAG7Jpg@mail.gmail.com>

Hi Vincent,
On 10/3/14, 8:50 PM, Vincent Guittot wrote:
> On 3 October 2014 11:35, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> On Fri, Oct 03, 2014 at 08:24:23AM +0100, Vincent Guittot wrote:
>>> On 2 October 2014 18:57, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>>>> On Tue, Sep 23, 2014 at 05:08:04PM +0100, Vincent Guittot wrote:
>>>>> Below are two examples to illustrate the problem that this patch solves:
>>>>>
>>>>> 1 - capacity_factor makes the assumption that max capacity of a CPU is
>>>>> SCHED_CAPACITY_SCALE and the load of a thread is always is
>>>>> SCHED_LOAD_SCALE. It compares the output of these figures with the sum
>>>>> of nr_running to decide if a group is overloaded or not.
>>>>>
>>>>> But if the default capacity of a CPU is less than SCHED_CAPACITY_SCALE
>>>>> (640 as an example), a group of 3 CPUS will have a max capacity_factor
>>>>> of 2 ( div_round_closest(3x640/1024) = 2) which means that it will be
>>>>> seen as overloaded if we have only one task per CPU.
>>>> I did some testing on TC2 which has the setup you describe above on the
>>>> A7 cluster when the clock-frequency property is set in DT. The two A15s
>>>> have max capacities above 1024. When using sysbench with five threads I
>>>> still get three tasks on the two A15s and two tasks on the three A7
>>>> leaving one cpu idle (on average).
>>>>
>>>> Using cpu utilization (usage) does correctly identify the A15 cluster as
>>>> overloaded and it even gets as far as selecting the A15 running two
>>>> tasks as the source cpu in load_balance(). However, load_balance() bails
>>>> out without pulling the task due to calculate_imbalance() returning a
>>>> zero imbalance. calculate_imbalance() bails out just before the hunk you
>>>> changed due to comparison of the sched_group avg_loads. sgs->avg_load is
>>>> basically the sum of load in the group divided by its capacity. Since
>>>> load isn't scaled the avg_load of the overloaded A15 group is slightly
>>>> _lower_ than the partially idle A7 group. Hence calculate_imbalance()
>>>> bails out, which isn't what we want.
>>>>
>>>> I think we need to have a closer look at the imbalance calculation code
>>>> and any other users of sgs->avg_load to get rid of all code making
>>>> assumptions about cpu capacity.
>>> We already had this discussion during the review of a previous version
>>> https://lkml.org/lkml/2014/6/3/422
>>> and my answer has not changed; This patch is necessary to solve the 1
>>> task per CPU issue of the HMP system but is not enough. I have a patch
>>> for solving the imbalance calculation issue and i have planned to send
>>> it once this patch will be in a good shape for being accepted by
>>> Peter.
>>>
>>> I don't want to mix this patch and the next one because there are
>>> addressing different problem: this one is how evaluate the capacity of
>>> a system and detect when a group is overloaded and the next one will
>>> handle the case when the balance of the system can't rely on the
>>> average load figures of the group because we have a significant
>>> capacity difference between groups. Not that i haven't specifically
>>> mentioned HMP for the last patch because SMP system can also take
>>> advantage of it
>> You do mention 'big' and 'little' cores in your commit message and quote
>> example numbers with are identical to the cpu capacities for TC2. That
> By last patch, i mean the patch about imbalance that i haven't sent
> yet, but it's not related with this patch
>
>> lead me to believe that this patch would address the issues we see for
>> HMP systems. But that is clearly wrong. I would suggest that you drop
> This patch addresses one issue: correctly detect how much capacity we
> have and correctly detect when the group is overloaded; HMP system

What's the meaning of "HMP system"?

Regards,
Wanpeng Li

> clearly falls in this category but not only.
> This is the only purpose of this patch and not the "1 task per CPU
> issue" that you mentioned.
>
> The second patch is about correctly balance the tasks on system where
> the capacity of a group is significantly less than another group. It
> has nothing to do in capacity computation and overload detection but
> it will use these corrected/new metrics to make a better choice.
>
> Then, there is the "1 task per CPU issue on HMP" that you mentioned
> and this latter will be solved thanks to these 2 patchsets but this is
> not the only/main target of these patchsets so i don't want to reduce
> them into: "solve an HMP issue"
>
>> mentioning big and little cores and stick to only describing cpu
>> capacity reductions due to rt tasks and irq to avoid any confusion about
>> the purpose of the patch. Maybe explicitly say that non-default cpu
>> capacities (capacity_orig) isn't addressed yet.
> But they are addressed by this patchset. you just mixed various goal
> together (see above)
>
>> I think the two problems you describe are very much related. This patch
>> set is half the solution of the HMP balancing problem. We just need the
>> last bit for avg_load and then we can add scale-invariance on top.
> i don't see the link between scale invariance and a bad load-balancing
> choice. The fact that the current load balancer puts more tasks than
> CPUs in a group on HMP system should not influence or break the scale
> invariance load tracking. Isn't it ?
>
> Now, i could send the other patchset but i'm afraid that this will
> generate more confusion than help in the process review.
>
> Regards,
> Vincent
>
>> IMHO, it would be good to have all the bits and pieces for cpu capacity
>> scaling and scaling of per-entity load-tracking on the table so we fit
>> things together. We only have patches for parts of the solution posted
>> so far.
>>
>> Morten
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

WARNING: multiple messages have this Message-ID (diff)

From: Wanpeng Li <kernellwp@gmail.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "peterz@infradead.org" <peterz@infradead.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"linux@arm.linux.org.uk" <linux@arm.linux.org.uk>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"efault@gmx.de" <efault@gmx.de>,
	"nicolas.pitre@linaro.org" <nicolas.pitre@linaro.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	Dietmar Eggemann <Dietmar.Eggemann@arm.com>,
	"pjt@google.com" <pjt@google.com>,
	"bsegall@google.com" <bsegall@google.com>
Subject: Re: [PATCH v6 5/6] sched: replace capacity_factor by usage
Date: Sun, 23 Nov 2014 08:22:48 +0800	[thread overview]
Message-ID: <547128D8.6040500@gmail.com> (raw)
In-Reply-To: <CAKfTPtAA1thOXR7tRzMQrnyzuoEysqwmBP_mQ+RSh1yOAG7Jpg@mail.gmail.com>

Hi Vincent,
On 10/3/14, 8:50 PM, Vincent Guittot wrote:
> On 3 October 2014 11:35, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> On Fri, Oct 03, 2014 at 08:24:23AM +0100, Vincent Guittot wrote:
>>> On 2 October 2014 18:57, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>>>> On Tue, Sep 23, 2014 at 05:08:04PM +0100, Vincent Guittot wrote:
>>>>> Below are two examples to illustrate the problem that this patch solves:
>>>>>
>>>>> 1 - capacity_factor makes the assumption that max capacity of a CPU is
>>>>> SCHED_CAPACITY_SCALE and the load of a thread is always is
>>>>> SCHED_LOAD_SCALE. It compares the output of these figures with the sum
>>>>> of nr_running to decide if a group is overloaded or not.
>>>>>
>>>>> But if the default capacity of a CPU is less than SCHED_CAPACITY_SCALE
>>>>> (640 as an example), a group of 3 CPUS will have a max capacity_factor
>>>>> of 2 ( div_round_closest(3x640/1024) = 2) which means that it will be
>>>>> seen as overloaded if we have only one task per CPU.
>>>> I did some testing on TC2 which has the setup you describe above on the
>>>> A7 cluster when the clock-frequency property is set in DT. The two A15s
>>>> have max capacities above 1024. When using sysbench with five threads I
>>>> still get three tasks on the two A15s and two tasks on the three A7
>>>> leaving one cpu idle (on average).
>>>>
>>>> Using cpu utilization (usage) does correctly identify the A15 cluster as
>>>> overloaded and it even gets as far as selecting the A15 running two
>>>> tasks as the source cpu in load_balance(). However, load_balance() bails
>>>> out without pulling the task due to calculate_imbalance() returning a
>>>> zero imbalance. calculate_imbalance() bails out just before the hunk you
>>>> changed due to comparison of the sched_group avg_loads. sgs->avg_load is
>>>> basically the sum of load in the group divided by its capacity. Since
>>>> load isn't scaled the avg_load of the overloaded A15 group is slightly
>>>> _lower_ than the partially idle A7 group. Hence calculate_imbalance()
>>>> bails out, which isn't what we want.
>>>>
>>>> I think we need to have a closer look at the imbalance calculation code
>>>> and any other users of sgs->avg_load to get rid of all code making
>>>> assumptions about cpu capacity.
>>> We already had this discussion during the review of a previous version
>>> https://lkml.org/lkml/2014/6/3/422
>>> and my answer has not changed; This patch is necessary to solve the 1
>>> task per CPU issue of the HMP system but is not enough. I have a patch
>>> for solving the imbalance calculation issue and i have planned to send
>>> it once this patch will be in a good shape for being accepted by
>>> Peter.
>>>
>>> I don't want to mix this patch and the next one because there are
>>> addressing different problem: this one is how evaluate the capacity of
>>> a system and detect when a group is overloaded and the next one will
>>> handle the case when the balance of the system can't rely on the
>>> average load figures of the group because we have a significant
>>> capacity difference between groups. Not that i haven't specifically
>>> mentioned HMP for the last patch because SMP system can also take
>>> advantage of it
>> You do mention 'big' and 'little' cores in your commit message and quote
>> example numbers with are identical to the cpu capacities for TC2. That
> By last patch, i mean the patch about imbalance that i haven't sent
> yet, but it's not related with this patch
>
>> lead me to believe that this patch would address the issues we see for
>> HMP systems. But that is clearly wrong. I would suggest that you drop
> This patch addresses one issue: correctly detect how much capacity we
> have and correctly detect when the group is overloaded; HMP system

What's the meaning of "HMP system"?

Regards,
Wanpeng Li

> clearly falls in this category but not only.
> This is the only purpose of this patch and not the "1 task per CPU
> issue" that you mentioned.
>
> The second patch is about correctly balance the tasks on system where
> the capacity of a group is significantly less than another group. It
> has nothing to do in capacity computation and overload detection but
> it will use these corrected/new metrics to make a better choice.
>
> Then, there is the "1 task per CPU issue on HMP" that you mentioned
> and this latter will be solved thanks to these 2 patchsets but this is
> not the only/main target of these patchsets so i don't want to reduce
> them into: "solve an HMP issue"
>
>> mentioning big and little cores and stick to only describing cpu
>> capacity reductions due to rt tasks and irq to avoid any confusion about
>> the purpose of the patch. Maybe explicitly say that non-default cpu
>> capacities (capacity_orig) isn't addressed yet.
> But they are addressed by this patchset. you just mixed various goal
> together (see above)
>
>> I think the two problems you describe are very much related. This patch
>> set is half the solution of the HMP balancing problem. We just need the
>> last bit for avg_load and then we can add scale-invariance on top.
> i don't see the link between scale invariance and a bad load-balancing
> choice. The fact that the current load balancer puts more tasks than
> CPUs in a group on HMP system should not influence or break the scale
> invariance load tracking. Isn't it ?
>
> Now, i could send the other patchset but i'm afraid that this will
> generate more confusion than help in the process review.
>
> Regards,
> Vincent
>
>> IMHO, it would be good to have all the bits and pieces for cpu capacity
>> scaling and scaling of per-entity load-tracking on the table so we fit
>> things together. We only have patches for parts of the solution posted
>> so far.
>>
>> Morten
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

next prev parent reply	other threads:[~2014-11-23  0:22 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-23 16:07 [PATCH v6 0/6] sched: consolidation of cpu_capacity Vincent Guittot
2014-09-23 16:07 ` Vincent Guittot
2014-09-23 16:08 ` [PATCH v6 1/6] sched: add per rq cpu_capacity_orig Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-09-23 16:08 ` [PATCH v6 2/6] sched: move cfs task on a CPU with higher capacity Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-09-23 16:08 ` [PATCH v6 3/6] sched: add utilization_avg_contrib Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-10-03 14:15   ` Peter Zijlstra
2014-10-03 14:15     ` Peter Zijlstra
2014-10-03 14:44     ` Vincent Guittot
2014-10-03 14:44       ` Vincent Guittot
2014-10-03 14:36   ` Peter Zijlstra
2014-10-03 14:36     ` Peter Zijlstra
2014-10-03 14:51     ` Vincent Guittot
2014-10-03 14:51       ` Vincent Guittot
2014-10-03 15:14       ` Peter Zijlstra
2014-10-03 15:14         ` Peter Zijlstra
2014-10-03 16:05         ` Morten Rasmussen
2014-10-03 16:05           ` Morten Rasmussen
2014-09-23 16:08 ` [PATCH v6 4/6] sched: get CPU's usage statistic Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-09-25 19:05   ` Dietmar Eggemann
2014-09-25 19:05     ` Dietmar Eggemann
2014-09-26 12:17     ` Vincent Guittot
2014-09-26 12:17       ` Vincent Guittot
2014-09-26 15:58       ` Morten Rasmussen
2014-09-26 15:58         ` Morten Rasmussen
2014-09-26 19:57       ` Dietmar Eggemann
2014-09-26 19:57         ` Dietmar Eggemann
2014-11-21  5:36       ` Wanpeng Li
2014-11-21  5:36         ` Wanpeng Li
2014-11-21 12:17         ` Vincent Guittot
2014-11-21 12:17           ` Vincent Guittot
2014-09-23 16:08 ` [PATCH v6 5/6] sched: replace capacity_factor by usage Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-09-24 17:48   ` Dietmar Eggemann
2014-09-24 17:48     ` Dietmar Eggemann
2014-09-25  8:35     ` Vincent Guittot
2014-09-25  8:35       ` Vincent Guittot
2014-09-25 19:19       ` Dietmar Eggemann
2014-09-25 19:19         ` Dietmar Eggemann
2014-09-26 12:39         ` Vincent Guittot
2014-09-26 12:39           ` Vincent Guittot
2014-09-26 14:00           ` Dietmar Eggemann
2014-09-26 14:00             ` Dietmar Eggemann
2014-09-25  8:38   ` Vincent Guittot
2014-09-25  8:38     ` Vincent Guittot
2014-09-29 13:39   ` Dietmar Eggemann
2014-09-29 13:39     ` Dietmar Eggemann
2014-10-02 16:57   ` Morten Rasmussen
2014-10-02 16:57     ` Morten Rasmussen
2014-10-03  7:24     ` Vincent Guittot
2014-10-03  7:24       ` Vincent Guittot
2014-10-03  9:35       ` Morten Rasmussen
2014-10-03  9:35         ` Morten Rasmussen
2014-10-03 12:50         ` Vincent Guittot
2014-10-03 12:50           ` Vincent Guittot
2014-11-23  0:22           ` Wanpeng Li [this message]
2014-11-23  0:22             ` Wanpeng Li
2014-11-24  8:26             ` Vincent Guittot
2014-11-24  8:26               ` Vincent Guittot
2014-10-03 15:38   ` Peter Zijlstra
2014-10-03 15:38     ` Peter Zijlstra
2014-10-06  8:55     ` Vincent Guittot
2014-10-06  8:55       ` Vincent Guittot
2014-09-23 16:08 ` [PATCH v6 6/6] sched: add SD_PREFER_SIBLING for SMT level Vincent Guittot
2014-09-23 16:08   ` Vincent Guittot
2014-09-24 12:27   ` Preeti U Murthy
2014-09-24 12:27     ` Preeti U Murthy
2014-09-25 12:10     ` Vincent Guittot
2014-09-25 12:10       ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=547128D8.6040500@gmail.com \
    --to=kernellwp@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.