[PATCH v8 00/10] sched: consolidation of CPU capacity and usage

From: kernellwp@gmail.com (Wanpeng Li)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v8 00/10] sched: consolidation of CPU capacity and usage
Date: Mon, 03 Nov 2014 21:03:54 +0800
Message-ID: <54577D3A.7020306@gmail.com>
In-Reply-To: <CAKfTPtDmyUBaUQ210w36fuao9iKRGP41MqSRxhhy_3O6k4UNrg@mail.gmail.com>


On 14/11/3 6:55 PM, Vincent Guittot wrote:
> On 3 November 2014 03:12, Wanpeng Li <kernellwp@gmail.com> wrote:
>> Hi Vincent,
>> On 14/10/31 4:47 PM, Vincent Guittot wrote:
>>> This patchset consolidates several changes in the capacity and the usage
>>> tracking of the CPU. It provides a frequency invariant metric of the usage
>>> of CPUs and generally improves the accuracy of load/usage tracking in the
>>> scheduler. The frequency invariant metric is the foundation required for
>>> the consolidation of cpufreq and implementation of a fully invariant load
>>> tracking. These are currently WIP and require several changes to the load
>>> balancer (including how it will use and interpret load and capacity
>>> metrics) and extensive validation. The frequency invariance is done with
>>> arch_scale_freq_capacity and this patchset doesn't provide the backends of
>>> the function, which are architecture dependent.
>>>
>>> As discussed at LPC14, Morten and I have consolidated our changes into a
>>> single patchset to make it easier to review and merge.
>>>
>>> During load balance, the scheduler evaluates the number of tasks that a
>>> group of CPUs can handle. The current method assumes that tasks have a
>>> fixed load of SCHED_LOAD_SCALE and CPUs have a default capacity of
>>> SCHED_CAPACITY_SCALE. This assumption generates wrong decisions by
>>> creating ghost cores or by
>>
>> I don't know the history. Could you explain what 'ghost cores' means?
> The capacity_factor gives the number of tasks that can be handled by a
> group of CPUs by dividing the group's capacity by SCHED_CAPACITY_SCALE.
>
> For a system with SMT, the default capacity of a core is 1178, so the
> capacity of each CPU with two threads per core is 589.
>
> At CPU level we have a capacity_factor of 1 = div_round_closest(589, 1024).
> At core level we still have a capacity_factor of 1 =
> div_round_closest(1178, 1024). This is an intended behavior to promote
> 1 task per core.
> Then, if we have 4 cores in a node, the capacity_factor is 5 =
> div_round_closest(4712, 1024) whereas we should have 4. So a 5th ghost
> core has appeared in the group, and the load balancer will not
> consider the group as overloaded if there are 5 tasks, whereas it
> should, in order to try to move this 5th task to an idle core (if there
> is one).
> Patch [0] solves some use cases by ensuring that we will not have more
> cores than possible, so we can't have more than 4 cores for the previous
> example.
> Now, if some RT tasks are running and using almost 1 core (1024 as an
> example), the capacity_factor is still 4 = div_round_closest(3688,
> 1024) whereas a core is nearly fully used and the capacity_factor
> should be 3.
>
> [0] https://lkml.org/lkml/2013/8/28/194

Got it, thanks for your great explanation.
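
For reference, the rounding arithmetic quoted above can be reproduced with a
small sketch. This is an illustration only, written in Python rather than
kernel C. The helper names div_round_closest() and capacity_factor() are
hypothetical stand-ins for the notation used in the explanation, and the
round-to-nearest formula is an assumption about what div_round_closest means
there. The capacity values (1178, 589, 4712, 3688) are the ones from the
thread.

```python
SCHED_CAPACITY_SCALE = 1024  # default capacity of one CPU, as quoted in the thread


def div_round_closest(x: int, divisor: int) -> int:
    """Round-to-nearest integer division for non-negative operands (assumed semantics)."""
    return (x + divisor // 2) // divisor


def capacity_factor(group_capacity: int) -> int:
    """Number of tasks a group is assumed to handle under the old method."""
    return div_round_closest(group_capacity, SCHED_CAPACITY_SCALE)


SMT_CORE_CAPACITY = 1178                # dual-thread core, value from the thread
CPU_CAPACITY = SMT_CORE_CAPACITY // 2   # 589 per hardware thread
NODE_CAPACITY = 4 * SMT_CORE_CAPACITY   # 4712 for a node with 4 cores
NODE_WITH_RT = NODE_CAPACITY - 1024     # 3688 once RT tasks use about one core

cases = [
    ("one CPU (SMT thread)", CPU_CAPACITY),
    ("one core", SMT_CORE_CAPACITY),
    ("4-core node", NODE_CAPACITY),
    ("4-core node with RT", NODE_WITH_RT),
]

for label, capacity in cases:
    # Print the capacity and the factor the old method would derive from it.
    print(f"{label:<22} capacity={capacity:>4} "
          f"capacity_factor={capacity_factor(capacity)}")
```

The 4-core node rounds up to a factor of 5 (the ghost core), and the same
node with about one core consumed by RT tasks still rounds to 4 rather than
the expected 3.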

Regards,
Wanpeng Li

>
> Regards,
> Vincent
>
>> Regards,
>> Wanpeng Li
>>
>>
>>> removing real ones when the original capacity of CPUs is different from
>>> the default SCHED_CAPACITY_SCALE. With this patch set, we no longer try
>>> to evaluate the number of available cores based on the group_capacity;
>>> instead, we evaluate the usage of a group and compare it with its
>>> capacity.
>>>
>>> This patchset mainly replaces the old capacity_factor method with a new
>>> one and keeps the general policy almost unchanged. These new metrics will
>>> also be used in later patches.
>>>
> [snip]

