linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] arm: remove cpu_efficiency
@ 2017-10-24 10:27 Dietmar Eggemann
  2017-10-24 10:27 ` [PATCH v3 1/2] arm: topology: " Dietmar Eggemann
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2017-10-24 10:27 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel; +Cc: Russell King, Vincent Guittot, Juri Lelli

v2: review results [1]:

Vincent pointed out that there are missing of_node_put() calls for cpu
device node in parse_dt_topology(). Added them with an extra patch.

v1: review results [2]:

Vincent pointed out that there is a superfluous continue statement in
parse_dt_topology(). Got rid of it.
Krzysztof Kozlowski took the exynos and Simon Horman the renesas related
dt bits into their for-next (v4.15) branches.

***

For Cortex-A15/A7 arm big.LITTLE systems there are currently two ways to
set the cpu capacity.

The first one (commit 06073ee26775 "ARM: 8621/3: parse cpu
capacity-dmips-mhz from DT") is based on dt 'cpu capacity-dmips-mhz'
bindings and the appropriate dt parsing code in
drivers/base/arch_topology.c. It further takes differences in maximum
cpu frequency values into consideration, normalizes the maximum cpu
capacity to SCHED_CAPACITY_SCALE (1024) and scales all the cpus
accordingly.

  cpu capacity = (capacity-dmips-mhz * max cpu frequency) / 
                 (max capacity-dmips-mhz * max (max cpu frequency)

This solution is shared between arm and arm64 and works for other
combinations of big and little cpus (besides Cortex-A15/A7) as well.

The second one (commit 339ca09d7ada "ARM: 7463/1: topology: Update
cpu_power according to DT information" is based on the 'struct
cpu_efficiency table_efficiency[]' and the dt parsing code in
arch/arm/kernel/topology.c. It further requires a clock-frequency
property per cpu node, calculates a so called middle frequency for an
average cpu in the system which is as close as possible to
SCHED_CAPACITY_SCALE (1024) and uses this to compute the cpu capacity
values.

  cpu capacity = (cpu efficiency * clock frequency) / middle capacity

This solution only works for Cortex-A15/A7 arm big.LITTLE systems.

The aim of this patch is to have only one solution for all arm and arm64
big.LITTLE platforms.

(1) Therefore, it removes the code for the 'cpu_efficiency/
    clock-frequency dt property' (second) solution.

(2) Moreover, it will also assure that the highest original cpu capacity
    (rq->cpu_capacity_orig) in a non-smt system is SCHED_CAPACITY_SCALE
    (1024).

(3) And finally, another advantage is the dynamic detection of the max
    cpu frequency which comes with the first solution instead of the
    static clock-frequency dt property value.

Currently, the arm dt parsing code in parse_dt_topology() checks if the
dt uses the capacity-dmips-mhz property. If this is the case it uses
the first, otherwise the second solution. This patch removes the code
for the second solution from arch/arm/kernel/topology.c.

With the dt related patches for exynos and renesas now in the
appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm
big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency
dt property' based solution anymore.

This patch has been tested on TC2 and Samsung Chromebook 2 13"
(peach-pi, Exynos 5800).

[1] https://marc.info/?l=linux-kernel&m=150781686132670&w=2
[2] https://marc.info/?l=linux-kernel&m=150410411807050&w=2

Changes v2->v3:

  - Rebase on top of v4.14-rc6
  - Added missing of_node_put() for cpu device node [02/02]

Changes v1->v2:

  - Rebase on top of v4.14-rc4
  - Remove superfluous continue statement in parse_dt_topology()
    [01/04]
  - Remove 'cpu capacity scale management' and 'cpu capacity table'
    related comments [01/04]
  - Remove dt related patches [02-04/04]

Dietmar Eggemann (2):
  arm: topology: remove cpu_efficiency
  arm: topology: add missing of_node_put() for cpu device node

 arch/arm/kernel/topology.c | 135 ++-------------------------------------------
 1 file changed, 6 insertions(+), 129 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] arm: topology: remove cpu_efficiency
  2017-10-24 10:27 [PATCH v3 0/2] arm: remove cpu_efficiency Dietmar Eggemann
@ 2017-10-24 10:27 ` Dietmar Eggemann
  2017-10-24 10:27 ` [PATCH v3 2/2] arm: topology: add missing of_node_put() for cpu device node Dietmar Eggemann
  2017-10-24 10:52 ` [PATCH v3 0/2] arm: remove cpu_efficiency Russell King - ARM Linux
  2 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2017-10-24 10:27 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel; +Cc: Russell King, Vincent Guittot, Juri Lelli

Remove the 'cpu_efficiency/clock-frequency dt property' based solution
to set cpu capacity which was only working for Cortex-A15/A7 arm
big.LITTLE systems.

I.e. the 'capacity-dmips-mhz' based solution is now the only one. It is
shared between arm and arm64 and works for every big.LITTLE system no
matter which core types it consists of.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/arm/kernel/topology.c | 132 ++-------------------------------------------
 1 file changed, 4 insertions(+), 128 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 24ac3cab411d..15cc131ae387 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -30,69 +30,11 @@
 #include <asm/cputype.h>
 #include <asm/topology.h>
 
-/*
- * cpu capacity scale management
- */
-
-/*
- * cpu capacity table
- * This per cpu data structure describes the relative capacity of each core.
- * On a heteregenous system, cores don't have the same computation capacity
- * and we reflect that difference in the cpu_capacity field so the scheduler
- * can take this difference into account during load balance. A per cpu
- * structure is preferred because each CPU updates its own cpu_capacity field
- * during the load balance except for idle cores. One idle core is selected
- * to run the rebalance_domains for all idle cores and the cpu_capacity can be
- * updated during this sequence.
- */
-
 #ifdef CONFIG_OF
-struct cpu_efficiency {
-	const char *compatible;
-	unsigned long efficiency;
-};
-
-/*
- * Table of relative efficiency of each processors
- * The efficiency value must fit in 20bit and the final
- * cpu_scale value must be in the range
- *   0 < cpu_scale < 3*SCHED_CAPACITY_SCALE/2
- * in order to return at most 1 when DIV_ROUND_CLOSEST
- * is used to compute the capacity of a CPU.
- * Processors that are not defined in the table,
- * use the default SCHED_CAPACITY_SCALE value for cpu_scale.
- */
-static const struct cpu_efficiency table_efficiency[] = {
-	{"arm,cortex-a15", 3891},
-	{"arm,cortex-a7",  2048},
-	{NULL, },
-};
-
-static unsigned long *__cpu_capacity;
-#define cpu_capacity(cpu)	__cpu_capacity[cpu]
-
-static unsigned long middle_capacity = 1;
-static bool cap_from_dt = true;
-
-/*
- * Iterate all CPUs' descriptor in DT and compute the efficiency
- * (as per table_efficiency). Also calculate a middle efficiency
- * as close as possible to  (max{eff_i} - min{eff_i}) / 2
- * This is later used to scale the cpu_capacity field such that an
- * 'average' CPU is of middle capacity. Also see the comments near
- * table_efficiency[] and update_cpu_capacity().
- */
 static void __init parse_dt_topology(void)
 {
-	const struct cpu_efficiency *cpu_eff;
-	struct device_node *cn = NULL;
-	unsigned long min_capacity = ULONG_MAX;
-	unsigned long max_capacity = 0;
-	unsigned long capacity = 0;
-	int cpu = 0;
-
-	__cpu_capacity = kcalloc(nr_cpu_ids, sizeof(*__cpu_capacity),
-				 GFP_NOWAIT);
+	struct device_node *cn;
+	int cpu;
 
 	cn = of_find_node_by_path("/cpus");
 	if (!cn) {
@@ -101,9 +43,6 @@ static void __init parse_dt_topology(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		const u32 *rate;
-		int len;
-
 		/* too early to use cpu->of_node */
 		cn = of_get_cpu_node(cpu, NULL);
 		if (!cn) {
@@ -111,76 +50,15 @@ static void __init parse_dt_topology(void)
 			continue;
 		}
 
-		if (topology_parse_cpu_capacity(cn, cpu)) {
+		if (topology_parse_cpu_capacity(cn, cpu))
 			of_node_put(cn);
-			continue;
-		}
-
-		cap_from_dt = false;
-
-		for (cpu_eff = table_efficiency; cpu_eff->compatible; cpu_eff++)
-			if (of_device_is_compatible(cn, cpu_eff->compatible))
-				break;
-
-		if (cpu_eff->compatible == NULL)
-			continue;
-
-		rate = of_get_property(cn, "clock-frequency", &len);
-		if (!rate || len != 4) {
-			pr_err("%pOF missing clock-frequency property\n", cn);
-			continue;
-		}
-
-		capacity = ((be32_to_cpup(rate)) >> 20) * cpu_eff->efficiency;
-
-		/* Save min capacity of the system */
-		if (capacity < min_capacity)
-			min_capacity = capacity;
-
-		/* Save max capacity of the system */
-		if (capacity > max_capacity)
-			max_capacity = capacity;
-
-		cpu_capacity(cpu) = capacity;
 	}
 
-	/* If min and max capacities are equals, we bypass the update of the
-	 * cpu_scale because all CPUs have the same capacity. Otherwise, we
-	 * compute a middle_capacity factor that will ensure that the capacity
-	 * of an 'average' CPU of the system will be as close as possible to
-	 * SCHED_CAPACITY_SCALE, which is the default value, but with the
-	 * constraint explained near table_efficiency[].
-	 */
-	if (4*max_capacity < (3*(max_capacity + min_capacity)))
-		middle_capacity = (min_capacity + max_capacity)
-				>> (SCHED_CAPACITY_SHIFT+1);
-	else
-		middle_capacity = ((max_capacity / 3)
-				>> (SCHED_CAPACITY_SHIFT-1)) + 1;
-
-	if (cap_from_dt)
-		topology_normalize_cpu_scale();
-}
-
-/*
- * Look for a customed capacity of a CPU in the cpu_capacity table during the
- * boot. The update of all CPUs is in O(n^2) for heteregeneous system but the
- * function returns directly for SMP system.
- */
-static void update_cpu_capacity(unsigned int cpu)
-{
-	if (!cpu_capacity(cpu) || cap_from_dt)
-		return;
-
-	topology_set_cpu_scale(cpu, cpu_capacity(cpu) / middle_capacity);
-
-	pr_info("CPU%u: update cpu_capacity %lu\n",
-		cpu, topology_get_cpu_scale(NULL, cpu));
+	topology_normalize_cpu_scale();
 }
 
 #else
 static inline void parse_dt_topology(void) {}
-static inline void update_cpu_capacity(unsigned int cpuid) {}
 #endif
 
  /*
@@ -276,8 +154,6 @@ void store_cpu_topology(unsigned int cpuid)
 
 	update_siblings_masks(cpuid);
 
-	update_cpu_capacity(cpuid);
-
 	pr_info("CPU%u: thread %d, cpu %d, socket %d, mpidr %x\n",
 		cpuid, cpu_topology[cpuid].thread_id,
 		cpu_topology[cpuid].core_id,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/2] arm: topology: add missing of_node_put() for cpu device node
  2017-10-24 10:27 [PATCH v3 0/2] arm: remove cpu_efficiency Dietmar Eggemann
  2017-10-24 10:27 ` [PATCH v3 1/2] arm: topology: " Dietmar Eggemann
@ 2017-10-24 10:27 ` Dietmar Eggemann
  2017-10-24 10:52 ` [PATCH v3 0/2] arm: remove cpu_efficiency Russell King - ARM Linux
  2 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2017-10-24 10:27 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel; +Cc: Russell King, Vincent Guittot, Juri Lelli

This patch adds the missing of_node_put() for of_find_node_by_path() and
of_get_cpu_node() in parse_dt_topology().

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/arm/kernel/topology.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 15cc131ae387..81ec42333489 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -41,6 +41,7 @@ static void __init parse_dt_topology(void)
 		pr_err("No CPU information found in DT\n");
 		return;
 	}
+	of_node_put(cn);
 
 	for_each_possible_cpu(cpu) {
 		/* too early to use cpu->of_node */
@@ -50,8 +51,8 @@ static void __init parse_dt_topology(void)
 			continue;
 		}
 
-		if (topology_parse_cpu_capacity(cn, cpu))
-			of_node_put(cn);
+		topology_parse_cpu_capacity(cn, cpu);
+		of_node_put(cn);
 	}
 
 	topology_normalize_cpu_scale();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/2] arm: remove cpu_efficiency
  2017-10-24 10:27 [PATCH v3 0/2] arm: remove cpu_efficiency Dietmar Eggemann
  2017-10-24 10:27 ` [PATCH v3 1/2] arm: topology: " Dietmar Eggemann
  2017-10-24 10:27 ` [PATCH v3 2/2] arm: topology: add missing of_node_put() for cpu device node Dietmar Eggemann
@ 2017-10-24 10:52 ` Russell King - ARM Linux
  2017-10-24 14:31   ` Dietmar Eggemann
  2 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux @ 2017-10-24 10:52 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, linux-arm-kernel, Vincent Guittot, Juri Lelli

On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:
> With the dt related patches for exynos and renesas now in the
> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm
> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency
> dt property' based solution anymore.

This is way too early to remove support for something that has been
in place since 2012.  As you've just shown, people are using it with
DT files today.  We don't know how long people will persist using
older files, and we don't know whether there are DT files out in the
wild that we don't know about.

Our general rule is that we maintain compatibility for older DT.

NAK.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/2] arm: remove cpu_efficiency
  2017-10-24 10:52 ` [PATCH v3 0/2] arm: remove cpu_efficiency Russell King - ARM Linux
@ 2017-10-24 14:31   ` Dietmar Eggemann
  2017-11-24 11:57     ` Dietmar Eggemann
  0 siblings, 1 reply; 6+ messages in thread
From: Dietmar Eggemann @ 2017-10-24 14:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-kernel, linux-arm-kernel, Vincent Guittot, Juri Lelli

Hi Russel,

Thanks for the review!

On 24/10/17 11:52, Russell King - ARM Linux wrote:
> On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:
>> With the dt related patches for exynos and renesas now in the
>> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm
>> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency
>> dt property' based solution anymore.
> 
> This is way too early to remove support for something that has been
> in place since 2012.  As you've just shown, people are using it with
> DT files today.  We don't know how long people will persist using
> older files, and we don't know whether there are DT files out in the
> wild that we don't know about.

Understood. But do we really care about out of tree dt files?

In case the mentioned exynos and renesas Cortex-A15/A7 platforms change
to using 'capacity-dmips-mhz' in v4.15 there shouldn't be any
Cortex-A15/A7 platforms in mainline left using the cpu_efficiency
solution anymore.

> Our general rule is that we maintain compatibility for older DT.

OK.

> NAK.

Then I still send out [2/2] separately.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/2] arm: remove cpu_efficiency
  2017-10-24 14:31   ` Dietmar Eggemann
@ 2017-11-24 11:57     ` Dietmar Eggemann
  0 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2017-11-24 11:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-kernel, linux-arm-kernel, Vincent Guittot, Juri Lelli,
	Morten Rasmussen

On 10/24/2017 03:31 PM, Dietmar Eggemann wrote:
> Hi Russel,
> 
> Thanks for the review!
> 
> On 24/10/17 11:52, Russell King - ARM Linux wrote:
>> On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:
>>> With the dt related patches for exynos and renesas now in the
>>> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm
>>> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency
>>> dt property' based solution anymore.
>>
>> This is way too early to remove support for something that has been
>> in place since 2012.  As you've just shown, people are using it with
>> DT files today.  We don't know how long people will persist using
>> older files, and we don't know whether there are DT files out in the
>> wild that we don't know about.
> 
> Understood. But do we really care about out of tree dt files?
> 
> In case the mentioned exynos and renesas Cortex-A15/A7 platforms change
> to using 'capacity-dmips-mhz' in v4.15 there shouldn't be any
> Cortex-A15/A7 platforms in mainline left using the cpu_efficiency
> solution anymore.
> 
>> Our general rule is that we maintain compatibility for older DT.
> 
> OK.
> 
>> NAK.

Morten reminded me that the per-cpu capacity functionality for
heterogeneous systems (different values than the default 1024 
(SCHED_CAPACITY_SCALE)) has been broken since v4.4 (Jan 2016) due to 
8cd5601c5060 "sched/fair: Convert arch_scale_cpu_capacity() from weak 
function to #define".

So even people had cpu clock-frequency specified in their dt for their
Cortex-A15/A7 arm platforms, they didn't noticed (or cared) that the
task scheduler sees all cpus with the capacity of 1024 rather than
different values for the A15's and A7's.

As an example, a v4.3 TC2 (w/ 'clock-frequency = <1000000000>' added for 
A15 and ,clock-frequency = <800000000>, added for a A7) had the 
following cpu capacity values
The log snippet (for cpu0: A15) comes from sched_domain_debug_one() 
[kernel/sched/core.c] which appears if scheduler debugging is enabled:

   ...
   CPU0 attaching sched-domain:
    domain 0: span 0-1 level MC
     groups: 0 (cpu_capacity = 1441) 1 (cpu_capacity = 1441)
     domain 1: span 0-4 level DIE
      groups: 0-1 (cpu_capacity = 2882) 2-4 (cpu_capacity = 1818)
   ...

   So A15 has 1441 and A7 has 606 as cpu capacity.

Whereas with v4.4 it switched back to:

  ...
  CPU0 attaching sched-domain:
   domain 0: span 0-1 level MC
    groups: 0 1
    domain 1: span 0-4 level DIE
     groups: 0-1 (cpu_capacity = 2048) 2-4 (cpu_capacity = 3072)
  ...

    So A15 and A7 have 1024 (SCHED_CAPACITY_SCALE).


So maybe the requirement to maintain compatibility for older DT's is
less important in this specific case and the argument that we can get
rid of a lot of code and make arm/arm64 consistent in this area is more
important than backward compatibility?

BTW, the missing bit, the #define of arch_scale_cpu_capacity, is only
provided with 552c4653bf89 "arm: wire cpu-invariant accounting support
up to the task scheduler" which is currently queued for v4.15-rc1.

-- Dietmar

[...]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-11-24 11:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-24 10:27 [PATCH v3 0/2] arm: remove cpu_efficiency Dietmar Eggemann
2017-10-24 10:27 ` [PATCH v3 1/2] arm: topology: " Dietmar Eggemann
2017-10-24 10:27 ` [PATCH v3 2/2] arm: topology: add missing of_node_put() for cpu device node Dietmar Eggemann
2017-10-24 10:52 ` [PATCH v3 0/2] arm: remove cpu_efficiency Russell King - ARM Linux
2017-10-24 14:31   ` Dietmar Eggemann
2017-11-24 11:57     ` Dietmar Eggemann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).