public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
@ 2023-09-01  6:52 Shrikanth Hegde
  2023-09-01  9:49 ` Chen Yu
  2023-09-05 14:03 ` Pierre Gondois
  0 siblings, 2 replies; 13+ messages in thread
From: Shrikanth Hegde @ 2023-09-01  6:52 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot
  Cc: sshegde, dietmar.eggemann, vschneid, linux-kernel,
	ionela.voinescu, quentin.perret, srikar, mgorman, mingo,
	yu.c.chen

Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
some of the architectures. IIUC its meant to either force rebuild the
perf domains or cleanup the perf domains by echoing 1 or 0 respectively.

perf domains are not built when there is SMT, or when there is no
Asymmetric CPU topologies or when there is no frequency invariance.
Since such cases EAS is not set and perf domains are not built. By
changing the values of sysctl_sched_energy_aware, its not possible to
force build the perf domains. Hence remove this sysctl on such platforms
that dont support it. Some of the settings can be changed later
such as smt_active by offlining the CPU's, In those cases if
build_perf_domains returns true, re-enable the sysctl.

Anytime, when sysctl_sched_energy_aware is changed sched_energy_update
is set when building the perf domains. Making use of that to find out if
the change is happening by sysctl or dynamic system change.

Taking different cases:
Case1. system while booting has EAS capability, sysctl will be set 1. Hence
perf domains will be built if needed. On changing the sysctl to 0, since
sched_energy_update is true, perf domains would be freed and sysctl will
not be removed. later sysctl is changed to 1, enabling the perf domains
rebuild again. Since sysctl is already there, it will skip register.

Case2. System while booting doesn't have EAS Capability. Later after system
change it becomes capable of EAS. sched_energy_update is false. Though
sysctl is 0, will go ahead and try to enable eas. This is the current
behaviour. if has_eas  is true, then sysctl will be registered. After
that any sysctl change is same as Case1.

Case3. System becomes not capable of EAS due to system change. Here since
sched_energy_update is false, build_perf_domains return has_eas as false
due to one of the checks and Since this is dynamic change remove the sysctl.
Any further change which enables EAS is Case2

Note: This hasn't been tested on platform which supports EAS. If the
change can be verified on that it would really help. This has been
tested on power10 which doesn't support EAS. sysctl_sched_energy_aware
is removed with patch.

changes since v1:
Chen Yu had pointed out that this will not destroy the perf domains on
architectures where EAS is supported by changing the sysctl. This patch
addresses that.
[v1] Link: https://lore.kernel.org/lkml/20230829065040.920629-1-sshegde@linux.vnet.ibm.com/#t

Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
---
 kernel/sched/topology.c | 45 +++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a5bc678c08..4d16269ac21a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -208,7 +208,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)

 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
 DEFINE_STATIC_KEY_FALSE(sched_energy_present);
-static unsigned int sysctl_sched_energy_aware = 1;
+static unsigned int sysctl_sched_energy_aware;
+static struct ctl_table_header *sysctl_eas_header;
 static DEFINE_MUTEX(sched_energy_mutex);
 static bool sched_energy_update;

@@ -226,6 +227,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret, state;
+	int prev_val = sysctl_sched_energy_aware;

 	if (write && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -233,8 +235,11 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (!ret && write) {
 		state = static_branch_unlikely(&sched_energy_present);
-		if (state != sysctl_sched_energy_aware)
+		if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
+			if (sysctl_sched_energy_aware && !state)
+				pr_warn("Attempt to build energy domains when EAS is disabled\n");
 			rebuild_sched_domains_energy();
+		}
 	}

 	return ret;
@@ -255,7 +260,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {

 static int __init sched_energy_aware_sysctl_init(void)
 {
-	register_sysctl_init("kernel", sched_energy_aware_sysctls);
+	int cpu = cpumask_first(cpu_active_mask);
+
+	if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
+	    !arch_scale_freq_invariant())
+		return 0;
+
+	sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+	sysctl_sched_energy_aware = 1;
 	return 0;
 }

@@ -336,10 +348,28 @@ static void sched_energy_set(bool has_eas)
 		if (sched_debug())
 			pr_info("%s: stopping EAS\n", __func__);
 		static_branch_disable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+		/*
+		 * if the architecture supports EAS and forcefully
+		 * perf domains are destroyed, there should be a sysctl
+		 * to enable it later. If this was due to dynamic system
+		 * change such as SMT<->NON_SMT then remove sysctl.
+		 */
+		if (sysctl_eas_header && !sched_energy_update) {
+			unregister_sysctl_table(sysctl_eas_header);
+			sysctl_eas_header = NULL;
+		}
+#endif
+		sysctl_sched_energy_aware = 0;
 	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
 		if (sched_debug())
 			pr_info("%s: starting EAS\n", __func__);
 		static_branch_enable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+		if (!sysctl_eas_header)
+			sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+#endif
+		sysctl_sched_energy_aware = 1;
 	}
 }

@@ -380,15 +410,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 	struct cpufreq_policy *policy;
 	struct cpufreq_governor *gov;

-	if (!sysctl_sched_energy_aware)
+	if (!sysctl_sched_energy_aware && sched_energy_update)
 		goto free;

 	/* EAS is enabled for asymmetric CPU capacity topologies. */
 	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
-		if (sched_debug()) {
-			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
-					cpumask_pr_args(cpu_map));
-		}
+		if (sched_debug())
+			pr_info("rd %*pbl: Disabling EAS,  CPUs do not have asymmetric capacities\n",
+				cpumask_pr_args(cpu_map));
 		goto free;
 	}

--
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
@ 2023-09-01  6:51 Shrikanth Hegde
  2023-09-01  6:57 ` Shrikanth Hegde
  0 siblings, 1 reply; 13+ messages in thread
From: Shrikanth Hegde @ 2023-09-01  6:51 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot
  Cc: sshegde, dietmar.eggemann, vschneid, linux-kernel,
	ionela.voinescu, quentin.perret, srikar, mgorman, mingo

Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
some of the architectures. IIUC its meant to either force rebuild the
perf domains or cleanup the perf domains by echoing 1 or 0 respectively.

perf domains are not built when there is SMT, or when there is no
Asymmetric CPU topologies or when there is no frequency invariance.
Since such cases EAS is not set and perf domains are not built. By
changing the values of sysctl_sched_energy_aware, its not possible to
force build the perf domains. Hence remove this sysctl on such platforms
that dont support it. Some of the settings can be changed later
such as smt_active by offlining the CPU's, In those cases if
build_perf_domains returns true, re-enable the sysctl.

Anytime, when sysctl_sched_energy_aware is changed sched_energy_update
is set when building the perf domains. Making use of that to find out if
the change is happening by sysctl or dynamic system change.

Taking different cases:
Case1. system while booting has EAS capability, sysctl will be set 1. Hence
perf domains will be built if needed. On changing the sysctl to 0, since
sched_energy_update is true, perf domains would be freed and sysctl will
not be removed. later sysctl is changed to 1, enabling the perf domains
rebuild again. Since sysctl is already there, it will skip register.

Case2. System while booting doesn't have EAS Capability. Later after system
change it becomes capable of EAS. sched_energy_update is false. Though
sysctl is 0, will go ahead and try to enable eas. This is the current
behaviour. if has_eas  is true, then sysctl will be registered. After
that any sysctl change is same as Case1.

Case3. System becomes not capable of EAS due to system change. Here since
sched_energy_update is false, build_perf_domains return has_eas as false
due to one of the checks and Since this is dynamic change remove the sysctl.
Any further change which enables EAS is Case2

Note: This hasn't been tested on platform which supports EAS. If the
change can be verified on that it would really help. This has been
tested on power10 which doesn't support EAS. sysctl_sched_energy_aware
is removed with patch.

changes since v1:
Chen Yu had pointed out that this will not destroy the perf domains on
architectures where EAS is supported by changing the sysctl. This patch
addresses that.
[v1] Link: https://lore.kernel.org/lkml/20230829065040.920629-1-sshegde@linux.vnet.ibm.com/#t

Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
---
 kernel/sched/topology.c | 45 +++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a5bc678c08..4d16269ac21a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -208,7 +208,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)

 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
 DEFINE_STATIC_KEY_FALSE(sched_energy_present);
-static unsigned int sysctl_sched_energy_aware = 1;
+static unsigned int sysctl_sched_energy_aware;
+static struct ctl_table_header *sysctl_eas_header;
 static DEFINE_MUTEX(sched_energy_mutex);
 static bool sched_energy_update;

@@ -226,6 +227,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret, state;
+	int prev_val = sysctl_sched_energy_aware;

 	if (write && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -233,8 +235,11 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (!ret && write) {
 		state = static_branch_unlikely(&sched_energy_present);
-		if (state != sysctl_sched_energy_aware)
+		if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
+			if (sysctl_sched_energy_aware && !state)
+				pr_warn("Attempt to build energy domains when EAS is disabled\n");
 			rebuild_sched_domains_energy();
+		}
 	}

 	return ret;
@@ -255,7 +260,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {

 static int __init sched_energy_aware_sysctl_init(void)
 {
-	register_sysctl_init("kernel", sched_energy_aware_sysctls);
+	int cpu = cpumask_first(cpu_active_mask);
+
+	if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
+	    !arch_scale_freq_invariant())
+		return 0;
+
+	sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+	sysctl_sched_energy_aware = 1;
 	return 0;
 }

@@ -336,10 +348,28 @@ static void sched_energy_set(bool has_eas)
 		if (sched_debug())
 			pr_info("%s: stopping EAS\n", __func__);
 		static_branch_disable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+		/*
+		 * if the architecture supports EAS and forcefully
+		 * perf domains are destroyed, there should be a sysctl
+		 * to enable it later. If this was due to dynamic system
+		 * change such as SMT<->NON_SMT then remove sysctl.
+		 */
+		if (sysctl_eas_header && !sched_energy_update) {
+			unregister_sysctl_table(sysctl_eas_header);
+			sysctl_eas_header = NULL;
+		}
+#endif
+		sysctl_sched_energy_aware = 0;
 	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
 		if (sched_debug())
 			pr_info("%s: starting EAS\n", __func__);
 		static_branch_enable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+		if (!sysctl_eas_header)
+			sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+#endif
+		sysctl_sched_energy_aware = 1;
 	}
 }

@@ -380,15 +410,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 	struct cpufreq_policy *policy;
 	struct cpufreq_governor *gov;

-	if (!sysctl_sched_energy_aware)
+	if (!sysctl_sched_energy_aware && sched_energy_update)
 		goto free;

 	/* EAS is enabled for asymmetric CPU capacity topologies. */
 	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
-		if (sched_debug()) {
-			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
-					cpumask_pr_args(cpu_map));
-		}
+		if (sched_debug())
+			pr_info("rd %*pbl: Disabling EAS,  CPUs do not have asymmetric capacities\n",
+				cpumask_pr_args(cpu_map));
 		goto free;
 	}

--
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-09-13 11:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-01  6:52 [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture Shrikanth Hegde
2023-09-01  9:49 ` Chen Yu
2023-09-01 13:08   ` Shrikanth Hegde
2023-09-07 10:21   ` Pierre Gondois
2023-09-07 15:04     ` Chen Yu
2023-09-05 14:03 ` Pierre Gondois
2023-09-05 21:53   ` Tim Chen
2023-09-06  3:21   ` Shrikanth Hegde
2023-09-06 11:35   ` Shrikanth Hegde
2023-09-07 10:34     ` Pierre Gondois
2023-09-13 11:55       ` Shrikanth Hegde
  -- strict thread matches above, loose matches on Subject: below --
2023-09-01  6:51 Shrikanth Hegde
2023-09-01  6:57 ` Shrikanth Hegde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox