* [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative
@ 2025-03-07 19:12 Rafael J. Wysocki
2025-03-07 19:15 ` [RFC][PATCH v0.3 1/6] cpufreq/sched: schedutil: Add helper for governor checks Rafael J. Wysocki
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:12 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle
Hi Everyone,
This is a new take on the "EAS for intel_pstate" work:
https://lore.kernel.org/linux-pm/5861970.DvuYhMxLoT@rjwysocki.net/
with refreshed preparatory patches and a revised energy model design.
The following paragraph from the original cover letter still applies:
"The underlying observation is that on the platforms targeted by these changes,
Lunar Lake at the time of this writing, the "small" CPUs (E-cores), when run at
the same performance level, are always more energy-efficient than the "big" or
"performance" CPUs (P-cores). This means that, regardless of the scale-
invariant utilization of a task, as long as there is enough spare capacity on
E-cores, the relative cost of running it there is always lower."
However, this time perf domains are registered per CPU and in addition to the
primary cost component, which is related to the CPU type, there is a small
component proportional to performance whose role is to help balance the load
between CPUs of the same type.
This is done to avoid migrating tasks too much between CPUs of the same type,
especially between E-cores, which has been observed in tests of the previous
iteration of this work.
The expected effect is still that the CPUs of the "low-cost" type will be
preferred so long as there is enough spare capacity on any of them.
The first two patches in the series rearrange cpufreq checks related to EAS so
that sched_is_eas_possible() doesn't have to access cpufreq internals directly.
Patch [3/6] changes those checks to also allow EAS to be used with cpufreq
drivers that implement internal governors (like intel_pstate).
Patches [4-5/6] deal with the Energy Model code. Patch [4/6] rearranges it so
that the next patch can be simpler, and patch [5/6] adds a function that is
used in the last patch.
Patch [6/6] is the actual intel_pstate modification, which is now significantly
simpler than before because the driver doesn't need to track the type of each
CPU directly in order to put it into the right perf domain.
Please refer to the individual patch changelogs for details.
For easier access, the series is available on the experimental/intel_pstate/eas-take2
branch in linux-pm.git:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
experimental/intel_pstate/eas-take2
or
https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/intel_pstate/eas-take2
Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC][PATCH v0.3 1/6] cpufreq/sched: schedutil: Add helper for governor checks
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
@ 2025-03-07 19:15 ` Rafael J. Wysocki
2025-03-07 19:16 ` [RFC][PATCH v0.3 2/6] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq Rafael J. Wysocki
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:15 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle, Viresh Kumar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a helper for checking if schedutil is the current governor for
a given cpufreq policy and use it in sched_is_eas_possible() to avoid
accessing cpufreq policy internals directly from there.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
include/linux/cpufreq.h | 9 +++++++++
kernel/sched/cpufreq_schedutil.c | 9 +++++++--
kernel/sched/sched.h | 2 --
kernel/sched/topology.c | 6 +++---
4 files changed, 19 insertions(+), 7 deletions(-)
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -641,6 +641,15 @@
struct cpufreq_governor *cpufreq_default_governor(void);
struct cpufreq_governor *cpufreq_fallback_governor(void);
+#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
+bool sugov_is_cpufreq_governor(struct cpufreq_policy *policy);
+#else
+static inline bool sugov_is_cpufreq_governor(struct cpufreq_policy *policy)
+{
+ return false;
+}
+#endif
+
static inline void cpufreq_policy_apply_limits(struct cpufreq_policy *policy)
{
if (policy->max < policy->cur)
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -604,7 +604,7 @@
/********************** cpufreq governor interface *********************/
-struct cpufreq_governor schedutil_gov;
+static struct cpufreq_governor schedutil_gov;
static struct sugov_policy *sugov_policy_alloc(struct cpufreq_policy *policy)
{
@@ -874,7 +874,7 @@
sg_policy->limits_changed = true;
}
-struct cpufreq_governor schedutil_gov = {
+static struct cpufreq_governor schedutil_gov = {
.name = "schedutil",
.owner = THIS_MODULE,
.flags = CPUFREQ_GOV_DYNAMIC_SWITCHING,
@@ -892,4 +892,9 @@
}
#endif
+bool sugov_is_cpufreq_governor(struct cpufreq_policy *policy)
+{
+ return policy->governor == &schedutil_gov;
+}
+
cpufreq_governor_init(schedutil_gov);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3552,8 +3552,6 @@
return static_branch_unlikely(&sched_energy_present);
}
-extern struct cpufreq_governor schedutil_gov;
-
#else /* ! (CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
#define perf_domain_span(pd) NULL
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -216,7 +216,7 @@
{
bool any_asym_capacity = false;
struct cpufreq_policy *policy;
- struct cpufreq_governor *gov;
+ bool policy_is_ready;
int i;
/* EAS is enabled for asymmetric CPU capacity topologies. */
@@ -261,9 +261,9 @@
}
return false;
}
- gov = policy->governor;
+ policy_is_ready = sugov_is_cpufreq_governor(policy);
cpufreq_cpu_put(policy);
- if (gov != &schedutil_gov) {
+ if (!policy_is_ready) {
if (sched_debug()) {
pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
cpumask_pr_args(cpu_mask));
* [RFC][PATCH v0.3 2/6] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
2025-03-07 19:15 ` [RFC][PATCH v0.3 1/6] cpufreq/sched: schedutil: Add helper for governor checks Rafael J. Wysocki
@ 2025-03-07 19:16 ` Rafael J. Wysocki
2025-03-07 19:16 ` [RFC][PATCH v0.3 3/6] cpufreq/sched: Allow .setpolicy() cpufreq drivers to enable EAS Rafael J. Wysocki
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:16 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle, Viresh Kumar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Doing cpufreq-specific EAS checks that require accessing policy
internals directly from sched_is_eas_possible() is a bit unfortunate,
so introduce cpufreq_ready_for_eas() in cpufreq, move those checks
into that new function and make sched_is_eas_possible() call it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/cpufreq.c | 30 ++++++++++++++++++++++++++++++
include/linux/cpufreq.h | 2 ++
kernel/sched/topology.c | 25 +++++--------------------
3 files changed, 37 insertions(+), 20 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -3052,6 +3052,36 @@
return 0;
}
+
+bool cpufreq_ready_for_eas(const struct cpumask *cpu_mask)
+{
+ int i;
+
+ /* Do not attempt EAS if schedutil is not being used. */
+ for_each_cpu(i, cpu_mask) {
+ struct cpufreq_policy *policy;
+ bool policy_is_ready;
+
+ policy = cpufreq_cpu_get(i);
+ if (!policy) {
+ pr_debug("rd %*pbl: cpufreq policy not set for CPU: %d\n",
+ cpumask_pr_args(cpu_mask), i);
+
+ return false;
+ }
+ policy_is_ready = sugov_is_cpufreq_governor(policy);
+ cpufreq_cpu_put(policy);
+ if (!policy_is_ready) {
+ pr_debug("rd %*pbl: schedutil is mandatory for EAS\n",
+ cpumask_pr_args(cpu_mask));
+
+ return false;
+ }
+ }
+
+ return true;
+}
+
module_param(off, int, 0444);
module_param_string(default_governor, default_governor, CPUFREQ_NAME_LEN, 0444);
core_initcall(cpufreq_core_init);
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1215,6 +1215,8 @@
struct cpufreq_frequency_table *table,
unsigned int transition_latency);
+bool cpufreq_ready_for_eas(const struct cpumask *cpu_mask);
+
static inline void cpufreq_register_em_with_opp(struct cpufreq_policy *policy)
{
dev_pm_opp_of_register_em(get_cpu_device(policy->cpu),
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -215,8 +215,6 @@
static bool sched_is_eas_possible(const struct cpumask *cpu_mask)
{
bool any_asym_capacity = false;
- struct cpufreq_policy *policy;
- bool policy_is_ready;
int i;
/* EAS is enabled for asymmetric CPU capacity topologies. */
@@ -251,25 +249,12 @@
return false;
}
- /* Do not attempt EAS if schedutil is not being used. */
- for_each_cpu(i, cpu_mask) {
- policy = cpufreq_cpu_get(i);
- if (!policy) {
- if (sched_debug()) {
- pr_info("rd %*pbl: Checking EAS, cpufreq policy not set for CPU: %d",
- cpumask_pr_args(cpu_mask), i);
- }
- return false;
- }
- policy_is_ready = sugov_is_cpufreq_governor(policy);
- cpufreq_cpu_put(policy);
- if (!policy_is_ready) {
- if (sched_debug()) {
- pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
- cpumask_pr_args(cpu_mask));
- }
- return false;
+ if (!cpufreq_ready_for_eas(cpu_mask)) {
+ if (sched_debug()) {
+ pr_info("rd %*pbl: Checking EAS: cpufreq is not ready\n",
+ cpumask_pr_args(cpu_mask));
}
+ return false;
}
return true;
* [RFC][PATCH v0.3 3/6] cpufreq/sched: Allow .setpolicy() cpufreq drivers to enable EAS
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
2025-03-07 19:15 ` [RFC][PATCH v0.3 1/6] cpufreq/sched: schedutil: Add helper for governor checks Rafael J. Wysocki
2025-03-07 19:16 ` [RFC][PATCH v0.3 2/6] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq Rafael J. Wysocki
@ 2025-03-07 19:16 ` Rafael J. Wysocki
2025-03-07 19:17 ` [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity() Rafael J. Wysocki
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:16 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle, Viresh Kumar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some cpufreq drivers, like intel_pstate, have built-in governors that
are used instead of regular cpufreq governors, schedutil in particular,
but they can work with EAS just fine, so allow EAS to be used with
those drivers.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/cpufreq.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -3053,6 +3053,20 @@
return 0;
}
+static bool cpufreq_policy_is_good_for_eas(struct cpufreq_policy *policy)
+{
+ /*
+ * For EAS compatibility, require that either schedutil is the policy
+ * governor or the policy is governed directly by the cpufreq driver.
+ *
+ * In the latter case, it is assumed that EAS can only be enabled by the
+ * cpufreq driver itself, which will not enable EAS if it does not meet
+ * EAS' expectations regarding performance scaling response.
+ */
+ return sugov_is_cpufreq_governor(policy) || (!policy->governor &&
+ policy->policy != CPUFREQ_POLICY_UNKNOWN);
+}
+
bool cpufreq_ready_for_eas(const struct cpumask *cpu_mask)
{
int i;
@@ -3069,7 +3083,7 @@
return false;
}
- policy_is_ready = sugov_is_cpufreq_governor(policy);
+ policy_is_ready = cpufreq_policy_is_good_for_eas(policy);
cpufreq_cpu_put(policy);
if (!policy_is_ready) {
pr_debug("rd %*pbl: schedutil is mandatory for EAS\n",
* [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity()
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
` (2 preceding siblings ...)
2025-03-07 19:16 ` [RFC][PATCH v0.3 3/6] cpufreq/sched: Allow .setpolicy() cpufreq drivers to enable EAS Rafael J. Wysocki
@ 2025-03-07 19:17 ` Rafael J. Wysocki
2025-03-24 16:25 ` Lukasz Luba
2025-03-07 19:39 ` [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity() Rafael J. Wysocki
` (2 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:17 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle, Viresh Kumar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the check of the CPU capacity currently stored in the energy model
against the arch_scale_cpu_capacity() value to em_adjust_new_capacity()
so it will be done regardless of where the latter is called from.
This will be useful when a new em_adjust_new_capacity() caller is added
subsequently.
While at it, move the pd local variable declaration in
em_check_capacity_update() into the loop in which it is used.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
kernel/power/energy_model.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -721,10 +721,24 @@
* Adjustment of CPU performance values after boot, when all CPUs capacites
* are correctly calculated.
*/
-static void em_adjust_new_capacity(struct device *dev,
+static void em_adjust_new_capacity(unsigned int cpu, struct device *dev,
struct em_perf_domain *pd)
{
+ unsigned long cpu_capacity = arch_scale_cpu_capacity(cpu);
struct em_perf_table *em_table;
+ struct em_perf_state *table;
+ unsigned long em_max_perf;
+
+ rcu_read_lock();
+ table = em_perf_state_from_pd(pd);
+ em_max_perf = table[pd->nr_perf_states - 1].performance;
+ rcu_read_unlock();
+
+ if (em_max_perf == cpu_capacity)
+ return;
+
+ pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n", cpu,
+ cpu_capacity, em_max_perf);
em_table = em_table_dup(pd);
if (!em_table) {
@@ -740,9 +754,6 @@
static void em_check_capacity_update(void)
{
cpumask_var_t cpu_done_mask;
- struct em_perf_state *table;
- struct em_perf_domain *pd;
- unsigned long cpu_capacity;
int cpu;
if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
@@ -753,7 +764,7 @@
/* Check if CPUs capacity has changed than update EM */
for_each_possible_cpu(cpu) {
struct cpufreq_policy *policy;
- unsigned long em_max_perf;
+ struct em_perf_domain *pd;
struct device *dev;
if (cpumask_test_cpu(cpu, cpu_done_mask))
@@ -776,24 +787,7 @@
cpumask_or(cpu_done_mask, cpu_done_mask,
em_span_cpus(pd));
- cpu_capacity = arch_scale_cpu_capacity(cpu);
-
- rcu_read_lock();
- table = em_perf_state_from_pd(pd);
- em_max_perf = table[pd->nr_perf_states - 1].performance;
- rcu_read_unlock();
-
- /*
- * Check if the CPU capacity has been adjusted during boot
- * and trigger the update for new performance values.
- */
- if (em_max_perf == cpu_capacity)
- continue;
-
- pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n",
- cpu, cpu_capacity, em_max_perf);
-
- em_adjust_new_capacity(dev, pd);
+ em_adjust_new_capacity(cpu, dev, pd);
}
free_cpumask_var(cpu_done_mask);
* [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity()
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
` (3 preceding siblings ...)
2025-03-07 19:17 ` [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity() Rafael J. Wysocki
@ 2025-03-07 19:39 ` Rafael J. Wysocki
2025-03-24 16:25 ` Lukasz Luba
2025-03-07 19:42 ` [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms Rafael J. Wysocki
2025-04-03 10:47 ` [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Christian Loehle
6 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:39 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle, Viresh Kumar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a function for updating the Energy Model for a CPU after its
capacity has changed, which subsequently will be used by the
intel_pstate driver.
An EM_PERF_DOMAIN_ARTIFICIAL check is added to em_adjust_new_capacity()
to prevent it from calling em_compute_costs() for an "artificial" perf
domain with a NULL cb parameter which would cause it to crash.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
Note that this function is needed because the performance level values
in the EM "state" table need to be adjusted on CPU capacity changes. In
the intel_pstate case the cost values associated with them don't change
because they are artificial anyway, so replacing the entire table just
in order to update the performance level values is a bit wasteful, but
it seems to be an exception (in the other cases when the CPU capacity
changes, the cost values change too AFAICS).
---
include/linux/energy_model.h | 2 ++
kernel/power/energy_model.c | 28 ++++++++++++++++++++++++----
2 files changed, 26 insertions(+), 4 deletions(-)
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -179,6 +179,7 @@
int em_dev_update_chip_binning(struct device *dev);
int em_update_performance_limits(struct em_perf_domain *pd,
unsigned long freq_min_khz, unsigned long freq_max_khz);
+void em_adjust_cpu_capacity(unsigned int cpu);
void em_rebuild_sched_domains(void);
/**
@@ -405,6 +406,7 @@
{
return -EINVAL;
}
+static inline void em_adjust_cpu_capacity(unsigned int cpu) {}
static inline void em_rebuild_sched_domains(void) {}
#endif
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -698,10 +698,12 @@
{
int ret;
- ret = em_compute_costs(dev, em_table->state, NULL, pd->nr_perf_states,
- pd->flags);
- if (ret)
- goto free_em_table;
+ if (!(pd->flags & EM_PERF_DOMAIN_ARTIFICIAL)) {
+ ret = em_compute_costs(dev, em_table->state, NULL,
+ pd->nr_perf_states, pd->flags);
+ if (ret)
+ goto free_em_table;
+ }
ret = em_dev_update_perf_domain(dev, em_table);
if (ret)
@@ -751,6 +753,24 @@
em_recalc_and_update(dev, pd, em_table);
}
+/**
+ * em_adjust_cpu_capacity() - Adjust the EM for a CPU after a capacity update.
+ * @cpu: Target CPU.
+ *
+ * Adjust the existing EM for @cpu after a capacity update under the assumption
+ * that the capacity has been updated in the same way for all of the CPUs in
+ * the same perf domain.
+ */
+void em_adjust_cpu_capacity(unsigned int cpu)
+{
+ struct device *dev = get_cpu_device(cpu);
+ struct em_perf_domain *pd;
+
+ pd = em_pd_get(dev);
+ if (pd)
+ em_adjust_new_capacity(cpu, dev, pd);
+}
+
static void em_check_capacity_update(void)
{
cpumask_var_t cpu_done_mask;
* [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
` (4 preceding siblings ...)
2025-03-07 19:39 ` [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity() Rafael J. Wysocki
@ 2025-03-07 19:42 ` Rafael J. Wysocki
2025-03-13 18:46 ` Tim Chen
2025-04-03 10:47 ` [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Christian Loehle
6 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-07 19:42 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify intel_pstate to register EM perf domains for CPUs on hybrid
platforms and enable EAS on them.
This change targets platforms (for example, Lunar Lake) where the
"little" CPUs (E-cores) are always more energy-efficient than the "big"
or "performance" CPUs (P-cores) when run at the same HWP performance
level, so it is sufficient to tell EAS that E-cores are always preferred
(so long as there is enough spare capacity on one of them to run the
given task). However, migrating tasks between CPUs of the same type
too often is not desirable because it may hurt both performance and
energy efficiency due to leaving warm caches behind.
For this reason, register a separate perf domain for each CPU and assign
costs to them so that the cost mostly depends on the CPU type, with a
small additional component depending on the performance level
(utilization) that helps avoid substantial load imbalances between CPUs
of the same type.
The observation used here is that the IPC metric value for a given CPU
is inversely proportional to its performance-to-frequency scaling factor
and the cost of running on it can be assumed to be roughly proportional
to that IPC ratio (in principle, the higher the IPC ratio, the more
resources are utilized when running at a given frequency, so the cost
should be higher). This main component of the cost is amended with a
small addition proportional to performance.
EM perf domains for all CPUs that are online during system startup are
registered at the driver initialization time, after asymmetric capacity
support has been enabled. For the CPUs that become online later, EM
perf domains are registered after setting the asymmetric capacity for
them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/intel_pstate.c | 132 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 127 insertions(+), 5 deletions(-)
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,8 @@
#define INTEL_CPUFREQ_TRANSITION_DELAY_HWP 5000
#define INTEL_CPUFREQ_TRANSITION_DELAY 500
+#define INTEL_PSTATE_CORE_SCALING 100000
+
#ifdef CONFIG_ACPI
#include <acpi/processor.h>
#include <acpi/cppc_acpi.h>
@@ -221,6 +223,7 @@
* @sched_flags: Store scheduler flags for possible cross CPU update
* @hwp_boost_min: Last HWP boosted min performance
* @suspended: Whether or not the driver has been suspended.
+ * @em_registered: If set, an energy model has been registered.
* @hwp_notify_work: workqueue for HWP notifications.
*
* This structure stores per CPU instance data for all CPUs.
@@ -260,6 +263,9 @@
unsigned int sched_flags;
u32 hwp_boost_min;
bool suspended;
+#ifdef CONFIG_ENERGY_MODEL
+ bool em_registered;
+#endif
struct delayed_work hwp_notify_work;
};
@@ -311,7 +317,7 @@
static inline int core_get_scaling(void)
{
- return 100000;
+ return INTEL_PSTATE_CORE_SCALING;
}
#ifdef CONFIG_ACPI
@@ -945,12 +951,105 @@
*/
static DEFINE_MUTEX(hybrid_capacity_lock);
+#ifdef CONFIG_ENERGY_MODEL
+#define HYBRID_EM_STATE_COUNT 4
+
+static int hybrid_active_power(struct device *dev, unsigned long *power,
+ unsigned long *freq)
+{
+ /*
+ * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
+ * of the maximum capacity such that two CPUs of the same type will be
+ * regarded as equally attractive if the utilization of each of them
+ * falls into the same bin, which should prevent tasks from being
+ * migrated between them too often.
+ *
+ * For this purpose, return the "frequency" of 2 for the first
+ * performance level and otherwise leave the value set by the caller.
+ */
+ if (!*freq)
+ *freq = 2;
+
+ /* No power information. */
+ *power = EM_MAX_POWER;
+
+ return 0;
+}
+
+static int hybrid_get_cost(struct device *dev, unsigned long freq,
+ unsigned long *cost)
+{
+ struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
+
+ /*
+ * The smaller the perf-to-frequency scaling factor, the larger the IPC
+ * ratio between the given CPU and the least capable CPU in the system.
+ * Regard that IPC ratio as the primary cost component and assume that
+ * the scaling factors for different CPU types will differ by at least
+ * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
+ *
+ * Add the freq value to the cost, so that the cost of running on CPUs
+ * of the same type in different "utilization bins" is different.
+ */
+ *cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
+
+ return 0;
+}
+
+static bool hybrid_register_perf_domain(unsigned int cpu)
+{
+ static const struct em_data_callback cb
+ = EM_ADV_DATA_CB(hybrid_active_power, hybrid_get_cost);
+ struct cpudata *cpudata = all_cpu_data[cpu];
+ struct device *cpu_dev;
+
+ /*
+ * Registering EM perf domains without enabling asymmetric CPU capacity
+ * support is not really useful and one domain should not be registered
+ * more than once.
+ */
+ if (!hybrid_max_perf_cpu || cpudata->em_registered)
+ return false;
+
+ cpu_dev = get_cpu_device(cpu);
+ if (!cpu_dev)
+ return false;
+
+ if (em_dev_register_perf_domain(cpu_dev, HYBRID_EM_STATE_COUNT, &cb,
+ cpumask_of(cpu), false))
+ return false;
+
+ cpudata->em_registered = true;
+
+ return true;
+}
+
+static void hybrid_register_all_perf_domains(void)
+{
+ unsigned int cpu;
+
+ for_each_online_cpu(cpu)
+ hybrid_register_perf_domain(cpu);
+}
+
+static void hybrid_update_perf_domain(struct cpudata *cpu)
+{
+ if (cpu->em_registered)
+ em_adjust_cpu_capacity(cpu->cpu);
+}
+#else /* !CONFIG_ENERGY_MODEL */
+static inline bool hybrid_register_perf_domain(unsigned int cpu) { return false; }
+static inline void hybrid_register_all_perf_domains(void) {}
+static inline void hybrid_update_perf_domain(struct cpudata *cpu) {}
+#endif /* CONFIG_ENERGY_MODEL */
+
static void hybrid_set_cpu_capacity(struct cpudata *cpu)
{
arch_set_cpu_capacity(cpu->cpu, cpu->capacity_perf,
hybrid_max_perf_cpu->capacity_perf,
cpu->capacity_perf,
cpu->pstate.max_pstate_physical);
+ hybrid_update_perf_domain(cpu);
pr_debug("CPU%d: perf = %u, max. perf = %u, base perf = %d\n", cpu->cpu,
cpu->capacity_perf, hybrid_max_perf_cpu->capacity_perf,
@@ -1039,6 +1138,11 @@
guard(mutex)(&hybrid_capacity_lock);
__hybrid_refresh_cpu_capacity_scaling();
+ /*
+ * Perf domains are not registered before setting hybrid_max_perf_cpu,
+ * so register them all after setting up CPU capacity scaling.
+ */
+ hybrid_register_all_perf_domains();
}
static void hybrid_init_cpu_capacity_scaling(bool refresh)
@@ -1066,7 +1170,7 @@
hybrid_refresh_cpu_capacity_scaling();
/*
* Disabling ITMT causes sched domains to be rebuilt to disable asym
- * packing and enable asym capacity.
+ * packing and enable asym capacity and EAS.
*/
sched_clear_itmt_support();
}
@@ -1144,6 +1248,14 @@
}
hybrid_set_cpu_capacity(cpu);
+ /*
+ * If the CPU was offline to start with and it is going online for the
+ * first time, a perf domain needs to be registered for it if hybrid
+ * capacity scaling has been enabled already. In that case, sched
+ * domains need to be rebuilt to take the new perf domain into account.
+ */
+ if (hybrid_register_perf_domain(cpu->cpu))
+ em_rebuild_sched_domains();
unlock:
mutex_unlock(&hybrid_capacity_lock);
@@ -3416,6 +3528,8 @@
static int intel_pstate_update_status(const char *buf, size_t size)
{
+ int ret = -EINVAL;
+
if (size == 3 && !strncmp(buf, "off", size)) {
if (!intel_pstate_driver)
return -EINVAL;
@@ -3425,6 +3539,8 @@
cpufreq_unregister_driver(intel_pstate_driver);
intel_pstate_driver_cleanup();
+ /* Trigger EAS support reconfiguration in case it was used. */
+ rebuild_sched_domains_energy();
return 0;
}
@@ -3436,7 +3552,13 @@
cpufreq_unregister_driver(intel_pstate_driver);
}
- return intel_pstate_register_driver(&intel_pstate);
+ ret = intel_pstate_register_driver(&intel_pstate);
+ /*
+ * If the previous status had been "passive" and the schedutil
+ * governor had been used, it disabled EAS on exit, so trigger
+ * sched domains rebuild in case EAS needs to be enabled again.
+ */
+ rebuild_sched_domains_energy();
}
if (size == 7 && !strncmp(buf, "passive", size)) {
@@ -3448,10 +3570,10 @@
intel_pstate_sysfs_hide_hwp_dynamic_boost();
}
- return intel_pstate_register_driver(&intel_cpufreq);
+ ret = intel_pstate_register_driver(&intel_cpufreq);
}
- return -EINVAL;
+ return ret;
}
static int no_load __initdata;
* Re: [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms
2025-03-07 19:42 ` [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms Rafael J. Wysocki
@ 2025-03-13 18:46 ` Tim Chen
2025-03-13 18:50 ` Rafael J. Wysocki
0 siblings, 1 reply; 13+ messages in thread
From: Tim Chen @ 2025-03-13 18:46 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois, Christian Loehle
On Fri, 2025-03-07 at 20:42 +0100, Rafael J. Wysocki wrote:
>
>
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -44,6 +44,8 @@
> #define INTEL_CPUFREQ_TRANSITION_DELAY_HWP 5000
> #define INTEL_CPUFREQ_TRANSITION_DELAY 500
>
> +#define INTEL_PSTATE_CORE_SCALING 100000
> +
Minor nits.
Suggest move the above define to
#define HYBRID_SCALING_FACTOR_ADL 78741
#define HYBRID_SCALING_FACTOR_MTL 80000
#define HYBRID_SCALING_FACTOR_LNL 86957
#define INTEL_PSTATE_CORE_SCALING 100000
to keep the scaling factors at the same place.
> @@ -3425,6 +3539,8 @@
>
> cpufreq_unregister_driver(intel_pstate_driver);
> intel_pstate_driver_cleanup();
> + /* Trigger EAS support reconfiguration in case it was used. */
May be clearer to say
/* Disable EAS support in case it was used */
On my first read of the comment I thought we were enabling EAS support.
> + rebuild_sched_domains_energy();
> return 0;
> }
>
Rest of patch looks good.
Tim
* Re: [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms
2025-03-13 18:46 ` Tim Chen
@ 2025-03-13 18:50 ` Rafael J. Wysocki
0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-03-13 18:50 UTC (permalink / raw)
To: Tim Chen
Cc: Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba, Peter Zijlstra,
Srinivas Pandruvada, Dietmar Eggemann, Morten Rasmussen,
Vincent Guittot, Ricardo Neri, Pierre Gondois, Christian Loehle
On Thu, Mar 13, 2025 at 7:46 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> On Fri, 2025-03-07 at 20:42 +0100, Rafael J. Wysocki wrote:
> >
> >
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -44,6 +44,8 @@
> > #define INTEL_CPUFREQ_TRANSITION_DELAY_HWP 5000
> > #define INTEL_CPUFREQ_TRANSITION_DELAY 500
> >
> > +#define INTEL_PSTATE_CORE_SCALING 100000
> > +
>
> Minor nits.
>
> Suggest move the above define to
>
> #define HYBRID_SCALING_FACTOR_ADL 78741
> #define HYBRID_SCALING_FACTOR_MTL 80000
> #define HYBRID_SCALING_FACTOR_LNL 86957
> #define INTEL_PSTATE_CORE_SCALING 100000
>
> to keep the scaling factors at the same place.
It may be needed earlier, but I see your point. Keeping them together
will make sense.
>
> > @@ -3425,6 +3539,8 @@
> >
> > cpufreq_unregister_driver(intel_pstate_driver);
> > intel_pstate_driver_cleanup();
> > + /* Trigger EAS support reconfiguration in case it was used. */
>
> It may be clearer to say
>
> /* Disable EAS support in case it was used */
Sure.
> On my first read of the comment, I thought we were enabling EAS support.
>
> > + rebuild_sched_domains_energy();
> > return 0;
> > }
> >
>
> Rest of patch looks good.
Thanks for the review!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity()
2025-03-07 19:17 ` [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity() Rafael J. Wysocki
@ 2025-03-24 16:25 ` Lukasz Luba
0 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2025-03-24 16:25 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Dietmar Eggemann,
Morten Rasmussen, Vincent Guittot, Ricardo Neri, Pierre Gondois,
Christian Loehle, Viresh Kumar
Hi Rafael,
On 3/7/25 19:17, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Move the check of the CPU capacity currently stored in the energy model
> against the arch_scale_cpu_capacity() value to em_adjust_new_capacity()
> so it will be done regardless of where the latter is called from.
>
> This will be useful when a new em_adjust_new_capacity() caller is added
> subsequently.
>
> While at it, move the pd local variable declaration in
> em_check_capacity_update() into the loop in which it is used.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> kernel/power/energy_model.c | 40 +++++++++++++++++-----------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -721,10 +721,24 @@
> * Adjustment of CPU performance values after boot, when all CPUs capacites
> * are correctly calculated.
> */
> -static void em_adjust_new_capacity(struct device *dev,
> +static void em_adjust_new_capacity(unsigned int cpu, struct device *dev,
> struct em_perf_domain *pd)
> {
> + unsigned long cpu_capacity = arch_scale_cpu_capacity(cpu);
> struct em_perf_table *em_table;
> + struct em_perf_state *table;
> + unsigned long em_max_perf;
> +
> + rcu_read_lock();
> + table = em_perf_state_from_pd(pd);
> + em_max_perf = table[pd->nr_perf_states - 1].performance;
> + rcu_read_unlock();
> +
> + if (em_max_perf == cpu_capacity)
> + return;
> +
> + pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n", cpu,
> + cpu_capacity, em_max_perf);
>
> em_table = em_table_dup(pd);
> if (!em_table) {
> @@ -740,9 +754,6 @@
> static void em_check_capacity_update(void)
> {
> cpumask_var_t cpu_done_mask;
> - struct em_perf_state *table;
> - struct em_perf_domain *pd;
> - unsigned long cpu_capacity;
> int cpu;
>
> if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) {
> @@ -753,7 +764,7 @@
> /* Check if CPUs capacity has changed than update EM */
> for_each_possible_cpu(cpu) {
> struct cpufreq_policy *policy;
> - unsigned long em_max_perf;
> + struct em_perf_domain *pd;
> struct device *dev;
>
> if (cpumask_test_cpu(cpu, cpu_done_mask))
> @@ -776,24 +787,7 @@
> cpumask_or(cpu_done_mask, cpu_done_mask,
> em_span_cpus(pd));
>
> - cpu_capacity = arch_scale_cpu_capacity(cpu);
> -
> - rcu_read_lock();
> - table = em_perf_state_from_pd(pd);
> - em_max_perf = table[pd->nr_perf_states - 1].performance;
> - rcu_read_unlock();
> -
> - /*
> - * Check if the CPU capacity has been adjusted during boot
> - * and trigger the update for new performance values.
> - */
> - if (em_max_perf == cpu_capacity)
> - continue;
> -
> - pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n",
> - cpu, cpu_capacity, em_max_perf);
> -
> - em_adjust_new_capacity(dev, pd);
> + em_adjust_new_capacity(cpu, dev, pd);
> }
>
> free_cpumask_var(cpu_done_mask);
>
>
>
LGTM,
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Regards,
Lukasz
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity()
2025-03-07 19:39 ` [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity() Rafael J. Wysocki
@ 2025-03-24 16:25 ` Lukasz Luba
0 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2025-03-24 16:25 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, Dietmar Eggemann,
Morten Rasmussen, Vincent Guittot, Ricardo Neri, Pierre Gondois,
Christian Loehle, Viresh Kumar
On 3/7/25 19:39, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Add a function for updating the Energy Model for a CPU after its
> capacity has changed, which subsequently will be used by the
> intel_pstate driver.
>
> An EM_PERF_DOMAIN_ARTIFICIAL check is added to em_adjust_new_capacity()
> to prevent it from calling em_compute_costs() for an "artificial" perf
> domain with a NULL cb parameter which would cause it to crash.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> Note that this function is needed because the performance level values
> in the EM "state" table need to be adjusted on CPU capacity changes. In
> the intel_pstate case the cost values associated with them don't change
> because they are artificial anyway, so replacing the entire table just
> in order to update the performance level values is a bit wasteful, but
> it seems to be an exception (in the other cases when the CPU capacity
> changes, the cost values change too AFAICS).
>
> ---
> include/linux/energy_model.h | 2 ++
> kernel/power/energy_model.c | 28 ++++++++++++++++++++++++----
> 2 files changed, 26 insertions(+), 4 deletions(-)
>
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -179,6 +179,7 @@
> int em_dev_update_chip_binning(struct device *dev);
> int em_update_performance_limits(struct em_perf_domain *pd,
> unsigned long freq_min_khz, unsigned long freq_max_khz);
> +void em_adjust_cpu_capacity(unsigned int cpu);
> void em_rebuild_sched_domains(void);
>
> /**
> @@ -405,6 +406,7 @@
> {
> return -EINVAL;
> }
> +static inline void em_adjust_cpu_capacity(unsigned int cpu) {}
> static inline void em_rebuild_sched_domains(void) {}
> #endif
>
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -698,10 +698,12 @@
> {
> int ret;
>
> - ret = em_compute_costs(dev, em_table->state, NULL, pd->nr_perf_states,
> - pd->flags);
> - if (ret)
> - goto free_em_table;
> + if (!(pd->flags & EM_PERF_DOMAIN_ARTIFICIAL)) {
> + ret = em_compute_costs(dev, em_table->state, NULL,
> + pd->nr_perf_states, pd->flags);
> + if (ret)
> + goto free_em_table;
> + }
>
> ret = em_dev_update_perf_domain(dev, em_table);
> if (ret)
> @@ -751,6 +753,24 @@
> em_recalc_and_update(dev, pd, em_table);
> }
>
> +/**
> + * em_adjust_cpu_capacity() - Adjust the EM for a CPU after a capacity update.
> + * @cpu: Target CPU.
> + *
> + * Adjust the existing EM for @cpu after a capacity update under the assumption
> + * that the capacity has been updated in the same way for all of the CPUs in
> + * the same perf domain.
> + */
> +void em_adjust_cpu_capacity(unsigned int cpu)
> +{
> + struct device *dev = get_cpu_device(cpu);
> + struct em_perf_domain *pd;
> +
> + pd = em_pd_get(dev);
> + if (pd)
> + em_adjust_new_capacity(cpu, dev, pd);
> +}
> +
> static void em_check_capacity_update(void)
> {
> cpumask_var_t cpu_done_mask;
>
>
>
LGTM,
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
` (5 preceding siblings ...)
2025-03-07 19:42 ` [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms Rafael J. Wysocki
@ 2025-04-03 10:47 ` Christian Loehle
2025-04-03 11:02 ` Rafael J. Wysocki
6 siblings, 1 reply; 13+ messages in thread
From: Christian Loehle @ 2025-04-03 10:47 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Lukasz Luba, Peter Zijlstra, Srinivas Pandruvada,
Dietmar Eggemann, Morten Rasmussen, Vincent Guittot, Ricardo Neri,
Pierre Gondois
On 3/7/25 19:12, Rafael J. Wysocki wrote:
> Hi Everyone,
>
> This is a new take on the "EAS for intel_pstate" work:
>
> https://lore.kernel.org/linux-pm/5861970.DvuYhMxLoT@rjwysocki.net/
>
> with refreshed preparatory patches and a revised energy model design.
>
> The following paragraph from the original cover letter still applies:
>
> "The underlying observation is that on the platforms targeted by these changes,
> Lunar Lake at the time of this writing, the "small" CPUs (E-cores), when run at
> the same performance level, are always more energy-efficient than the "big" or
> "performance" CPUs (P-cores). This means that, regardless of the scale-
> invariant utilization of a task, as long as there is enough spare capacity on
> E-cores, the relative cost of running it there is always lower."
>
> However, this time perf domains are registered per CPU and in addition to the
> primary cost component, which is related to the CPU type, there is a small
> component proportional to performance whose role is to help balance the load
> between CPUs of the same type.
>
> This is done to avoid migrating tasks too much between CPUs of the same type,
> especially between E-cores, which has been observed in tests of the previous
> iteration of this work.
>
> The expected effect is still that the CPUs of the "low-cost" type will be
> preferred so long as there is enough spare capacity on any of them.
>
> The first two patches in the series rearrange cpufreq checks related to EAS so
> that sched_is_eas_possible() doesn't have to access cpufreq internals directly
> and patch [3/6] changes those checks to also allow EAS to be used with cpufreq
> drivers that implement internal governors (like intel_pstate).
>
> Patches [4-5/6] deal with the Energy Model code. Patch [4/6] simply rearranges
> it so as to allow the next patch to be simpler and patch [5/6] adds a function
> that's used in the last patch.
>
> Patch [6/6] is the actual intel_pstate modification which now is significantly
> simpler than before because it doesn't need to track the type of each CPU
> directly in order to put it into the right perf domain.
>
> Please refer to the individual patch changelogs for details.
>
> For easier access, the series is available on the experimental/intel_pstate/eas-take2
> branch in linux-pm.git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> experimental/intel_pstate/eas-take2
>
> or
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/intel_pstate/eas-take2
>
> Thanks!
>
Hi Rafael,
As promised, I ran the same tests as with v0.2; the results are better with v0.3,
though it is hard to say whether that is due to the cache affinity on the P-cores.
Interestingly, our nosmt Raptor Lake 8+8 should be worse off with its 16 PDs now.
Maybe, if L2 is shared anyway, one PD for the E-cores and per-CPU PDs for the P-cores
could be experimented with too (so 4+1+1+1+1 for Lunar Lake).
Anyway these are the results, again 20 iterations of 5 minutes each:
Firefox YouTube 4K video playback:
EAS:
376.229 +-9.566835596650195
CAS:
661.323 +-18.951739322113248
(-43.1% energy used with EAS)
(cf -24.2% energy used with EAS v0.2)
Firefox Web Aquarium, 500 fish:
EAS:
331.933 +-10.977847441299437
CAS:
515.594 +-16.997636567737562
(-35.6% energy used with EAS)
(This wasn't tested on v0.2; it was added to check whether the above was a lucky workload hit.)
Neither run shows a performance hit with EAS (FPS are very stable for both).
v0.2 results:
https://lore.kernel.org/lkml/3861524b-b266-4e54-b7ab-fdccbb7b4177@arm.com/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative
2025-04-03 10:47 ` [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Christian Loehle
@ 2025-04-03 11:02 ` Rafael J. Wysocki
0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-04-03 11:02 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba, Peter Zijlstra,
Srinivas Pandruvada, Dietmar Eggemann, Morten Rasmussen,
Vincent Guittot, Ricardo Neri, Pierre Gondois
On Thu, Apr 3, 2025 at 12:47 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 3/7/25 19:12, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This is a new take on the "EAS for intel_pstate" work:
> >
> > https://lore.kernel.org/linux-pm/5861970.DvuYhMxLoT@rjwysocki.net/
> >
> > with refreshed preparatory patches and a revised energy model design.
> >
> > The following paragraph from the original cover letter still applies:
> >
> > "The underlying observation is that on the platforms targeted by these changes,
> > Lunar Lake at the time of this writing, the "small" CPUs (E-cores), when run at
> > the same performance level, are always more energy-efficient than the "big" or
> > "performance" CPUs (P-cores). This means that, regardless of the scale-
> > invariant utilization of a task, as long as there is enough spare capacity on
> > E-cores, the relative cost of running it there is always lower."
> >
> > However, this time perf domains are registered per CPU and in addition to the
> > primary cost component, which is related to the CPU type, there is a small
> > component proportional to performance whose role is to help balance the load
> > between CPUs of the same type.
> >
> > This is done to avoid migrating tasks too much between CPUs of the same type,
> > especially between E-cores, which has been observed in tests of the previous
> > iteration of this work.
> >
> > The expected effect is still that the CPUs of the "low-cost" type will be
> > preferred so long as there is enough spare capacity on any of them.
> >
> > The first two patches in the series rearrange cpufreq checks related to EAS so
> > that sched_is_eas_possible() doesn't have to access cpufreq internals directly
> > and patch [3/6] changes those checks to also allow EAS to be used with cpufreq
> > drivers that implement internal governors (like intel_pstate).
> >
> > Patches [4-5/6] deal with the Energy Model code. Patch [4/6] simply rearranges
> > it so as to allow the next patch to be simpler and patch [5/6] adds a function
> > that's used in the last patch.
> >
> > Patch [6/6] is the actual intel_pstate modification which now is significantly
> > simpler than before because it doesn't need to track the type of each CPU
> > directly in order to put it into the right perf domain.
> >
> > Please refer to the individual patch changelogs for details.
> >
> > For easier access, the series is available on the experimental/intel_pstate/eas-take2
> > branch in linux-pm.git:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > experimental/intel_pstate/eas-take2
> >
> > or
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/intel_pstate/eas-take2
> >
> > Thanks!
> >
>
>
> Hi Rafael,
> As promised, I ran the same tests as with v0.2; the results are better with v0.3,
> though it is hard to say whether that is due to the cache affinity on the P-cores.
>
> Interestingly, our nosmt Raptor Lake 8+8 should be worse off with its 16 PDs now.
> Maybe, if L2 is shared anyway, one PD for the E-cores and per-CPU PDs for the P-cores
> could be experimented with too (so 4+1+1+1+1 for Lunar Lake).
>
> Anyway these are the results, again 20 iterations of 5 minutes each:
>
> Firefox YouTube 4K video playback:
> EAS:
> 376.229 +-9.566835596650195
> CAS:
> 661.323 +-18.951739322113248
> (-43.1% energy used with EAS)
> (cf -24.2% energy used with EAS v0.2)
>
> Firefox Web Aquarium, 500 fish:
> EAS:
> 331.933 +-10.977847441299437
> CAS:
> 515.594 +-16.997636567737562
> (-35.6% energy used with EAS)
> (This wasn't tested on v0.2; it was added to check whether the above was a lucky workload hit.)
>
> Neither run shows a performance hit with EAS (FPS are very stable for both).
> v0.2 results:
> https://lore.kernel.org/lkml/3861524b-b266-4e54-b7ab-fdccbb7b4177@arm.com/
Thank you!
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-04-03 11:03 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-07 19:12 [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Rafael J. Wysocki
2025-03-07 19:15 ` [RFC][PATCH v0.3 1/6] cpufreq/sched: schedutil: Add helper for governor checks Rafael J. Wysocki
2025-03-07 19:16 ` [RFC][PATCH v0.3 2/6] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq Rafael J. Wysocki
2025-03-07 19:16 ` [RFC][PATCH v0.3 3/6] cpufreq/sched: Allow .setpolicy() cpufreq drivers to enable EAS Rafael J. Wysocki
2025-03-07 19:17 ` [RFC][PATCH v0.3 4/6] PM: EM: Move CPU capacity check to em_adjust_new_capacity() Rafael J. Wysocki
2025-03-24 16:25 ` Lukasz Luba
2025-03-07 19:39 ` [RFC][PATCH v0.3 5/6] PM: EM: Introduce em_adjust_cpu_capacity() Rafael J. Wysocki
2025-03-24 16:25 ` Lukasz Luba
2025-03-07 19:42 ` [RFC][PATCH v0.3 6/6] cpufreq: intel_pstate: EAS support for hybrid platforms Rafael J. Wysocki
2025-03-13 18:46 ` Tim Chen
2025-03-13 18:50 ` Rafael J. Wysocki
2025-04-03 10:47 ` [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT - alternative Christian Loehle
2025-04-03 11:02 ` Rafael J. Wysocki