[PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
@ 2025-11-04  6:50 Jie Zhan
  2025-11-10 16:49 ` Beata Michalska
  0 siblings, 1 reply; 5+ messages in thread
From: Jie Zhan @ 2025-11-04  6:50 UTC (permalink / raw)
  To: viresh.kumar, rafael, ionela.voinescu, beata.michalska
  Cc: linux-pm, linux-acpi, linux-arm-kernel, linuxarm, zhanjie9,
	zhenglifeng1, prime.zeng, jonathan.cameron

Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
scheduler tick but defers the update of arch_freq_scale to a separate
thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.

However, this deferred update mechanism is unnecessary and introduces extra
overhead for non-PCC register spaces (e.g. System Memory or FFH), where
accessing the regs won't sleep and can be safely performed from the tick
context.

Furthermore, with the CPPC FIE registered, it throws repeated warnings of
"cppc_scale_freq_workfn: failed to read perf counters" on our platform with
the CPC regs in System Memory and a power-down idle state enabled.  That's
because the remote CPU can be in a power-down idle state, and reading its
perf counters returns 0.  Moving the FIE handling back to the scheduler
tick process makes the CPU handle its own perf counters, so it won't be
idle and the issue would be inherently solved.

To address the above issues, update arch_freq_scale directly in ticks for
non-PCC regs and keep the deferred update mechanism for PCC regs.

Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
---
We have tested this on Kunpeng SoCs with the CPC regs both in System Memory
and FFH.  More tests on other platforms are welcome.

Changelog:

v3:
- Stash the state of 'cppc_perf_ctrs_in_pcc' so it won't have to check the CPC
  regs of all CPUs everywhere (Thanks to the suggestion from Beata Michalska).
- Update the commit log, explaining more on the warning issue caused by
  accessing perf counters on remote CPUs.
- Drop Patch 1 that has been accepted, and rebase Patch 2 on that.

v2:
https://lore.kernel.org/linux-pm/20250828110212.2108653-1-zhanjie9@hisilicon.com/
- Update the cover letter and the commit log based on v1 discussion
- Update FIE arch_freq_scale in ticks for non-PCC regs

v1:
https://lore.kernel.org/linux-pm/20250730032312.167062-1-yubowen8@huawei.com/
---
 drivers/cpufreq/cppc_cpufreq.c | 60 ++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 9eac77c4f294..4fcaec7e2034 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -48,37 +48,31 @@ struct cppc_freq_invariance {
 };
 
 static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
+static bool perf_ctrs_in_pcc;
 static struct kthread_worker *kworker_fie;
 
 static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
 				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
 
 /**
- * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
- * @work: The work item.
+ * __cppc_scale_freq_tick - CPPC arch_freq_scale updater for frequency invariance
+ * @cppc_fi: per-cpu CPPC FIE data.
  *
- * The CPPC driver register itself with the topology core to provide its own
+ * The CPPC driver registers itself with the topology core to provide its own
  * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
  * gets called by the scheduler on every tick.
  *
  * Note that the arch specific counters have higher priority than CPPC counters,
  * if available, though the CPPC driver doesn't need to have any special
  * handling for that.
- *
- * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
- * reach here from hard-irq context), which then schedules a normal work item
- * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
- * based on the counter updates since the last tick.
  */
-static void cppc_scale_freq_workfn(struct kthread_work *work)
+static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi)
 {
-	struct cppc_freq_invariance *cppc_fi;
 	struct cppc_perf_fb_ctrs fb_ctrs = {0};
 	struct cppc_cpudata *cpu_data;
 	unsigned long local_freq_scale;
 	u64 perf;
 
-	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
 	cpu_data = cppc_fi->cpu_data;
 
 	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
@@ -102,6 +96,14 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
 	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
 }
 
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+
+	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+	__cppc_scale_freq_tick(cppc_fi);
+}
+
 static void cppc_irq_work(struct irq_work *irq_work)
 {
 	struct cppc_freq_invariance *cppc_fi;
@@ -110,7 +112,14 @@ static void cppc_irq_work(struct irq_work *irq_work)
 	kthread_queue_work(kworker_fie, &cppc_fi->work);
 }
 
-static void cppc_scale_freq_tick(void)
+/*
+ * Reading perf counters may sleep if the CPC regs are in PCC.  Thus, we
+ * schedule an irq work in scale_freq_tick (since we reach here from hard-irq
+ * context), which then schedules a normal work item cppc_scale_freq_workfn()
+ * that updates the per_cpu arch_freq_scale variable based on the counter
+ * updates since the last tick.
+ */
+static void cppc_scale_freq_tick_pcc(void)
 {
 	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
 
@@ -121,6 +130,11 @@ static void cppc_scale_freq_tick(void)
 	irq_work_queue(&cppc_fi->irq_work);
 }
 
+static void cppc_scale_freq_tick(void)
+{
+	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
+}
+
 static struct scale_freq_data cppc_sftd = {
 	.source = SCALE_FREQ_SOURCE_CPPC,
 	.set_freq_scale = cppc_scale_freq_tick,
@@ -138,8 +152,10 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		cppc_fi->cpu = cpu;
 		cppc_fi->cpu_data = policy->driver_data;
-		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
-		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+		if (perf_ctrs_in_pcc) {
+			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+		}
 
 		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
 
@@ -177,6 +193,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
 	/* policy->cpus will be empty here, use related_cpus instead */
 	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
 
+	if (!perf_ctrs_in_pcc)
+		return;
+
 	for_each_cpu(cpu, policy->related_cpus) {
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		irq_work_sync(&cppc_fi->irq_work);
@@ -201,17 +220,22 @@ static void __init cppc_freq_invariance_init(void)
 	};
 	int ret;
 
+	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
+
 	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
-		fie_disabled = FIE_ENABLED;
-		if (cppc_perf_ctrs_in_pcc()) {
+		if (!perf_ctrs_in_pcc) {
+			fie_disabled = FIE_ENABLED;
+		} else {
 			pr_info("FIE not enabled on systems with registers in PCC\n");
 			fie_disabled = FIE_DISABLED;
 		}
 	}
 
-	if (fie_disabled)
+	if (fie_disabled || !perf_ctrs_in_pcc)
 		return;
 
+	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
+
 	kworker_fie = kthread_run_worker(0, "cppc_fie");
 	if (IS_ERR(kworker_fie)) {
 		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
@@ -231,7 +255,7 @@ static void __init cppc_freq_invariance_init(void)
 
 static void cppc_freq_invariance_exit(void)
 {
-	if (fie_disabled)
+	if (fie_disabled || !perf_ctrs_in_pcc)
 		return;
 
 	kthread_destroy_worker(kworker_fie);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
  2025-11-04  6:50 [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs Jie Zhan
@ 2025-11-10 16:49 ` Beata Michalska
  2025-11-11 11:30   ` Jie Zhan
  0 siblings, 1 reply; 5+ messages in thread
From: Beata Michalska @ 2025-11-10 16:49 UTC (permalink / raw)
  To: Jie Zhan
  Cc: viresh.kumar, rafael, ionela.voinescu, linux-pm, linux-acpi,
	linux-arm-kernel, linuxarm, zhenglifeng1, prime.zeng,
	jonathan.cameron

Hi Jie,
On Tue, Nov 04, 2025 at 02:50:39PM +0800, Jie Zhan wrote:
> Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
> scheduler tick but defers the update of arch_freq_scale to a separate
> thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.
> 
> However, this deferred update mechanism is unnecessary and introduces extra
> overhead for non-PCC register spaces (e.g. System Memory or FFH), where
> accessing the regs won't sleep and can be safely performed from the tick
> context.
> 
> Furthermore, with the CPPC FIE registered, it throws repeated warnings of
> "cppc_scale_freq_workfn: failed to read perf counters" on our platform with
> the CPC regs in System Memory and a power-down idle state enabled.  That's
> because the remote CPU can be in a power-down idle state, and reading its
> perf counters returns 0.  Moving the FIE handling back to the scheduler
> tick process makes the CPU handle its own perf counters, so it won't be
> idle and the issue would be inherently solved.
> 
> To address the above issues, update arch_freq_scale directly in ticks for
> non-PCC regs and keep the deferred update mechanism for PCC regs.
Something about it just didn’t sit right with me, and apparently, it needed some
time to settle down - thus the delay.

It all looks sensible though it might be worth to considered applying
the change on a per-CPU basis, as, in theory at least, different address
spaces are allowed for different registers (at least according to the ACPI
spec, if I read it right).
So I was thinking about smth along the lines of:

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 6c684e54fe01..07f4e59f2f0a 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1431,38 +1431,47 @@ EXPORT_SYMBOL_GPL(cppc_get_perf_caps);
  *
  * Return: true if any of the counters are in PCC regions, false otherwise
  */
-bool cppc_perf_ctrs_in_pcc(void)
+bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
 {
-	int cpu;
+	struct cpc_register_resource *ref_perf_reg;
+	struct cpc_desc *cpc_desc;
 
-	for_each_present_cpu(cpu) {
-		struct cpc_register_resource *ref_perf_reg;
-		struct cpc_desc *cpc_desc;
+	cpc_desc = per_cpu(cpc_desc_ptr, cpu);
 
-		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+	if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
+	    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
+	    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
+		return true;
 
-		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
-			return true;
 
+	ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
 
-		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
+	/*
+	 * If reference perf register is not supported then we should
+	 * use the nominal perf value
+	 */
+	if (!CPC_SUPPORTED(ref_perf_reg))
+		ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
 
-		/*
-		 * If reference perf register is not supported then we should
-		 * use the nominal perf value
-		 */
-		if (!CPC_SUPPORTED(ref_perf_reg))
-			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
+	if (CPC_IN_PCC(ref_perf_reg))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
 
-		if (CPC_IN_PCC(ref_perf_reg))
+bool cppc_any_perf_ctrs_in_pcc(void)
+{
+	int cpu;
+
+	for_each_present_cpu(cpu) {
+		if (cppc_perf_ctrs_in_pcc(cpu))
 			return true;
 	}
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
+EXPORT_SYMBOL_GPL(cppc_any_perf_ctrs_in_pcc);
 
 /**
  * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 4fcaec7e2034..fdf5a49c04ed 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -48,7 +48,6 @@ struct cppc_freq_invariance {
 };
 
 static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
-static bool perf_ctrs_in_pcc;
 static struct kthread_worker *kworker_fie;
 
 static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
@@ -132,7 +131,12 @@ static void cppc_scale_freq_tick_pcc(void)
 
 static void cppc_scale_freq_tick(void)
 {
-	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
+	unsigned int cpu = smp_processor_id();
+
+	cppc_perf_ctrs_in_pcc(cpu) ? cppc_scale_freq_tick_pcc()
+				   : __cppc_scale_freq_tick(
+				   			&per_cpu(cppc_freq_inv,
+				   				 cpu));
 }
 
 static struct scale_freq_data cppc_sftd = {
@@ -152,7 +156,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		cppc_fi->cpu = cpu;
 		cppc_fi->cpu_data = policy->driver_data;
-		if (perf_ctrs_in_pcc) {
+		if (cppc_perf_ctrs_in_pcc(cpu)) {
 			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
 			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
 		}
@@ -193,10 +197,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
 	/* policy->cpus will be empty here, use related_cpus instead */
 	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
 
-	if (!perf_ctrs_in_pcc)
-		return;
-
 	for_each_cpu(cpu, policy->related_cpus) {
+		if (!cppc_perf_ctrs_in_pcc(cpu))
+			continue;
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		irq_work_sync(&cppc_fi->irq_work);
 		kthread_cancel_work_sync(&cppc_fi->work);
@@ -218,14 +221,11 @@ static void __init cppc_freq_invariance_init(void)
 		.sched_deadline = 10 * NSEC_PER_MSEC,
 		.sched_period	= 10 * NSEC_PER_MSEC,
 	};
+	bool perf_ctrs_in_pcc = cppc_any_perf_ctrs_in_pcc();
 	int ret;
 
-	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
-
 	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
-		if (!perf_ctrs_in_pcc) {
-			fie_disabled = FIE_ENABLED;
-		} else {
+		if (perf_ctrs_in_pcc) {
 			pr_info("FIE not enabled on systems with registers in PCC\n");
 			fie_disabled = FIE_DISABLED;
 		}
@@ -234,12 +234,12 @@ static void __init cppc_freq_invariance_init(void)
 	if (fie_disabled || !perf_ctrs_in_pcc)
 		return;
 
-	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
 
 	kworker_fie = kthread_run_worker(0, "cppc_fie");
 	if (IS_ERR(kworker_fie)) {
 		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
 			PTR_ERR(kworker_fie));
+		kworker_fie = NULL;
 		fie_disabled = FIE_DISABLED;
 		return;
 	}
@@ -255,10 +255,8 @@ static void __init cppc_freq_invariance_init(void)
 
 static void cppc_freq_invariance_exit(void)
 {
-	if (fie_disabled || !perf_ctrs_in_pcc)
-		return;
-
-	kthread_destroy_worker(kworker_fie);
+	if (kworker_fie)
+		kthread_destroy_worker(kworker_fie);
 }
 
 #else
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 13fa81504844..3af503b12f60 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -154,7 +154,8 @@ extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_set_enable(int cpu, bool enable);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
-extern bool cppc_perf_ctrs_in_pcc(void);
+extern bool cppc_perf_ctrs_in_pcc(unsigned int cpu);
+extern bool cppc_any_perf_ctrs_in_pcc(void);
 extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
 extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
 extern bool acpi_cpc_valid(void);
@@ -204,7 +205,11 @@ static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
 {
 	return -EOPNOTSUPP;
 }
-static inline bool cppc_perf_ctrs_in_pcc(void)
+static inline bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
+{
+	return false;
+}
+static inline bool cppc_any_perf_ctrs_in_pcc(void)
 {
 	return false;
 }


Additionally, it might be worth to get rid of (at least) some messages printed
on the path of reading the counters in case it is being done in tick context.

Also , I do not have access to any machine using PCC, and it would be good to
double check that as well.

---
BR
Beata
> 
> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> ---
> We have tested this on Kunpeng SoCs with the CPC regs both in System Memory
> and FFH.  More tests on other platforms are welcome.
> 
> Changelog:
> 
> v3:
> - Stash the state of 'cppc_perf_ctrs_in_pcc' so it won't have to check the CPC
>   regs of all CPUs everywhere (Thanks to the suggestion from Beata Michalska).
> - Update the commit log, explaining more on the warning issue caused by
>   accessing perf counters on remote CPUs.
> - Drop Patch 1 that has been accepted, and rebase Patch 2 on that.
> 
> v2:
> https://lore.kernel.org/linux-pm/20250828110212.2108653-1-zhanjie9@hisilicon.com/
> - Update the cover letter and the commit log based on v1 discussion
> - Update FIE arch_freq_scale in ticks for non-PCC regs
> 
> v1:
> https://lore.kernel.org/linux-pm/20250730032312.167062-1-yubowen8@huawei.com/
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 60 ++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 9eac77c4f294..4fcaec7e2034 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -48,37 +48,31 @@ struct cppc_freq_invariance {
>  };
>  
>  static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> +static bool perf_ctrs_in_pcc;
>  static struct kthread_worker *kworker_fie;
>  
>  static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
>  				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
>  
>  /**
> - * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
> - * @work: The work item.
> + * __cppc_scale_freq_tick - CPPC arch_freq_scale updater for frequency invariance
> + * @cppc_fi: per-cpu CPPC FIE data.
>   *
> - * The CPPC driver register itself with the topology core to provide its own
> + * The CPPC driver registers itself with the topology core to provide its own
>   * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
>   * gets called by the scheduler on every tick.
>   *
>   * Note that the arch specific counters have higher priority than CPPC counters,
>   * if available, though the CPPC driver doesn't need to have any special
>   * handling for that.
> - *
> - * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
> - * reach here from hard-irq context), which then schedules a normal work item
> - * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
> - * based on the counter updates since the last tick.
>   */
> -static void cppc_scale_freq_workfn(struct kthread_work *work)
> +static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi)
>  {
> -	struct cppc_freq_invariance *cppc_fi;
>  	struct cppc_perf_fb_ctrs fb_ctrs = {0};
>  	struct cppc_cpudata *cpu_data;
>  	unsigned long local_freq_scale;
>  	u64 perf;
>  
> -	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>  	cpu_data = cppc_fi->cpu_data;
>  
>  	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> @@ -102,6 +96,14 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>  	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
>  }
>  
> +static void cppc_scale_freq_workfn(struct kthread_work *work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +
> +	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> +	__cppc_scale_freq_tick(cppc_fi);
> +}
> +
>  static void cppc_irq_work(struct irq_work *irq_work)
>  {
>  	struct cppc_freq_invariance *cppc_fi;
> @@ -110,7 +112,14 @@ static void cppc_irq_work(struct irq_work *irq_work)
>  	kthread_queue_work(kworker_fie, &cppc_fi->work);
>  }
>  
> -static void cppc_scale_freq_tick(void)
> +/*
> + * Reading perf counters may sleep if the CPC regs are in PCC.  Thus, we
> + * schedule an irq work in scale_freq_tick (since we reach here from hard-irq
> + * context), which then schedules a normal work item cppc_scale_freq_workfn()
> + * that updates the per_cpu arch_freq_scale variable based on the counter
> + * updates since the last tick.
> + */
> +static void cppc_scale_freq_tick_pcc(void)
>  {
>  	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
>  
> @@ -121,6 +130,11 @@ static void cppc_scale_freq_tick(void)
>  	irq_work_queue(&cppc_fi->irq_work);
>  }
>  
> +static void cppc_scale_freq_tick(void)
> +{
> +	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
> +}
> +
>  static struct scale_freq_data cppc_sftd = {
>  	.source = SCALE_FREQ_SOURCE_CPPC,
>  	.set_freq_scale = cppc_scale_freq_tick,
> @@ -138,8 +152,10 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		cppc_fi->cpu = cpu;
>  		cppc_fi->cpu_data = policy->driver_data;
> -		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> -		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +		if (perf_ctrs_in_pcc) {
> +			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +		}
>  
>  		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
>  
> @@ -177,6 +193,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
>  	/* policy->cpus will be empty here, use related_cpus instead */
>  	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
>  
> +	if (!perf_ctrs_in_pcc)
> +		return;
> +
>  	for_each_cpu(cpu, policy->related_cpus) {
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		irq_work_sync(&cppc_fi->irq_work);
> @@ -201,17 +220,22 @@ static void __init cppc_freq_invariance_init(void)
>  	};
>  	int ret;
>  
> +	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
> +
>  	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
> -		fie_disabled = FIE_ENABLED;
> -		if (cppc_perf_ctrs_in_pcc()) {
> +		if (!perf_ctrs_in_pcc) {
> +			fie_disabled = FIE_ENABLED;
> +		} else {
>  			pr_info("FIE not enabled on systems with registers in PCC\n");
>  			fie_disabled = FIE_DISABLED;
>  		}
>  	}
>  
> -	if (fie_disabled)
> +	if (fie_disabled || !perf_ctrs_in_pcc)
>  		return;
>  
> +	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
> +
>  	kworker_fie = kthread_run_worker(0, "cppc_fie");
>  	if (IS_ERR(kworker_fie)) {
>  		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
> @@ -231,7 +255,7 @@ static void __init cppc_freq_invariance_init(void)
>  
>  static void cppc_freq_invariance_exit(void)
>  {
> -	if (fie_disabled)
> +	if (fie_disabled || !perf_ctrs_in_pcc)
>  		return;
>  
>  	kthread_destroy_worker(kworker_fie);
> -- 
> 2.33.0
> 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
  2025-11-10 16:49 ` Beata Michalska
@ 2025-11-11 11:30   ` Jie Zhan
  2025-11-13  8:04     ` Beata Michalska
  0 siblings, 1 reply; 5+ messages in thread
From: Jie Zhan @ 2025-11-11 11:30 UTC (permalink / raw)
  To: Beata Michalska
  Cc: viresh.kumar, rafael, ionela.voinescu, linux-pm, linux-acpi,
	linux-arm-kernel, linuxarm, zhenglifeng1, prime.zeng,
	jonathan.cameron



On 11/11/2025 12:49 AM, Beata Michalska wrote:
> Hi Jie,
> On Tue, Nov 04, 2025 at 02:50:39PM +0800, Jie Zhan wrote:
>> Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
>> scheduler tick but defers the update of arch_freq_scale to a separate
>> thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.
>>
>> However, this deferred update mechanism is unnecessary and introduces extra
>> overhead for non-PCC register spaces (e.g. System Memory or FFH), where
>> accessing the regs won't sleep and can be safely performed from the tick
>> context.
>>
>> Furthermore, with the CPPC FIE registered, it throws repeated warnings of
>> "cppc_scale_freq_workfn: failed to read perf counters" on our platform with
>> the CPC regs in System Memory and a power-down idle state enabled.  That's
>> because the remote CPU can be in a power-down idle state, and reading its
>> perf counters returns 0.  Moving the FIE handling back to the scheduler
>> tick process makes the CPU handle its own perf counters, so it won't be
>> idle and the issue would be inherently solved.
>>
>> To address the above issues, update arch_freq_scale directly in ticks for
>> non-PCC regs and keep the deferred update mechanism for PCC regs.
> Something about it just didn’t sit right with me, and apparently, it needed some
> time to settle down - thus the delay.
> 
> It all looks sensible though it might be worth to considered applying
> the change on a per-CPU basis, as, in theory at least, different address
> spaces are allowed for different registers (at least according to the ACPI
> spec, if I read it right).
> So I was thinking about smth along the lines of:
Beata,

Right, I see what you want to do.
Some comments inline.

Would you like to make it a full patch so I can include it in the next
version? or some other way?

Jie
> 
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 6c684e54fe01..07f4e59f2f0a 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1431,38 +1431,47 @@ EXPORT_SYMBOL_GPL(cppc_get_perf_caps);
>   *
>   * Return: true if any of the counters are in PCC regions, false otherwise
>   */
> -bool cppc_perf_ctrs_in_pcc(void)
> +bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
>  {
> -	int cpu;
> +	struct cpc_register_resource *ref_perf_reg;
> +	struct cpc_desc *cpc_desc;
>  
> -	for_each_present_cpu(cpu) {
> -		struct cpc_register_resource *ref_perf_reg;
> -		struct cpc_desc *cpc_desc;
> +	cpc_desc = per_cpu(cpc_desc_ptr, cpu);
>  
> -		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> +	if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
> +	    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
> +	    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
> +		return true;
>  
> -		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
> -		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
> -		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
> -			return true;
>  
> +	ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
>  
> -		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
> +	/*
> +	 * If reference perf register is not supported then we should
> +	 * use the nominal perf value
> +	 */
> +	if (!CPC_SUPPORTED(ref_perf_reg))
> +		ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
Though not related to this issue, I'm confused that this sort of workaround
appears here - it should be in some init function.
>  
> -		/*
> -		 * If reference perf register is not supported then we should
> -		 * use the nominal perf value
> -		 */
> -		if (!CPC_SUPPORTED(ref_perf_reg))
> -			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
> +	if (CPC_IN_PCC(ref_perf_reg))
> +		return true;
> +
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
>  
> -		if (CPC_IN_PCC(ref_perf_reg))
> +bool cppc_any_perf_ctrs_in_pcc(void)
> +{
> +	int cpu;
> +
> +	for_each_present_cpu(cpu) {
> +		if (cppc_perf_ctrs_in_pcc(cpu))
>  			return true;
>  	}
>  
>  	return false;
>  }
> -EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
> +EXPORT_SYMBOL_GPL(cppc_any_perf_ctrs_in_pcc);
>  
>  /**
>   * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 4fcaec7e2034..fdf5a49c04ed 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -48,7 +48,6 @@ struct cppc_freq_invariance {
>  };
>  
>  static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> -static bool perf_ctrs_in_pcc;
>  static struct kthread_worker *kworker_fie;
>  
>  static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> @@ -132,7 +131,12 @@ static void cppc_scale_freq_tick_pcc(void)
>  
>  static void cppc_scale_freq_tick(void)
>  {
> -	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
> +	unsigned int cpu = smp_processor_id();
> +
> +	cppc_perf_ctrs_in_pcc(cpu) ? cppc_scale_freq_tick_pcc()
Calling cppc_perf_ctrs_in_pcc() could be expensive here.
I'd prefer something like a static branch or a determined callback for each
cpu.
> +				   : __cppc_scale_freq_tick(
> +				   			&per_cpu(cppc_freq_inv,
> +				   				 cpu));
>  }
>  
>  static struct scale_freq_data cppc_sftd = {
> @@ -152,7 +156,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		cppc_fi->cpu = cpu;
>  		cppc_fi->cpu_data = policy->driver_data;
> -		if (perf_ctrs_in_pcc) {
> +		if (cppc_perf_ctrs_in_pcc(cpu)) {
>  			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
>  			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
>  		}
> @@ -193,10 +197,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
>  	/* policy->cpus will be empty here, use related_cpus instead */
>  	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
>  
> -	if (!perf_ctrs_in_pcc)
> -		return;
> -
>  	for_each_cpu(cpu, policy->related_cpus) {
> +		if (!cppc_perf_ctrs_in_pcc(cpu))
> +			continue;
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		irq_work_sync(&cppc_fi->irq_work);
>  		kthread_cancel_work_sync(&cppc_fi->work);
> @@ -218,14 +221,11 @@ static void __init cppc_freq_invariance_init(void)
>  		.sched_deadline = 10 * NSEC_PER_MSEC,
>  		.sched_period	= 10 * NSEC_PER_MSEC,
>  	};
> +	bool perf_ctrs_in_pcc = cppc_any_perf_ctrs_in_pcc();
>  	int ret;
>  
> -	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
> -
>  	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
> -		if (!perf_ctrs_in_pcc) {
> -			fie_disabled = FIE_ENABLED;
> -		} else {
> +		if (perf_ctrs_in_pcc) {
>  			pr_info("FIE not enabled on systems with registers in PCC\n");
>  			fie_disabled = FIE_DISABLED;
>  		}
> @@ -234,12 +234,12 @@ static void __init cppc_freq_invariance_init(void)
>  	if (fie_disabled || !perf_ctrs_in_pcc)
>  		return;
>  
> -	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
>  
>  	kworker_fie = kthread_run_worker(0, "cppc_fie");
>  	if (IS_ERR(kworker_fie)) {
>  		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
>  			PTR_ERR(kworker_fie));
> +		kworker_fie = NULL;
>  		fie_disabled = FIE_DISABLED;
>  		return;
>  	}
> @@ -255,10 +255,8 @@ static void __init cppc_freq_invariance_init(void)
>  
>  static void cppc_freq_invariance_exit(void)
>  {
> -	if (fie_disabled || !perf_ctrs_in_pcc)
> -		return;
> -
> -	kthread_destroy_worker(kworker_fie);
> +	if (kworker_fie)
> +		kthread_destroy_worker(kworker_fie);
>  }
>  
>  #else
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 13fa81504844..3af503b12f60 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -154,7 +154,8 @@ extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
>  extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
>  extern int cppc_set_enable(int cpu, bool enable);
>  extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
> -extern bool cppc_perf_ctrs_in_pcc(void);
> +extern bool cppc_perf_ctrs_in_pcc(unsigned int cpu);
> +extern bool cppc_any_perf_ctrs_in_pcc(void);
would be slightly better to keep cppc_perf_ctrs_in_pcc(void) and add a new
function, e.g. cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu), such that the
old ABI is unchanged.
>  extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
>  extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
>  extern bool acpi_cpc_valid(void);
> @@ -204,7 +205,11 @@ static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
>  {
>  	return -EOPNOTSUPP;
>  }
> -static inline bool cppc_perf_ctrs_in_pcc(void)
> +static inline bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
> +{
> +	return false;
> +}
> +static inline bool cppc_any_perf_ctrs_in_pcc(void)
>  {
>  	return false;
>  }
> 
> 
> Additionally, it might be worth to get rid of (at least) some messages printed
> on the path of reading the counters in case it is being done in tick context.
Cool, will have a look.
> 
> Also , I do not have access to any machine using PCC, and it would be good to
> double check that as well.
> 
> ---
> BR
> Beata

>>
>> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
>> ---
>> We have tested this on Kunpeng SoCs with the CPC regs both in System Memory
>> and FFH.  More tests on other platforms are welcome.
>>
>> Changelog:
>>
>> v3:
>> - Stash the state of 'cppc_perf_ctrs_in_pcc' so it won't have to check the CPC
>>   regs of all CPUs everywhere (Thanks to the suggestion from Beata Michalska).
>> - Update the commit log, explaining more on the warning issue caused by
>>   accessing perf counters on remote CPUs.
>> - Drop Patch 1 that has been accepted, and rebase Patch 2 on that.
>>
>> v2:
>> https://lore.kernel.org/linux-pm/20250828110212.2108653-1-zhanjie9@hisilicon.com/
>> - Update the cover letter and the commit log based on v1 discussion
>> - Update FIE arch_freq_scale in ticks for non-PCC regs
>>
>> v1:
>> https://lore.kernel.org/linux-pm/20250730032312.167062-1-yubowen8@huawei.com/
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 60 ++++++++++++++++++++++++----------
>>  1 file changed, 42 insertions(+), 18 deletions(-)
...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
  2025-11-11 11:30   ` Jie Zhan
@ 2025-11-13  8:04     ` Beata Michalska
  2025-11-13  8:18       ` Jie Zhan
  0 siblings, 1 reply; 5+ messages in thread
From: Beata Michalska @ 2025-11-13  8:04 UTC (permalink / raw)
  To: Jie Zhan
  Cc: viresh.kumar, rafael, ionela.voinescu, linux-pm, linux-acpi,
	linux-arm-kernel, linuxarm, zhenglifeng1, prime.zeng,
	jonathan.cameron

On Tue, Nov 11, 2025 at 07:30:09PM +0800, Jie Zhan wrote:
> 
> 
> On 11/11/2025 12:49 AM, Beata Michalska wrote:
> > Hi Jie,
> > On Tue, Nov 04, 2025 at 02:50:39PM +0800, Jie Zhan wrote:
> >> Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
> >> scheduler tick but defers the update of arch_freq_scale to a separate
> >> thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.
> >>
> >> However, this deferred update mechanism is unnecessary and introduces extra
> >> overhead for non-PCC register spaces (e.g. System Memory or FFH), where
> >> accessing the regs won't sleep and can be safely performed from the tick
> >> context.
> >>
> >> Furthermore, with the CPPC FIE registered, it throws repeated warnings of
> >> "cppc_scale_freq_workfn: failed to read perf counters" on our platform with
> >> the CPC regs in System Memory and a power-down idle state enabled.  That's
> >> because the remote CPU can be in a power-down idle state, and reading its
> >> perf counters returns 0.  Moving the FIE handling back to the scheduler
> >> tick process makes the CPU handle its own perf counters, so it won't be
> >> idle and the issue would be inherently solved.
> >>
> >> To address the above issues, update arch_freq_scale directly in ticks for
> >> non-PCC regs and keep the deferred update mechanism for PCC regs.
> > Something about it just didn’t sit right with me, and apparently, it needed some
> > time to settle down - thus the delay.
> > 
> > It all looks sensible though it might be worth to considered applying
> > the change on a per-CPU basis, as, in theory at least, different address
> > spaces are allowed for different registers (at least according to the ACPI
> > spec, if I read it right).
> > So I was thinking about smth along the lines of:
> Beata,
> 
> Right, I see what you want to do.
> Some comments inline.
> 
> Would you like to make it a full patch so I can include it in the next
> version? or some other way?
What I have shared was just to ilustrate the idea, so if that's ok with you,
you might carry on with that as well ?

---
BR
Beata
> 
> Jie
> > 
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index 6c684e54fe01..07f4e59f2f0a 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1431,38 +1431,47 @@ EXPORT_SYMBOL_GPL(cppc_get_perf_caps);
> >   *
> >   * Return: true if any of the counters are in PCC regions, false otherwise
> >   */
> > -bool cppc_perf_ctrs_in_pcc(void)
> > +bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
> >  {
> > -	int cpu;
> > +	struct cpc_register_resource *ref_perf_reg;
> > +	struct cpc_desc *cpc_desc;
> >  
> > -	for_each_present_cpu(cpu) {
> > -		struct cpc_register_resource *ref_perf_reg;
> > -		struct cpc_desc *cpc_desc;
> > +	cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> >  
> > -		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > +	if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
> > +	    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
> > +	    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
> > +		return true;
> >  
> > -		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
> > -		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
> > -		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
> > -			return true;
> >  
> > +	ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
> >  
> > -		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
> > +	/*
> > +	 * If reference perf register is not supported then we should
> > +	 * use the nominal perf value
> > +	 */
> > +	if (!CPC_SUPPORTED(ref_perf_reg))
> > +		ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
> Though not related to this issue, I'm confused that this sort of workaround
> appears here - it should be in some init function.
> >  
> > -		/*
> > -		 * If reference perf register is not supported then we should
> > -		 * use the nominal perf value
> > -		 */
> > -		if (!CPC_SUPPORTED(ref_perf_reg))
> > -			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
> > +	if (CPC_IN_PCC(ref_perf_reg))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
> >  
> > -		if (CPC_IN_PCC(ref_perf_reg))
> > +bool cppc_any_perf_ctrs_in_pcc(void)
> > +{
> > +	int cpu;
> > +
> > +	for_each_present_cpu(cpu) {
> > +		if (cppc_perf_ctrs_in_pcc(cpu))
> >  			return true;
> >  	}
> >  
> >  	return false;
> >  }
> > -EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
> > +EXPORT_SYMBOL_GPL(cppc_any_perf_ctrs_in_pcc);
> >  
> >  /**
> >   * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index 4fcaec7e2034..fdf5a49c04ed 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -48,7 +48,6 @@ struct cppc_freq_invariance {
> >  };
> >  
> >  static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> > -static bool perf_ctrs_in_pcc;
> >  static struct kthread_worker *kworker_fie;
> >  
> >  static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> > @@ -132,7 +131,12 @@ static void cppc_scale_freq_tick_pcc(void)
> >  
> >  static void cppc_scale_freq_tick(void)
> >  {
> > -	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
> > +	unsigned int cpu = smp_processor_id();
> > +
> > +	cppc_perf_ctrs_in_pcc(cpu) ? cppc_scale_freq_tick_pcc()
> Calling cppc_perf_ctrs_in_pcc() could be expensive here.
> I'd prefer something like a static branch or a determined callback for each
> cpu.
> > +				   : __cppc_scale_freq_tick(
> > +				   			&per_cpu(cppc_freq_inv,
> > +				   				 cpu));
> >  }
> >  
> >  static struct scale_freq_data cppc_sftd = {
> > @@ -152,7 +156,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> >  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> >  		cppc_fi->cpu = cpu;
> >  		cppc_fi->cpu_data = policy->driver_data;
> > -		if (perf_ctrs_in_pcc) {
> > +		if (cppc_perf_ctrs_in_pcc(cpu)) {
> >  			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> >  			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> >  		}
> > @@ -193,10 +197,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> >  	/* policy->cpus will be empty here, use related_cpus instead */
> >  	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> >  
> > -	if (!perf_ctrs_in_pcc)
> > -		return;
> > -
> >  	for_each_cpu(cpu, policy->related_cpus) {
> > +		if (!cppc_perf_ctrs_in_pcc(cpu))
> > +			continue;
> >  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> >  		irq_work_sync(&cppc_fi->irq_work);
> >  		kthread_cancel_work_sync(&cppc_fi->work);
> > @@ -218,14 +221,11 @@ static void __init cppc_freq_invariance_init(void)
> >  		.sched_deadline = 10 * NSEC_PER_MSEC,
> >  		.sched_period	= 10 * NSEC_PER_MSEC,
> >  	};
> > +	bool perf_ctrs_in_pcc = cppc_any_perf_ctrs_in_pcc();
> >  	int ret;
> >  
> > -	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
> > -
> >  	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
> > -		if (!perf_ctrs_in_pcc) {
> > -			fie_disabled = FIE_ENABLED;
> > -		} else {
> > +		if (perf_ctrs_in_pcc) {
> >  			pr_info("FIE not enabled on systems with registers in PCC\n");
> >  			fie_disabled = FIE_DISABLED;
> >  		}
> > @@ -234,12 +234,12 @@ static void __init cppc_freq_invariance_init(void)
> >  	if (fie_disabled || !perf_ctrs_in_pcc)
> >  		return;
> >  
> > -	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
> >  
> >  	kworker_fie = kthread_run_worker(0, "cppc_fie");
> >  	if (IS_ERR(kworker_fie)) {
> >  		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
> >  			PTR_ERR(kworker_fie));
> > +		kworker_fie = NULL;
> >  		fie_disabled = FIE_DISABLED;
> >  		return;
> >  	}
> > @@ -255,10 +255,8 @@ static void __init cppc_freq_invariance_init(void)
> >  
> >  static void cppc_freq_invariance_exit(void)
> >  {
> > -	if (fie_disabled || !perf_ctrs_in_pcc)
> > -		return;
> > -
> > -	kthread_destroy_worker(kworker_fie);
> > +	if (kworker_fie)
> > +		kthread_destroy_worker(kworker_fie);
> >  }
> >  
> >  #else
> > diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> > index 13fa81504844..3af503b12f60 100644
> > --- a/include/acpi/cppc_acpi.h
> > +++ b/include/acpi/cppc_acpi.h
> > @@ -154,7 +154,8 @@ extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
> >  extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
> >  extern int cppc_set_enable(int cpu, bool enable);
> >  extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
> > -extern bool cppc_perf_ctrs_in_pcc(void);
> > +extern bool cppc_perf_ctrs_in_pcc(unsigned int cpu);
> > +extern bool cppc_any_perf_ctrs_in_pcc(void);
> would be slightly better to keep cppc_perf_ctrs_in_pcc(void) and add a new
> function, e.g. cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu), such that the
> old ABI is unchanged.
> >  extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
> >  extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
> >  extern bool acpi_cpc_valid(void);
> > @@ -204,7 +205,11 @@ static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
> >  {
> >  	return -EOPNOTSUPP;
> >  }
> > -static inline bool cppc_perf_ctrs_in_pcc(void)
> > +static inline bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
> > +{
> > +	return false;
> > +}
> > +static inline bool cppc_any_perf_ctrs_in_pcc(void)
> >  {
> >  	return false;
> >  }
> > 
> > 
> > Additionally, it might be worth to get rid of (at least) some messages printed
> > on the path of reading the counters in case it is being done in tick context.
> Cool, will have a look.
> > 
> > Also , I do not have access to any machine using PCC, and it would be good to
> > double check that as well.
> > 
> > ---
> > BR
> > Beata
> 
> >>
> >> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> >> ---
> >> We have tested this on Kunpeng SoCs with the CPC regs both in System Memory
> >> and FFH.  More tests on other platforms are welcome.
> >>
> >> Changelog:
> >>
> >> v3:
> >> - Stash the state of 'cppc_perf_ctrs_in_pcc' so it won't have to check the CPC
> >>   regs of all CPUs everywhere (Thanks to the suggestion from Beata Michalska).
> >> - Update the commit log, explaining more on the warning issue caused by
> >>   accessing perf counters on remote CPUs.
> >> - Drop Patch 1 that has been accepted, and rebase Patch 2 on that.
> >>
> >> v2:
> >> https://lore.kernel.org/linux-pm/20250828110212.2108653-1-zhanjie9@hisilicon.com/
> >> - Update the cover letter and the commit log based on v1 discussion
> >> - Update FIE arch_freq_scale in ticks for non-PCC regs
> >>
> >> v1:
> >> https://lore.kernel.org/linux-pm/20250730032312.167062-1-yubowen8@huawei.com/
> >> ---
> >>  drivers/cpufreq/cppc_cpufreq.c | 60 ++++++++++++++++++++++++----------
> >>  1 file changed, 42 insertions(+), 18 deletions(-)
> ...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
  2025-11-13  8:04     ` Beata Michalska
@ 2025-11-13  8:18       ` Jie Zhan
  0 siblings, 0 replies; 5+ messages in thread
From: Jie Zhan @ 2025-11-13  8:18 UTC (permalink / raw)
  To: Beata Michalska
  Cc: viresh.kumar, rafael, ionela.voinescu, linux-pm, linux-acpi,
	linux-arm-kernel, linuxarm, zhenglifeng1, prime.zeng,
	jonathan.cameron



On 11/13/2025 4:04 PM, Beata Michalska wrote:
> On Tue, Nov 11, 2025 at 07:30:09PM +0800, Jie Zhan wrote:
>>
>>
>> On 11/11/2025 12:49 AM, Beata Michalska wrote:
>>> Hi Jie,
>>> On Tue, Nov 04, 2025 at 02:50:39PM +0800, Jie Zhan wrote:
>>>> Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
>>>> scheduler tick but defers the update of arch_freq_scale to a separate
>>>> thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.
>>>>
>>>> However, this deferred update mechanism is unnecessary and introduces extra
>>>> overhead for non-PCC register spaces (e.g. System Memory or FFH), where
>>>> accessing the regs won't sleep and can be safely performed from the tick
>>>> context.
>>>>
>>>> Furthermore, with the CPPC FIE registered, it throws repeated warnings of
>>>> "cppc_scale_freq_workfn: failed to read perf counters" on our platform with
>>>> the CPC regs in System Memory and a power-down idle state enabled.  That's
>>>> because the remote CPU can be in a power-down idle state, and reading its
>>>> perf counters returns 0.  Moving the FIE handling back to the scheduler
>>>> tick process makes the CPU handle its own perf counters, so it won't be
>>>> idle and the issue would be inherently solved.
>>>>
>>>> To address the above issues, update arch_freq_scale directly in ticks for
>>>> non-PCC regs and keep the deferred update mechanism for PCC regs.
>>> Something about it just didn’t sit right with me, and apparently, it needed some
>>> time to settle down - thus the delay.
>>>
>>> It all looks sensible though it might be worth to considered applying
>>> the change on a per-CPU basis, as, in theory at least, different address
>>> spaces are allowed for different registers (at least according to the ACPI
>>> spec, if I read it right).
>>> So I was thinking about smth along the lines of:
>> Beata,
>>
>> Right, I see what you want to do.
>> Some comments inline.
>>
>> Would you like to make it a full patch so I can include it in the next
>> version? or some other way?
> What I have shared was just to ilustrate the idea, so if that's ok with you,
> you might carry on with that as well ?
> 
> ---
> BR
> Beata
Sure, I will try to incorporate this and update.

...

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-11-13  8:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-04  6:50 [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs Jie Zhan
2025-11-10 16:49 ` Beata Michalska
2025-11-11 11:30   ` Jie Zhan
2025-11-13  8:04     ` Beata Michalska
2025-11-13  8:18       ` Jie Zhan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).