Date: Mon, 10 Nov 2025 17:49:47 +0100
From: Beata Michalska
To: Jie Zhan
Cc: viresh.kumar@linaro.org, rafael@kernel.org, ionela.voinescu@arm.com,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linuxarm@huawei.com,
	zhenglifeng1@huawei.com, prime.zeng@hisilicon.com,
	jonathan.cameron@huawei.com
Subject: Re: [PATCH v3] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs
References: <20251104065039.1675549-1-zhanjie9@hisilicon.com>
In-Reply-To: <20251104065039.1675549-1-zhanjie9@hisilicon.com>

Hi Jie,

On Tue, Nov 04, 2025 at 02:50:39PM +0800, Jie Zhan wrote:
> Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the
> scheduler tick but defers the update of arch_freq_scale to a separate
> thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC.
>
> However, this deferred update mechanism is unnecessary and introduces extra
> overhead for non-PCC register spaces (e.g. System Memory or FFH), where
> accessing the regs won't sleep and can be safely performed from the tick
> context.
>
> Furthermore, with the CPPC FIE registered, it throws repeated warnings of
> "cppc_scale_freq_workfn: failed to read perf counters" on our platform with
> the CPC regs in System Memory and a power-down idle state enabled. That's
> because the remote CPU can be in a power-down idle state, and reading its
> perf counters returns 0. Moving the FIE handling back to the scheduler
> tick process makes the CPU handle its own perf counters, so it won't be
> idle and the issue would be inherently solved.
>
> To address the above issues, update arch_freq_scale directly in ticks for
> non-PCC regs and keep the deferred update mechanism for PCC regs.

Something about it just didn't sit right with me, and apparently it needed
some time to settle, hence the delay.

It all looks sensible, though it might be worth considering applying the
change on a per-CPU basis, as, in theory at least, different address spaces
are allowed for different registers (according to the ACPI spec, if I read
it right). So I was thinking about something along the lines of:

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 6c684e54fe01..07f4e59f2f0a 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1431,38 +1431,47 @@ EXPORT_SYMBOL_GPL(cppc_get_perf_caps);
  *
  * Return: true if any of the counters are in PCC regions, false otherwise
  */
-bool cppc_perf_ctrs_in_pcc(void)
+bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
 {
-	int cpu;
+	struct cpc_register_resource *ref_perf_reg;
+	struct cpc_desc *cpc_desc;
 
-	for_each_present_cpu(cpu) {
-		struct cpc_register_resource *ref_perf_reg;
-		struct cpc_desc *cpc_desc;
+	cpc_desc = per_cpu(cpc_desc_ptr, cpu);
 
-		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+	if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
+	    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
+	    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
+		return true;
 
-		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
-			return true;
+	ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
 
-		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
+	/*
+	 * If reference perf register is not supported then we should
+	 * use the nominal perf value
+	 */
+	if (!CPC_SUPPORTED(ref_perf_reg))
+		ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
 
-		/*
-		 * If reference perf register is not supported then we should
-		 * use the nominal perf value
-		 */
-		if (!CPC_SUPPORTED(ref_perf_reg))
-			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
+	if (CPC_IN_PCC(ref_perf_reg))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
 
-		if (CPC_IN_PCC(ref_perf_reg))
+bool cppc_any_perf_ctrs_in_pcc(void)
+{
+	int cpu;
+
+	for_each_present_cpu(cpu) {
+		if (cppc_perf_ctrs_in_pcc(cpu))
 			return true;
 	}
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
+EXPORT_SYMBOL_GPL(cppc_any_perf_ctrs_in_pcc);
 
 /**
  * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 4fcaec7e2034..fdf5a49c04ed 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -48,7 +48,6 @@ struct cppc_freq_invariance {
 };
 
 static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
-static bool perf_ctrs_in_pcc;
 static struct kthread_worker *kworker_fie;
 
 static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
 				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
@@ -132,7 +131,12 @@ static void cppc_scale_freq_tick_pcc(void)
 
 static void cppc_scale_freq_tick(void)
 {
-	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
+	unsigned int cpu = smp_processor_id();
+
+	cppc_perf_ctrs_in_pcc(cpu) ? cppc_scale_freq_tick_pcc()
+				   : __cppc_scale_freq_tick(
+						&per_cpu(cppc_freq_inv,
+							 cpu));
 }
 
 static struct scale_freq_data cppc_sftd = {
@@ -152,7 +156,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		cppc_fi->cpu = cpu;
 		cppc_fi->cpu_data = policy->driver_data;
-		if (perf_ctrs_in_pcc) {
+		if (cppc_perf_ctrs_in_pcc(cpu)) {
 			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
 			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
 		}
@@ -193,10 +197,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
 	/* policy->cpus will be empty here, use related_cpus instead */
 	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
 
-	if (!perf_ctrs_in_pcc)
-		return;
-
 	for_each_cpu(cpu, policy->related_cpus) {
+		if (!cppc_perf_ctrs_in_pcc(cpu))
+			continue;
 		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
 		irq_work_sync(&cppc_fi->irq_work);
 		kthread_cancel_work_sync(&cppc_fi->work);
@@ -218,14 +221,11 @@ static void __init cppc_freq_invariance_init(void)
 		.sched_deadline = 10 * NSEC_PER_MSEC,
 		.sched_period = 10 * NSEC_PER_MSEC,
 	};
+	bool perf_ctrs_in_pcc = cppc_any_perf_ctrs_in_pcc();
 	int ret;
 
-	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
-
 	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
-		if (!perf_ctrs_in_pcc) {
-			fie_disabled = FIE_ENABLED;
-		} else {
+		if (perf_ctrs_in_pcc) {
 			pr_info("FIE not enabled on systems with registers in PCC\n");
 			fie_disabled = FIE_DISABLED;
 		}
@@ -234,12 +234,12 @@ static void __init cppc_freq_invariance_init(void)
 	if (fie_disabled || !perf_ctrs_in_pcc)
 		return;
 
-	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
 	kworker_fie = kthread_run_worker(0, "cppc_fie");
 	if (IS_ERR(kworker_fie)) {
 		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
 			PTR_ERR(kworker_fie));
+		kworker_fie = NULL;
 		fie_disabled = FIE_DISABLED;
 		return;
 	}
@@ -255,10 +255,8 @@ static void __init cppc_freq_invariance_init(void)
 
 static void cppc_freq_invariance_exit(void)
 {
-	if (fie_disabled || !perf_ctrs_in_pcc)
-		return;
-
-	kthread_destroy_worker(kworker_fie);
+	if (kworker_fie)
+		kthread_destroy_worker(kworker_fie);
 }
 
 #else
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 13fa81504844..3af503b12f60 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -154,7 +154,8 @@ extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_set_enable(int cpu, bool enable);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
-extern bool cppc_perf_ctrs_in_pcc(void);
+extern bool cppc_perf_ctrs_in_pcc(unsigned int cpu);
+extern bool cppc_any_perf_ctrs_in_pcc(void);
 extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps,
 				     unsigned int perf);
 extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps,
 				     unsigned int freq);
 extern bool acpi_cpc_valid(void);
@@ -204,7 +205,11 @@ static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
 {
 	return -EOPNOTSUPP;
 }
-static inline bool cppc_perf_ctrs_in_pcc(void)
+static inline bool cppc_perf_ctrs_in_pcc(unsigned int cpu)
+{
+	return false;
+}
+static inline bool cppc_any_perf_ctrs_in_pcc(void)
 {
 	return false;
 }

Additionally, it might be worth getting rid of (at least some of) the
messages printed on the path of reading the counters when that is done in
the tick context - see the rough sketch at the very bottom of this mail.

Also, I do not have access to any machine using PCC, so it would be good to
double-check that case as well.

---
BR
Beata

>
> Signed-off-by: Jie Zhan
> ---
> We have tested this on Kunpeng SoCs with the CPC regs both in System Memory
> and FFH. More tests on other platforms are welcome.
>
> Changelog:
>
> v3:
> - Stash the state of 'cppc_perf_ctrs_in_pcc' so it won't have to check the CPC
>   regs of all CPUs everywhere (Thanks to the suggestion from Beata Michalska).
> - Update the commit log, explaining more on the warning issue caused by
>   accessing perf counters on remote CPUs.
> - Drop Patch 1 that has been accepted, and rebase Patch 2 on that.
>
> v2:
> https://lore.kernel.org/linux-pm/20250828110212.2108653-1-zhanjie9@hisilicon.com/
> - Update the cover letter and the commit log based on v1 discussion
> - Update FIE arch_freq_scale in ticks for non-PCC regs
>
> v1:
> https://lore.kernel.org/linux-pm/20250730032312.167062-1-yubowen8@huawei.com/
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 60 ++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 9eac77c4f294..4fcaec7e2034 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -48,37 +48,31 @@ struct cppc_freq_invariance {
>  };
>
>  static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> +static bool perf_ctrs_in_pcc;
>  static struct kthread_worker *kworker_fie;
>
>  static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0,
>  				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
>
>  /**
> - * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
> - * @work: The work item.
> + * __cppc_scale_freq_tick - CPPC arch_freq_scale updater for frequency invariance
> + * @cppc_fi: per-cpu CPPC FIE data.
>   *
> - * The CPPC driver register itself with the topology core to provide its own
> + * The CPPC driver registers itself with the topology core to provide its own
>   * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
>   * gets called by the scheduler on every tick.
>   *
>   * Note that the arch specific counters have higher priority than CPPC counters,
>   * if available, though the CPPC driver doesn't need to have any special
>   * handling for that.
> - *
> - * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
> - * reach here from hard-irq context), which then schedules a normal work item
> - * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
> - * based on the counter updates since the last tick.
>   */
> -static void cppc_scale_freq_workfn(struct kthread_work *work)
> +static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi)
>  {
> -	struct cppc_freq_invariance *cppc_fi;
>  	struct cppc_perf_fb_ctrs fb_ctrs = {0};
>  	struct cppc_cpudata *cpu_data;
>  	unsigned long local_freq_scale;
>  	u64 perf;
>
> -	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>  	cpu_data = cppc_fi->cpu_data;
>
>  	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> @@ -102,6 +96,14 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>  	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
>  }
>
> +static void cppc_scale_freq_workfn(struct kthread_work *work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +
> +	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> +	__cppc_scale_freq_tick(cppc_fi);
> +}
> +
>  static void cppc_irq_work(struct irq_work *irq_work)
>  {
>  	struct cppc_freq_invariance *cppc_fi;
> @@ -110,7 +112,14 @@ static void cppc_irq_work(struct irq_work *irq_work)
>  	kthread_queue_work(kworker_fie, &cppc_fi->work);
>  }
>
> -static void cppc_scale_freq_tick(void)
> +/*
> + * Reading perf counters may sleep if the CPC regs are in PCC. Thus, we
> + * schedule an irq work in scale_freq_tick (since we reach here from hard-irq
> + * context), which then schedules a normal work item cppc_scale_freq_workfn()
> + * that updates the per_cpu arch_freq_scale variable based on the counter
> + * updates since the last tick.
> + */
> +static void cppc_scale_freq_tick_pcc(void)
>  {
>  	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
>
> @@ -121,6 +130,11 @@ static void cppc_scale_freq_tick(void)
>  	irq_work_queue(&cppc_fi->irq_work);
>  }
>
> +static void cppc_scale_freq_tick(void)
> +{
> +	__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
> +}
> +
>  static struct scale_freq_data cppc_sftd = {
>  	.source = SCALE_FREQ_SOURCE_CPPC,
>  	.set_freq_scale = cppc_scale_freq_tick,
> @@ -138,8 +152,10 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		cppc_fi->cpu = cpu;
>  		cppc_fi->cpu_data = policy->driver_data;
> -		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> -		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +		if (perf_ctrs_in_pcc) {
> +			kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +			init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +		}
>
>  		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
>
> @@ -177,6 +193,9 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
>  	/* policy->cpus will be empty here, use related_cpus instead */
>  	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
>
> +	if (!perf_ctrs_in_pcc)
> +		return;
> +
>  	for_each_cpu(cpu, policy->related_cpus) {
>  		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>  		irq_work_sync(&cppc_fi->irq_work);
> @@ -201,17 +220,22 @@ static void __init cppc_freq_invariance_init(void)
>  	};
>  	int ret;
>
> +	perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
> +
>  	if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) {
> -		fie_disabled = FIE_ENABLED;
> -		if (cppc_perf_ctrs_in_pcc()) {
> +		if (!perf_ctrs_in_pcc) {
> +			fie_disabled = FIE_ENABLED;
> +		} else {
>  			pr_info("FIE not enabled on systems with registers in PCC\n");
>  			fie_disabled = FIE_DISABLED;
>  		}
>  	}
>
> -	if (fie_disabled)
> +	if (fie_disabled || !perf_ctrs_in_pcc)
>  		return;
>
> +	cppc_sftd.set_freq_scale = cppc_scale_freq_tick_pcc;
> +
>  	kworker_fie = kthread_run_worker(0, "cppc_fie");
>  	if (IS_ERR(kworker_fie)) {
>  		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
> @@ -231,7 +255,7 @@ static void __init cppc_freq_invariance_init(void)
>
>  static void cppc_freq_invariance_exit(void)
>  {
> -	if (fie_disabled)
> +	if (fie_disabled || !perf_ctrs_in_pcc)
>  		return;
>
>  	kthread_destroy_worker(kworker_fie);
> --
> 2.33.0
>
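
P.S. To make the point about the counter-read messages concrete, below is a
rough, untested sketch of what I had in mind, on top of this patch, assuming
the pr_warn() in the read-failure path is the message in question; pr_debug()
is just one option here, a ratelimited warning would do as well:

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ ... @@ static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi)
 	cpu_data = cppc_fi->cpu_data;
 
 	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
-		pr_warn("%s: failed to read perf counters\n", __func__);
+		/*
+		 * With non-PCC regs this now runs on every tick, so do
+		 * not flood the log on transient read failures.
+		 */
+		pr_debug("%s: failed to read perf counters\n", __func__);
 		return;
 	}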