Linux Power Management development
 help / color / mirror / Atom feed
From: Aboorva Devarajan <aboorvad@linux.ibm.com>
To: Lifeng Zheng <zhenglifeng1@huawei.com>,
	rafael@kernel.org, viresh.kumar@linaro.org
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com, jonathan.cameron@huawei.com,
	zhanjie9@hisilicon.com, lihuisong@huawei.com,
	fanghao11@huawei.com, gautam@linux.ibm.com,
	aboorvad@linux.ibm.com
Subject: Re: [PATCH v2 2/4] cpufreq: Introduce a more generic way to set default per-policy boost flag
Date: Tue, 04 Feb 2025 22:11:50 +0530	[thread overview]
Message-ID: <c9e56c5f54cc33338762c94e9bed7b5a0d5de812.camel@linux.ibm.com> (raw)
In-Reply-To: <20250117101457.1530653-3-zhenglifeng1@huawei.com>

On Fri, 2025-01-17 at 18:14 +0800, Lifeng Zheng wrote:

> In cpufreq_online() of cpufreq.c, the per-policy boost flag is already set
> to mirror the cpufreq_driver boost during init but using freq_table to
> judge if the policy has boost frequency. There are two drawbacks to this
> approach:
> 
> 1. It doesn't work for the cpufreq drivers that do not use a frequency
> table. For now, acpi-cpufreq and amd-pstate have to enable boost in policy
> initialization. And cppc_cpufreq never set policy to boost when going
> online no matter what the cpufreq_driver boost flag is.
> 
> 2. If the cpu goes offline when cpufreq_driver boost enabled and then goes
> online when cpufreq_driver boost disabled, the per-policy boost flag will
> unreasonably remain true.
> 
> Running set_boost at the end of the online process is a more generic way
> for all cpufreq drivers.
> 
> Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
> ---
>  drivers/cpufreq/cpufreq.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 5882d7f5e3c1..5a3566c2eb8d 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,10 +1409,6 @@ static int cpufreq_online(unsigned int cpu)
>  			goto out_free_policy;
>  		}
>  
> -		/* Let the per-policy boost flag mirror the cpufreq_driver boost during init */
> -		if (cpufreq_boost_enabled() && policy_has_boost_freq(policy))
> -			policy->boost_enabled = true;
> -
>  		/*
>  		 * The initialization has succeeded and the policy is online.
>  		 * If there is a problem with its frequency table, take it
> @@ -1573,6 +1569,18 @@ static int cpufreq_online(unsigned int cpu)
>  	if (new_policy && cpufreq_thermal_control_enabled(cpufreq_driver))
>  		policy->cdev = of_cpufreq_cooling_register(policy);
>  
> +	/* Let the per-policy boost flag mirror the cpufreq_driver boost during init */
> +	if (policy->boost_enabled != cpufreq_boost_enabled()) {
> +		policy->boost_enabled = cpufreq_boost_enabled();
> +		ret = cpufreq_driver->set_boost(policy, policy->boost_enabled);
> +		if (ret) {
> +			/* If the set_boost fails, the online operation is not affected */
> +			pr_info("%s: CPU%d: Cannot %s BOOST\n", __func__, policy->cpu,
> +				policy->boost_enabled ? "enable" : "disable");
> +			policy->boost_enabled = !policy->boost_enabled;
> +		}
> +	}
> +
>  	pr_debug("initialization complete\n");
>  
>  
> 	return 0;

Hi,

This patch causes a boot-time crash on PowerNV (Power9-baremetal systems) when WoF
(Workload Optimized Frequency - boost) is enabled, starting from v6.14-rc1.

The crash happens due to null pointer dereference of the `set_boost` function.

`set_boost` is only assigned after the cpufreq driver is registered on PowerNV
as below, 

Initialization Flow: (powernv_cpufreq_init -> cpufreq_enable_boost_support -> 
                      initializes set_boost).

However, with this patch, `set_boost` is invoked in `cpufreq_register_driver`
before it is initialized.

Access Flow:         (powernv_cpufreq_init -> cpufreq_register_driver -> cpufreq_online -> 
                      attempts to access cpufreq_driver->set_boost,
                      which is still NULL at this point).

This causes a boot-time crash as follows:

[    9.393946][    T1] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[    9.393946][    T1] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[    9.396285][    T1] Faulting instruction address: 0x00000000
[    9.396285][    T1] Faulting instruction address: 0x00000000
[    9.398545][    T1] Oops: Kernel access of bad area, sig: 7 [#1]
[    9.398545][    T1] Oops: Kernel access of bad area, sig: 7 [#1]
[    9.398804][    T1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    9.398804][    T1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    9.400114][    T1] Modules linked in:
[    9.400114][    T1] Modules linked in:
[    9.400283][    T1] CPU: 19 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-dirty #23
[    9.400283][    T1] CPU: 19 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-dirty #23
[    9.400605][    T1] Hardware name: 0000000000000000 POWER9 0x4e1202 opal:v6.6-111-gd362ae4f-root-dirty-157d5e1 PowerNV
[    9.400605][    T1] Hardware name: 0000000000000000 POWER9 0x4e1202 opal:v6.6-111-gd362ae4f-root-dirty-157d5e1 PowerNV
[    9.403093][    T1] NIP:  0000000000000000 LR: c000000000f7a574 CTR: 0000000000000000
[    9.403093][    T1] NIP:  0000000000000000 LR: c000000000f7a574 CTR: 0000000000000000
[    9.404397][    T1] REGS: c00020000419f680 TRAP: 0400   Not tainted  (6.14.0-rc1-dirty)
[    9.404397][    T1] REGS: c00020000419f680 TRAP: 0400   Not tainted  (6.14.0-rc1-dirty)
[    9.407771][    T1] MSR:  9000000002089033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 84000482  XER: 00000000
[    9.407771][    T1] MSR:  9000000002089033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 84000482  XER: 00000000
[    9.409191][    T1] CFAR: c000000000f7a570 IRQMASK: 0 
[    9.409191][    T1] GPR00: c000000000f7a51c c00020000419f920 c000000001ea3200 c00020002455f800 
[    9.409191][    T1] GPR04: 0000000000000001 0000000000000000 0000000000000001 0000000000000000 
[    9.409191][    T1] GPR08: 0000000000000000 c000000002a1f280 0000000000000001 0000000000000000 
[    9.409191][    T1] GPR12: 0000000000000000 c000003ffefe0900 c00000000001101c 0000000000000000 
[    9.409191][    T1] GPR16: 0000000000000000 0000000000000000 c000000002c20b80 000000000000005a 
[    9.409191][    T1] GPR20: c00000007ffa0a14 c000000002973200 c000000002c1ff38 c00020002455fc48 
[    9.409191][    T1] GPR24: c0000000029752f8 c000000002c1ff10 c00020002455fcb8 0000000000000000 
[    9.409191][    T1] GPR28: c000000002a71be0 0000000000000000 c00020002455f800 c000000002a1f0a0 
[    9.409191][    T1] CFAR: c000000000f7a570 IRQMASK: 0 
[    9.409191][    T1] GPR00: c000000000f7a51c c00020000419f920 c000000001ea3200 c00020002455f800 
[    9.409191][    T1] GPR04: 0000000000000001 0000000000000000 0000000000000001 0000000000000000 
[    9.409191][    T1] GPR08: 0000000000000000 c000000002a1f280 0000000000000001 0000000000000000 
[    9.409191][    T1] GPR12: 0000000000000000 c000003ffefe0900 c00000000001101c 0000000000000000 
[    9.409191][    T1] GPR16: 0000000000000000 0000000000000000 c000000002c20b80 000000000000005a 
[    9.409191][    T1] GPR20: c00000007ffa0a14 c000000002973200 c000000002c1ff38 c00020002455fc48 
[    9.409191][    T1] GPR24: c0000000029752f8 c000000002c1ff10 c00020002455fcb8 0000000000000000 
[    9.409191][    T1] GPR28: c000000002a71be0 0000000000000000 c00020002455f800 c000000002a1f0a0 
[    9.415226][    T1] NIP [0000000000000000] 0x0
[    9.415226][    T1] NIP [0000000000000000] 0x0
[    9.417435][    T1] LR [c000000000f7a574] cpufreq_online+0x440/0xe14
[    9.417435][    T1] LR [c000000000f7a574] cpufreq_online+0x440/0xe14
[    9.419701][    T1] Call Trace:
[    9.419701][    T1] Call Trace:
[    9.422849][    T1] [c00020000419f920] [c000000000f7a51c] cpufreq_online+0x3e8/0xe14 (unreliable)
[    9.422849][    T1] [c00020000419f920] [c000000000f7a51c] cpufreq_online+0x3e8/0xe14 (unreliable)
[    9.430225][    T1] [c00020000419fa00] [c000000000f7b030] cpufreq_add_dev+0xb4/0xd8
[    9.430225][    T1] [c00020000419fa00] [c000000000f7b030] cpufreq_add_dev+0xb4/0xd8
[    9.434547][    T1] [c00020000419fa30] [c000000000c89e18] subsys_interface_register+0x18c/0x1d4
[    9.434547][    T1] [c00020000419fa30] [c000000000c89e18] subsys_interface_register+0x18c/0x1d4
[    9.440934][    T1] [c00020000419faa0] [c000000000f763a8] cpufreq_register_driver+0x1f0/0x370
[    9.440934][    T1] [c00020000419faa0] [c000000000f763a8] cpufreq_register_driver+0x1f0/0x370
[    9.444314][    T1] [c00020000419fb20] [c000000002093340] powernv_cpufreq_init+0x690/0xa10
[    9.444314][    T1] [c00020000419fb20] [c000000002093340] powernv_cpufreq_init+0x690/0xa10
[    9.444686][    T1] [c00020000419fc30] [c000000000010cb8] do_one_initcall+0x60/0x2c8
[    9.444686][    T1] [c00020000419fc30] [c000000000010cb8] do_one_initcall+0x60/0x2c8
[    9.445024][    T1] [c00020000419fd00] [c0000000020059ec] kernel_init_freeable+0x33c/0x530
[    9.445024][    T1] [c00020000419fd00] [c0000000020059ec] kernel_init_freeable+0x33c/0x530
[    9.446375][    T1] [c00020000419fde0] [c000000000011048] kernel_init+0x34/0x26c
[    9.446375][    T1] [c00020000419fde0] [c000000000011048] kernel_init+0x34/0x26c
[    9.448684][    T1] [c00020000419fe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
[    9.448684][    T1] [c00020000419fe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c

[    9.455840][    T1] ---[ end trace 0000000000000000 ]---
[    9.455840][    T1] ---[ end trace 0000000000000000 ]---

The fix will be to initialize set_boost earlier in powernv_cpufreq_init before calling
cpufreq_register_driver.

Quickly tried this patch and it resolves the issue:

---

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ae79d909943b..2dd61de34a28 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -1127,8 +1127,10 @@ static int __init powernv_cpufreq_init(void)
        if (rc)
                goto out;
 
-       if (powernv_pstate_info.wof_enabled)
+       if (powernv_pstate_info.wof_enabled) {
                powernv_cpufreq_driver.boost_enabled = true;
+               powernv_cpufreq_driver.set_boost = cpufreq_boost_set_sw;
+       }
        else
                powernv_cpu_freq_attr[SCALING_BOOST_FREQS_ATTR_INDEX] = NULL;
 
@@ -1138,9 +1140,6 @@ static int __init powernv_cpufreq_init(void)
                goto cleanup;
        }
 
-       if (powernv_pstate_info.wof_enabled)
-               cpufreq_enable_boost_support();
-
        register_reboot_notifier(&powernv_cpufreq_reboot_nb);
        opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
 
---

I noticed that Viresh is working on a similar patch [1] as part of a broader patchset
to simplify boost handling, which should also resolve this issue.

Should we merge this patch [1] and related patches since this is causing a crash,
or submit a separate patch to fix this?

[1]: https://lore.kernel.org/all/9b4af20d5b415f41e866ddd8bde9cf6441c463b8.1737707712.git.viresh.kumar@linaro.org/

Regards,
Aboorva


  parent reply	other threads:[~2025-02-04 16:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17 10:14 [PATCH v2 0/4] cpufreq: Fix some boost errors related to CPU online and offline Lifeng Zheng
2025-01-17 10:14 ` [PATCH v2 1/4] cpufreq: Fix re-boost issue after hotplugging a cpu Lifeng Zheng
2025-01-20  8:27   ` Viresh Kumar
2025-01-20  9:10     ` zhenglifeng (A)
2025-01-20  9:21       ` Viresh Kumar
2025-01-20  9:21   ` Viresh Kumar
2025-01-17 10:14 ` [PATCH v2 2/4] cpufreq: Introduce a more generic way to set default per-policy boost flag Lifeng Zheng
2025-01-20  9:01   ` Viresh Kumar
2025-01-21  1:45     ` zhenglifeng (A)
2025-01-21  4:20       ` Viresh Kumar
2025-01-21  6:22         ` zhenglifeng (A)
2025-01-21  6:37           ` Viresh Kumar
2025-02-04 16:41   ` Aboorva Devarajan [this message]
2025-02-05  5:01     ` Viresh Kumar
2025-02-05 18:22       ` Aboorva Devarajan
2025-01-17 10:14 ` [PATCH v2 3/4] cpufreq: CPPC: Fix wrong max_freq in policy initialization Lifeng Zheng
2025-01-20  9:28   ` Viresh Kumar
2025-01-17 10:14 ` [PATCH v2 4/4] cpufreq: ACPI: Remove set_boost in acpi_cpufreq_cpu_init() Lifeng Zheng
2025-01-21  6:14   ` Viresh Kumar
2025-01-24  8:59     ` Viresh Kumar
2025-01-24  9:11       ` zhenglifeng (A)
2025-04-15 10:29   ` Viresh Kumar
2025-01-21  6:38 ` [PATCH v2 0/4] cpufreq: Fix some boost errors related to CPU online and offline Viresh Kumar
2025-01-23 20:07   ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9e56c5f54cc33338762c94e9bed7b5a0d5de812.camel@linux.ibm.com \
    --to=aboorvad@linux.ibm.com \
    --cc=fanghao11@huawei.com \
    --cc=gautam@linux.ibm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=lihuisong@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=rafael@kernel.org \
    --cc=viresh.kumar@linaro.org \
    --cc=zhanjie9@hisilicon.com \
    --cc=zhenglifeng1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox