linux-pm.vger.kernel.org archive mirror
* PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
@ 2024-09-10  2:46 chenshuo
From: chenshuo @ 2024-09-10  2:46 UTC
  To: Rafael J. Wysocki; +Cc: linux-pm

Hi Rafael,

I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:

1. Problem Description:
   When using cpufreq with the ondemand governor and enabling the Energy Model (EM), the CPU OPP table is configured with a frequency and a voltage for each operating point, and `dynamic-power-coefficient` is set in the DTS under the CPU node. However, frequency scaling behaves abnormally: the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
```
cpu0: cpu@0 {
	...
	operating-points-v2 = <&d0_cpu_opp_table>;
	#cooling-cells = <2>;
	dynamic-power-coefficient = <2000>;
	...
};
```
2. Root Cause Analysis:
When the OPP table is used together with `dynamic-power-coefficient`, `em_dev_register_perf_domain()` in `kernel/power/energy_model.c` sets the `EM_PERF_DOMAIN_MICROWATTS` flag. In `em_create_perf_table()`, the cost computation in `em_compute_costs()` contains the following code:
```
if (table[i].cost >= prev_cost) {
    table[i].flags = EM_PERF_STATE_INEFFICIENT;
    dev_dbg(dev, "EM: OPP:%lu is inefficient\n", table[i].frequency);
} else {
    /* only an efficient state becomes the new reference cost */
    prev_cost = table[i].cost;
}
```
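The cost is calculated as power * max_frequency / frequency, and the power estimated from `dynamic-power-coefficient` is proportional to V^2 * frequency, so the cost reduces to a value proportional to V^2. For OPPs that share the same voltage, the cost is therefore a constant value. The loop walks from the highest state (nr_states - 1) downward with prev_cost initialized to ULONG_MAX, so the highest state sets the reference cost, and every lower state, whose cost equals prev_cost, satisfies `cost >= prev_cost`. As a result, only the highest frequency point (table[nr_states - 1]) is not flagged as EM_PERF_STATE_INEFFICIENT in the EM performance table.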

In the em_cpufreq_update_efficiencies() function, the following code is executed:
```
for (i = 0; i < pd->nr_perf_states; i++) {
    if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
        continue;

    if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
        found++;
}
```
As a result, every frequency point marked EM_PERF_STATE_INEFFICIENT is flagged as CPUFREQ_INEFFICIENT_FREQ by cpufreq_table_set_inefficient(), so these frequencies are skipped during frequency scaling and the CPU ends up at the only remaining efficient frequency, the highest one.

3. Proposed Change and Testing: 
On Linux 6.6, this behavior breaks the normal operation of the cpufreq ondemand governor, which in turn causes passive cooling devices to malfunction when the thermal framework uses the power allocator governor. As a temporary fix, I changed the condition from `if (table[i].cost >= prev_cost)` to `if (table[i].cost > prev_cost)`. With this change the issue appears to be resolved, but I am concerned about potential side effects of this modification.

Could you please clarify whether this change carries any risks or negative consequences? Also, why was the original condition designed to exclude frequency points with equal cost from dynamic frequency scaling?

Best regards

chenshuo@eswincomputing.com


Thread overview: 5+ messages
2024-09-10  2:46 PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor chenshuo
2024-09-10  9:13 ` Christian Loehle
2024-09-10 10:31   ` chenshuo
2024-09-18  6:41     ` chenshuo
2024-09-18  7:48       ` Lukasz Luba
