PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor

* PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
From: chenshuo @ 2024-09-10  2:46 UTC
  To: Rafael J. Wysocki; +Cc: linux-pm

Hi Rafael,

I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:

1. Problem Description:
   I use cpufreq with the ondemand governor and the Energy Model (EM) enabled. The CPU OPP table specifies a frequency and a voltage for each operating point, and `dynamic-power-coefficient` is set in the CPU node of the DTS. With this setup, dynamic frequency scaling behaves abnormally: the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
```
cpu0: cpu@0 {
	...
	operating-points-v2 = <&d0_cpu_opp_table>;
	#cooling-cells = <2>;
	dynamic-power-coefficient = <2000>;
};
...
```
2. Root Cause Analysis:
When using the OPP table and configuring the "dynamic-power-coefficient," the `em_dev_register_perf_domain()` function in `kernel/power/energy_model.c` sets the flags to `EM_PERF_DOMAIN_MICROWATTS`. In the `em_create_perf_table()` function, `em_compute_costs()` includes the following code:
```
if (table[i].cost >= prev_cost) {
    table[i].flags = EM_PERF_STATE_INEFFICIENT;
    dev_dbg(dev, "EM: OPP:%lu is inefficient\n", table[i].frequency);
}
```
Since the cost is calculated as power * max_frequency / frequency, and the power estimated for a constant voltage scales linearly with frequency, the cost of every frequency point works out to the same constant value. Consequently, except for index nr_states - 1 (which is compared against the initial prev_cost value of ULONG_MAX), every frequency point's cost is equal to prev_cost. As a result, only the highest frequency point (table[nr_states - 1]) is not flagged as EM_PERF_STATE_INEFFICIENT in the EM performance table.
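
The effect can be reproduced outside the kernel with a small userspace sketch. This is not the kernel code: `struct perf_state`, `PERF_STATE_INEFFICIENT` and `mark_inefficient()` are hypothetical stand-ins that only mirror the cost rule and the `>=` comparison quoted above, and frequencies are assumed to be in kHz and power in uW.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PERF_STATE_INEFFICIENT 1u /* stand-in for EM_PERF_STATE_INEFFICIENT */

struct perf_state {
	unsigned long frequency; /* kHz (assumed unit) */
	unsigned long power;     /* uW (assumed unit) */
	unsigned long cost;
	unsigned int flags;
};

/*
 * Walk the table from the highest to the lowest frequency and flag every
 * state whose cost is not strictly lower than the lowest cost seen so far.
 * The table must be sorted by ascending frequency.
 * Returns the number of states that remain efficient.
 */
static int mark_inefficient(struct perf_state *table, int nr_states)
{
	unsigned long prev_cost = ULONG_MAX;
	uint64_t fmax;
	int i, efficient = 0;

	assert(nr_states > 0);
	fmax = table[nr_states - 1].frequency;

	for (i = nr_states - 1; i >= 0; i--) {
		/* cost = power * max_frequency / frequency */
		table[i].cost = (unsigned long)(fmax * table[i].power /
						table[i].frequency);
		if (table[i].cost >= prev_cost) {
			table[i].flags |= PERF_STATE_INEFFICIENT;
		} else {
			prev_cost = table[i].cost;
			efficient++;
		}
	}
	return efficient;
}
```

Feeding it a table whose power grows linearly with frequency (for example 128000, 256000 and 512000 uW at 100, 200 and 400 MHz) gives every entry the same cost, so only the last entry stays efficient.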

In the em_cpufreq_update_efficiencies() function, the following code is executed:
```
for (i = 0; i < pd->nr_perf_states; i++) {
    if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
        continue;

    if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
        found++;
}
```
As a result, all frequency points marked as EM_PERF_STATE_INEFFICIENT are flagged as CPUFREQ_INEFFICIENT_FREQ in the cpufreq_table_set_inefficient() function, causing these frequencies to be skipped during frequency scaling.
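
A simplified model of the skipping behaviour described above is sketched here. It is an assumption-laden illustration, not the cpufreq lookup code: `struct freq_entry` and `resolve_efficient_freq()` are hypothetical names, and policy limits and the different lookup relations are ignored.

```c
#include <assert.h>
#include <stddef.h>

struct freq_entry {
	unsigned int frequency; /* kHz */
	int inefficient;        /* stand-in for CPUFREQ_INEFFICIENT_FREQ */
};

/*
 * Return the lowest frequency that is >= target and not flagged as
 * inefficient. The table must be sorted by ascending frequency.
 * Falls back to the highest entry when no suitable entry exists.
 */
static unsigned int resolve_efficient_freq(const struct freq_entry *table,
					   size_t n, unsigned int target)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (table[i].inefficient)
			continue; /* skipped during frequency selection */
		if (table[i].frequency >= target)
			return table[i].frequency;
	}
	return table[n - 1].frequency;
}
```

When every entry except the highest one is flagged, any requested target resolves to the highest frequency in this model, which matches the observed symptom.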

3. Proposed Change and Testing: 
On Linux 6.6, this behavior affects the normal operation of the cpufreq ondemand governor, which in turn causes passive cooling devices to malfunction when using the power allocator strategy in the thermal framework. I made a temporary fix by changing the condition from:
	if (table[i].cost >= prev_cost)
to:
	if (table[i].cost > prev_cost)
After this change, the issue seems resolved for now. However, I am concerned about potential side effects of this modification.

Could you please help clarify if there are any risks or negative consequences with this change? Why was the original condition designed to remove frequency points with the same cost from dynamic frequency scaling?

Best regards

chenshuo@eswincomputing.com

* Re: PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
From: Christian Loehle @ 2024-09-10  9:13 UTC
  To: chenshuo@eswincomputing.com, Rafael J. Wysocki; +Cc: linux-pm, Lukasz Luba

On 9/10/24 03:46, chenshuo@eswincomputing.com wrote:
> Hi Rafael,

(+CC Lukasz)

> 
> I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:
> 
> 1. Problem Description:
>    When using cpufreq with the ondemand governor and enabling the energy model (EM), the CPU OPP table is configured with frequencies and voltages for each frequency point. Additionally, the `dynamic-power-coefficient` is configured in the DTS under the CPU node. However, I observe abnormal dynamic frequency scaling, where the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
> ```
> cpu0: cpu@0 
> 	{ 
> 		...
> 		operating-points-v2 = <&d0_cpu_opp_table>; 

Do you mind sharing <&d0_cpu_opp_table>?

> 		#cooling-cells = <2>; dynamic-power-coefficient = <2000>; };
> 		...
> ```
> 2. Root Cause Analysis:
> When using the OPP table and configuring the "dynamic-power-coefficient," the `em_dev_register_perf_domain()` function in `kernel/power/energy_model.c` sets the flags to `EM_PERF_DOMAIN_MICROWATTS`. In the `em_create_perf_table()` function, `em_compute_costs()` includes the following code:
> ```
> if (table[i].cost >= prev_cost) {
>     table[i].flags = EM_PERF_STATE_INEFFICIENT;
>     dev_dbg(dev, "EM: OPP:%lu is inefficient\n", table[i].frequency);
> }
> ```
> Since the cost is calculated as power * max_frequency / frequency, the cost for each frequency point becomes a constant value. Consequently, except for nr_states - 1 (where prev_state is initialized as ULONG_MAX), all other frequency points' cost is equal to prev_cost. As a result, only the highest frequency point (table[nr_states - 1]) is not flagged as EM_PERF_STATE_INEFFICIENT in the EM performance table.
> 
> In the em_cpufreq_update_efficiencies() function, the following code is executed:
> ```
> for (i = 0; i < pd->nr_perf_states; i++) {
>     if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
>         continue;
> 
>     if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
>         found++;
> }
> ```
> As a result, all frequency points marked as EM_PERF_STATE_INEFFICIENT are flagged as CPUFREQ_INEFFICIENT_FREQ in the cpufreq_table_set_inefficient() function, causing these frequencies to be skipped during frequency scaling.
> 
> 3. Proposed Change and Testing: 
> On Linux 6.6, this behavior affects the normal operation of the cpufreq ondemand governor, which in turn causes passive cooling devices to malfunction when using the power allocator strategy in the thermal framework. I made a temporary fix by changing the condition from:
> 	if (table[i].cost >= prev_cost)
> to:
> 	if (table[i].cost > prev_cost)
> After this change, the issue seems resolved for now. However, I am concerned about potential side effects of this modification.

But this doesn't solve the actual issue: if cost == prev_cost for all
OPPs, then all of them but one are indeed inefficient.
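
To spell out why equal costs mean the lower OPPs bring no energy benefit, here is a short derivation. It assumes the usual dynamic-power estimate P = C * V^2 * f and a voltage V_0 that is the same for every OPP. The symbols W (a fixed amount of work, in cycles) and E (the energy needed to complete it) are introduced here only for illustration, and static power and idle behaviour are ignored.

```latex
\begin{align*}
P_i &= C \, V_0^2 \, f_i \\
\mathrm{cost}_i &= \frac{P_i \, f_{\max}}{f_i} = C \, V_0^2 \, f_{\max} \\
E_i &= P_i \cdot \frac{W}{f_i} = C \, V_0^2 \, W
\end{align*}
```

Both the cost and the energy per unit of work are independent of f_i, so under this model a lower frequency only stretches the runtime without saving energy.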


* Re: Re: PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
From: chenshuo @ 2024-09-10 10:31 UTC
  To: Christian Loehle, Rafael J. Wysocki; +Cc: linux-pm, Lukasz Luba

>On 9/10/24 03:46, chenshuo@eswincomputing.com wrote:
>> Hi Rafael,
> 
>(+CC Lukasz)
> 
>>
>> I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:
>>
>> 1. Problem Description:
>>    When using cpufreq with the ondemand governor and enabling the energy model (EM), the CPU OPP table is configured with frequencies and voltages for each frequency point. Additionally, the `dynamic-power-coefficient` is configured in the DTS under the CPU node. However, I observe abnormal dynamic frequency scaling, where the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
>> ```
>> cpu0: cpu@0 
>> { 
>> ...
>> operating-points-v2 = <&d0_cpu_opp_table>; 
> 
>Do you mind sharing <&d0_cpu_opp_table>?
> 
Of course. The entire DTS file is inconvenient to copy, so here are the most relevant segments:
```
	d0_cpu_opp_table: opp-table0 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-24000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_24M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_100M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-200000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_200M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-400000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_400M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-500000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_500M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-600000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_600M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-700000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_700M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-800000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_800M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-900000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_900M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-1000000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_1000M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-1200000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_1200M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-1300000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_1300M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
		opp-1400000000 {
			opp-hz = /bits/ 64 <CLK_FREQ_1400M>;
			opp-microvolt = <800000>;
			clock-latency-ns = <70000>;
		};
	};
...	
	C64: cpus {
		#address-cells = <1>;
		#size-cells = <0>;
		timebase-frequency = <RTCCLK_FREQ>;
		cpu0: cpu@0 {
			...
			operating-points-v2 = <&d0_cpu_opp_table>;
			#cooling-cells = <2>;
			dynamic-power-coefficient = <2000>; 
			C1: interrupt-controller {
				#interrupt-cells = <1>;
				compatible = "riscv,cpu-intc";
				interrupt-controller;
			};
		};
		cpu1: cpu@1 {
			...
			operating-points-v2 = <&d0_cpu_opp_table>;
			#cooling-cells = <2>;
			dynamic-power-coefficient = <2000>;
			C2: interrupt-controller {
				#interrupt-cells = <1>;
				compatible = "riscv,cpu-intc";
				interrupt-controller;
			};
		};	
		cpu2: cpu@2 {
			...
			operating-points-v2 = <&d0_cpu_opp_table>;
			#cooling-cells = <2>;
			dynamic-power-coefficient = <2000>;
			C3: interrupt-controller {
				#interrupt-cells = <1>;
				compatible = "riscv,cpu-intc";
				interrupt-controller;
			};
		};	
		cpu3: cpu@3 {
			...
			operating-points-v2 = <&d0_cpu_opp_table>;
			#cooling-cells = <2>;
			dynamic-power-coefficient = <2000>;
			C4: interrupt-controller {
				#interrupt-cells = <1>;
				compatible = "riscv,cpu-intc";
				interrupt-controller;
			};
		};		
	};		
```
>> #cooling-cells = <2>; dynamic-power-coefficient = <2000>; };
>> ...
>> ```
>> 2. Root Cause Analysis:
>> When using the OPP table and configuring the "dynamic-power-coefficient," the `em_dev_register_perf_domain()` function in `kernel/power/energy_model.c` sets the flags to `EM_PERF_DOMAIN_MICROWATTS`. In the `em_create_perf_table()` function, `em_compute_costs()` includes the following code:
>> ```
>> if (table[i].cost >= prev_cost) {
>>     table[i].flags = EM_PERF_STATE_INEFFICIENT;
>>     dev_dbg(dev, "EM: OPP:%lu is inefficient\n", table[i].frequency);
>> }
>> ```
>> Since the cost is calculated as power * max_frequency / frequency, the cost for each frequency point becomes a constant value. Consequently, except for nr_states - 1 (where prev_state is initialized as ULONG_MAX), all other frequency points' cost is equal to prev_cost. As a result, only the highest frequency point (table[nr_states - 1]) is not flagged as EM_PERF_STATE_INEFFICIENT in the EM performance table.
>>
>> In the em_cpufreq_update_efficiencies() function, the following code is executed:
>> ```
>> for (i = 0; i < pd->nr_perf_states; i++) {
>>     if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
>>         continue;
>>
>>     if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
>>         found++;
>> }
>> ```
>> As a result, all frequency points marked as EM_PERF_STATE_INEFFICIENT are flagged as CPUFREQ_INEFFICIENT_FREQ in the cpufreq_table_set_inefficient() function, causing these frequencies to be skipped during frequency scaling.
>>
>> 3. Proposed Change and Testing: 
>> On Linux 6.6, this behavior affects the normal operation of the cpufreq ondemand governor, which in turn causes passive cooling devices to malfunction when using the power allocator strategy in the thermal framework. I made a temporary fix by changing the condition from:
>> if (table[i].cost >= prev_cost)
>> to:
>> if (table[i].cost > prev_cost)
>> After this change, the issue seems resolved for now. However, I am concerned about potential side effects of this modification.
> 
>But this doesn't solve the actual issue, if cost == prev_cost for all
>OPPs then all of them but one are indeed inefficient.
Even so, under a DVFS-based ondemand policy the software may not know the real power consumption and can only estimate it with the formula P = C * V^2 * f * usage_rate.
Additionally, the change at least ensures that the thermal framework can cool the system down properly when using the IPA strategy.

* Re: Re: PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
From: chenshuo @ 2024-09-18  6:41 UTC
  To: Christian Loehle, Rafael J. Wysocki; +Cc: linux-pm, Lukasz Luba

>>On 9/10/24 03:46, chenshuo@eswincomputing.com wrote:
>>> Hi Rafael,
>> 
>>(+CC Lukasz)
>> 
>>>
>>> I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:
>>>
>>> 1. Problem Description:
>>>    When using cpufreq with the ondemand governor and enabling the energy model (EM), the CPU OPP table is configured with frequencies and voltages for each frequency point. Additionally, the `dynamic-power-coefficient` is configured in the DTS under the CPU node. However, I observe abnormal dynamic frequency scaling, where the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
>>> ```
>>> cpu0: cpu@0 
>>> { 
>>> ...
>>> operating-points-v2 = <&d0_cpu_opp_table>; 
>> 
>>Do you mind sharing <&d0_cpu_opp_table>?
>> 
>Of course, the entire DTS file is inconvenient to copy, the main useful segments I have are:
>```
>d0_cpu_opp_table: opp-table0 {
>compatible = "operating-points-v2";
>opp-shared;
>
>opp-24000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_24M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-100000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_100M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-200000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_200M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-400000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_400M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-500000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_500M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-600000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_600M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-700000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_700M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-800000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_800M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-900000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_900M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-1000000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_1000M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-1200000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_1200M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-1300000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_1300M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>opp-1400000000 {
>opp-hz = /bits/ 64 <CLK_FREQ_1400M>;
>opp-microvolt = <800000>;
>clock-latency-ns = <70000>;
>};
>};
>...
>C64: cpus {
>#address-cells = <1>;
>#size-cells = <0>;
>timebase-frequency = <RTCCLK_FREQ>;
>cpu0: cpu@0 {
>...
>operating-points-v2 = <&d0_cpu_opp_table>;
>#cooling-cells = <2>;
>dynamic-power-coefficient = <2000>; 
>C1: interrupt-controller {
>#interrupt-cells = <1>;
>compatible = "riscv,cpu-intc";
>interrupt-controller;
>};
>};
>cpu1: cpu@1 {
>...
>operating-points-v2 = <&d0_cpu_opp_table>;
>#cooling-cells = <2>;
>dynamic-power-coefficient = <2000>;
>C2: interrupt-controller {
>#interrupt-cells = <1>;
>compatible = "riscv,cpu-intc";
>interrupt-controller;
>};
>};
>cpu2: cpu@2 {
>...
>operating-points-v2 = <&d0_cpu_opp_table>;
>#cooling-cells = <2>;
>dynamic-power-coefficient = <2000>;
>C3: interrupt-controller {
>#interrupt-cells = <1>;
>compatible = "riscv,cpu-intc";
>interrupt-controller;
>};
>};
>cpu3: cpu@3 {
>...
>operating-points-v2 = <&d0_cpu_opp_table>;
>#cooling-cells = <2>;
>dynamic-power-coefficient = <2000>;
>C4: interrupt-controller {
>#interrupt-cells = <1>;
>compatible = "riscv,cpu-intc";
>interrupt-controller;
>};
>};
>};
>```
>>> #cooling-cells = <2>; dynamic-power-coefficient = <2000>; };
>>> ...
>>> ```
>>> 2. Root Cause Analysis:
>>> When using the OPP table and configuring the "dynamic-power-coefficient," the `em_dev_register_perf_domain()` function in `kernel/power/energy_model.c` sets the flags to `EM_PERF_DOMAIN_MICROWATTS`. In the `em_create_perf_table()` function, `em_compute_costs()` includes the following code:
>>> ```
>>> if (table[i].cost >= prev_cost) {
>>>     table[i].flags = EM_PERF_STATE_INEFFICIENT;
>>>     dev_dbg(dev, "EM: OPP:%lu is inefficient\n", table[i].frequency);
>>> }
>>> ```
>>> Since the cost is calculated as power * max_frequency / frequency, the cost for each frequency point becomes a constant value. Consequently, except for nr_states - 1 (where prev_state is initialized as ULONG_MAX), all other frequency points' cost is equal to prev_cost. As a result, only the highest frequency point (table[nr_states - 1]) is not flagged as EM_PERF_STATE_INEFFICIENT in the EM performance table.
>>>
>>> In the em_cpufreq_update_efficiencies() function, the following code is executed:
>>> ```
>>> for (i = 0; i < pd->nr_perf_states; i++) {
>>>     if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
>>>         continue;
>>>
>>>     if (!cpufreq_table_set_inefficient(policy, table[i].frequency))
>>>         found++;
>>> }
>>> ```
>>> As a result, all frequency points marked as EM_PERF_STATE_INEFFICIENT are flagged as CPUFREQ_INEFFICIENT_FREQ in the cpufreq_table_set_inefficient() function, causing these frequencies to be skipped during frequency scaling.
>>>
>>> 3. Proposed Change and Testing: 
>>> On Linux 6.6, this behavior affects the normal operation of the cpufreq ondemand governor, which in turn causes passive cooling devices to malfunction when using the power allocator strategy in the thermal framework. I made a temporary fix by changing the condition from:
>>> if (table[i].cost >= prev_cost)
>>> to:
>>> if (table[i].cost > prev_cost)
>>> After this change, the issue seems resolved for now. However, I am concerned about potential side effects of this modification.
>> 
>>But this doesn't solve the actual issue, if cost == prev_cost for all
>>OPPs then all of them but one are indeed inefficient.
>Despite this, under an ondemand policy based on DVFS, the software might not know the real power consumption, and can only use the formula P=C*V^2*f*usage_rate.
>Additionally, this at least ensures that the thermal framework using the IPA strategy can properly cool down.
Regarding this problem, are there any errors or omissions in my settings? If not, is there a better solution? I look forward to your reply!

* Re: PM: EM: Question Potential Issue with EM and OPP Table in cpufreq ondemand Governor
From: Lukasz Luba @ 2024-09-18  7:48 UTC
  To: chenshuo@eswincomputing.com; +Cc: linux-pm, Christian Loehle, Rafael J. Wysocki

Hello,

On 9/18/24 07:41, chenshuo@eswincomputing.com wrote:
>>> On 9/10/24 03:46, chenshuo@eswincomputing.com wrote:
>>>> Hi Rafael,
>>>   
>>> (+CC Lukasz)
>>>   
>>>>
>>>> I am encountering an issue related to the Energy Model (EM) when using cpufreq with the ondemand governor. Below is a detailed description:
>>>>
>>>> 1. Problem Description:
>>>>     When using cpufreq with the ondemand governor and enabling the energy model (EM), the CPU OPP table is configured with frequencies and voltages for each frequency point. Additionally, the `dynamic-power-coefficient` is configured in the DTS under the CPU node. However, I observe abnormal dynamic frequency scaling, where the CPU frequency always stays at the highest frequency point in the OPP table. Below is an example of the DTS configuration:
>>>> ```
>>>> cpu0: cpu@0
>>>> {
>>>> ...
>>>> operating-points-v2 = <&d0_cpu_opp_table>;
>>>   
>>> Do you mind sharing <&d0_cpu_opp_table>?
>>>   
>> Of course, the entire DTS file is inconvenient to copy, the main useful segments I have are:
>> ```
>> d0_cpu_opp_table: opp-table0 {
>> compatible = "operating-points-v2";
>> opp-shared;
>>
>> opp-24000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_24M>;
>> opp-microvolt = <800000>;

The voltage is constant in your OPP table, for all frequencies.
That's the main problem.

>> clock-latency-ns = <70000>;
>> };
>> opp-100000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_100M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-200000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_200M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-400000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_400M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-500000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_500M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-600000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_600M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-700000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_700M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-800000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_800M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-900000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_900M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-1000000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_1000M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-1200000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_1200M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-1300000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_1300M>;
>> opp-microvolt = <800000>;
>> clock-latency-ns = <70000>;
>> };
>> opp-1400000000 {
>> opp-hz = /bits/ 64 <CLK_FREQ_1400M>;
>> opp-microvolt = <800000>;

Even for the fmax here.

<snip>

>> Despite this, under an ondemand policy based on DVFS, the software might not know the real power consumption, and can only use the formula P=C*V^2*f*usage_rate.
>> Additionally, this at least ensures that the thermal framework using the IPA strategy can properly cool down.
> Regarding this problem, are there any errors or omissions in my settings? If not, do you have any better solutions? Looking forward to your reply!

The EM framework works correctly. It cannot do much when all voltages
are the same for all OPPs.

It's hard to say if this is the real voltage that is needed for
that silicon to operate. You will have to figure this out and maybe
change the values. However, this is a dangerous operation, and if you
don't have documentation for the chip or HW engineer support, I would
recommend that you NOT do this.

If that voltage is really set on your HW, then the real power will be
driven by this voltage, so you do have inefficient OPPs.

If you think that this voltage is not the final value set on the HW,
e.g. due to FW controlling the final values, then you can
try a workaround...

The other option for working around the constant voltage
information problem is to add 'opp-microwatt' values to your
OPP table. This is much safer, and the values will be used by the
EM and other related subsystems.
Please check these doc links:

https://elixir.bootlin.com/linux/v6.11/source/Documentation/power/energy-model.rst#L145

https://elixir.bootlin.com/linux/v6.11/source/Documentation/devicetree/bindings/opp/opp-v2-base.yaml#L104

In this case you can add power values for each OPP, to make sure
they are efficient.
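
Before putting candidate power values into the OPP table, they can be sanity-checked against the cost rule discussed in this thread. The helper below is only a sketch: `first_inefficient_opp()` is a hypothetical name, the units (kHz and uW) are assumptions, and real values should come from measurements or chip documentation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * freq_khz[] must be sorted in ascending order; power_uw[] holds the
 * candidate power value for each OPP.
 * Returns the index of the highest-frequency OPP that the rule
 * "cost >= lowest cost seen so far" would flag as inefficient,
 * or -1 if every OPP stays efficient.
 */
static int first_inefficient_opp(const unsigned long *freq_khz,
				 const unsigned long *power_uw, int n)
{
	uint64_t fmax, cost, prev_cost = UINT64_MAX;
	int i;

	assert(n > 0);
	fmax = freq_khz[n - 1];
	for (i = n - 1; i >= 0; i--) {
		/* cost = power * max_frequency / frequency */
		cost = fmax * power_uw[i] / freq_khz[i];
		if (cost >= prev_cost)
			return i;
		prev_cost = cost;
	}
	return -1;
}
```

Power values that grow faster than linearly with frequency pass the check, while values that are exactly proportional to frequency fail at the second-highest OPP.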

Regards,
Lukasz
