* [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock
@ 2024-10-25 3:12 Jiajie Chen
2024-10-25 7:58 ` Marc Zyngier
2024-10-25 11:04 ` Dmitry Baryshkov
0 siblings, 2 replies; 4+ messages in thread
From: Jiajie Chen @ 2024-10-25 3:12 UTC (permalink / raw)
To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
Cc: c
The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
core in the second cluster (cores 4-7) and the other in the third
cluster (cores 8-11). However, the scheduler is currently unaware of
this, leading to scenarios where a single core benchmark might run at
3.4 GHz when scheduled to the first cluster.
This patch introduces capacity-dmips-mhz nodes to each CPU node in the
DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
others are set to 1024. This ensures that the two cores can be
prioritized for scheduling. The value 1200 is derived from approximately
`1024/3.4*4.0`.
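For reference, the arithmetic behind the 1200 figure can be checked with a short Python sketch. The constant and function names are illustrative only; the frequencies are the ones quoted above, and linear scaling with clock frequency is the assumption the commit message itself makes.

```python
# Scale the baseline capacity by the ratio of boost to non-boost frequency.
BASE_CAPACITY = 1024    # value given to the non-boosting cores
BASE_FREQ_GHZ = 3.4     # frequency quoted above for the first cluster
BOOST_FREQ_GHZ = 4.0    # boost frequency reachable by cores 4 and 8


def boosted_capacity(base=BASE_CAPACITY, base_freq=BASE_FREQ_GHZ,
                     boost_freq=BOOST_FREQ_GHZ):
    """Return the capacity scaled linearly with clock frequency."""
    return base / base_freq * boost_freq


exact = boosted_capacity()
print(f"{exact:.1f}")  # about 1204.7, which the patch rounds to 1200
```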
Note that capacity-dmips-mhz is not ideally suited for this purpose, as
it was designed to differentiate between performance and efficient
cores, not for core boosting. According to its definition, DMIPS/MHz
actually decreases with higher frequencies. However, since the CPU does
not support AMU, and no elegant solution was found, this approach is
used as a workaround.
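As a rough illustration of what the scheduler ends up seeing, the sketch below rescales the raw values so that the largest one maps to 1024. This is a simplification under stated assumptions: the helper name `normalize` is hypothetical, and every core is assumed to report the same reference frequency, whereas the kernel's topology code may also weight each raw value by a per-CPU frequency.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity assigned to the most capable CPU


def normalize(raw_values):
    """Rescale raw capacity-dmips-mhz values so the maximum becomes 1024."""
    top = max(raw_values)
    return [raw * SCHED_CAPACITY_SCALE // top for raw in raw_values]


# Raw values from the patch: cores 4 and 8 get 1200, all others 1024.
raw = [1200 if cpu in (4, 8) else 1024 for cpu in range(12)]
caps = normalize(raw)
print(caps[4], caps[0])  # boosting core vs. ordinary core
```

Under these assumptions only the ratio between the two values matters, so 1200 versus 1024 simply ranks cores 4 and 8 above the rest.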
With this patch, we observe two cores running at full 4.0 GHz without
core binding. The single core score of Geekbench 6 increases from 2452
to 2892, both without core binding. Tested on Surface Laptop 7.
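As a quick sanity check, the reported score gain can be compared with the frequency ratio. The sketch below only uses the numbers quoted above; the variable names are illustrative.

```python
score_before, score_after = 2452, 2892   # Geekbench 6 single-core scores
freq_base, freq_boost = 3.4, 4.0         # GHz, as quoted above

score_gain = score_after / score_before  # relative benchmark improvement
freq_gain = freq_boost / freq_base       # relative clock increase

print(f"score gain: {score_gain:.3f}, frequency gain: {freq_gain:.3f}")
```

The two ratios are close to each other, which is consistent with the benchmark now landing on a core that reaches 4.0 GHz.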
Signed-off-by: Jiajie Chen <c@jia.je>
---
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index cd732ef88cd8..c9c559d956c2 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -69,6 +69,7 @@ CPU0: cpu@0 {
compatible = "qcom,oryon";
reg = <0x0 0x0>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_0>;
power-domains = <&CPU_PD0>;
power-domain-names = "psci";
@@ -86,6 +87,7 @@ CPU1: cpu@100 {
compatible = "qcom,oryon";
reg = <0x0 0x100>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_0>;
power-domains = <&CPU_PD1>;
power-domain-names = "psci";
@@ -97,6 +99,7 @@ CPU2: cpu@200 {
compatible = "qcom,oryon";
reg = <0x0 0x200>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_0>;
power-domains = <&CPU_PD2>;
power-domain-names = "psci";
@@ -108,6 +111,7 @@ CPU3: cpu@300 {
compatible = "qcom,oryon";
reg = <0x0 0x300>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_0>;
power-domains = <&CPU_PD3>;
power-domain-names = "psci";
@@ -119,6 +123,7 @@ CPU4: cpu@10000 {
compatible = "qcom,oryon";
reg = <0x0 0x10000>;
enable-method = "psci";
+ capacity-dmips-mhz = <1200>;
next-level-cache = <&L2_1>;
power-domains = <&CPU_PD4>;
power-domain-names = "psci";
@@ -136,6 +141,7 @@ CPU5: cpu@10100 {
compatible = "qcom,oryon";
reg = <0x0 0x10100>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_1>;
power-domains = <&CPU_PD5>;
power-domain-names = "psci";
@@ -147,6 +153,7 @@ CPU6: cpu@10200 {
compatible = "qcom,oryon";
reg = <0x0 0x10200>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_1>;
power-domains = <&CPU_PD6>;
power-domain-names = "psci";
@@ -158,6 +165,7 @@ CPU7: cpu@10300 {
compatible = "qcom,oryon";
reg = <0x0 0x10300>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_1>;
power-domains = <&CPU_PD7>;
power-domain-names = "psci";
@@ -169,6 +177,7 @@ CPU8: cpu@20000 {
compatible = "qcom,oryon";
reg = <0x0 0x20000>;
enable-method = "psci";
+ capacity-dmips-mhz = <1200>;
next-level-cache = <&L2_2>;
power-domains = <&CPU_PD8>;
power-domain-names = "psci";
@@ -186,6 +195,7 @@ CPU9: cpu@20100 {
compatible = "qcom,oryon";
reg = <0x0 0x20100>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_2>;
power-domains = <&CPU_PD9>;
power-domain-names = "psci";
@@ -197,6 +207,7 @@ CPU10: cpu@20200 {
compatible = "qcom,oryon";
reg = <0x0 0x20200>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_2>;
power-domains = <&CPU_PD10>;
power-domain-names = "psci";
@@ -208,6 +219,7 @@ CPU11: cpu@20300 {
compatible = "qcom,oryon";
reg = <0x0 0x20300>;
enable-method = "psci";
+ capacity-dmips-mhz = <1024>;
next-level-cache = <&L2_2>;
power-domains = <&CPU_PD11>;
power-domain-names = "psci";
--
2.45.2
* Re: [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock
2024-10-25 3:12 [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock Jiajie Chen
@ 2024-10-25 7:58 ` Marc Zyngier
2024-10-25 8:08 ` Jiajie Chen
2024-10-25 11:04 ` Dmitry Baryshkov
1 sibling, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2024-10-25 7:58 UTC (permalink / raw)
To: Jiajie Chen
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
On Fri, 25 Oct 2024 04:12:58 +0100,
Jiajie Chen <c@jia.je> wrote:
>
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
>
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
>
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.
Are you sure?
[ 0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11
So activity monitors are available. Not that what you have here is not
useful, but this comment seems a bit... surprising.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock
2024-10-25 7:58 ` Marc Zyngier
@ 2024-10-25 8:08 ` Jiajie Chen
0 siblings, 0 replies; 4+ messages in thread
From: Jiajie Chen @ 2024-10-25 8:08 UTC (permalink / raw)
To: Marc Zyngier
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
On 2024/10/25 15:58, Marc Zyngier wrote:
> On Fri, 25 Oct 2024 04:12:58 +0100,
> Jiajie Chen <c@jia.je> wrote:
>> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
>> core in the second cluster (cores 4-7) and the other in the third
>> cluster (cores 8-11). However, the scheduler is currently unaware of
>> this, leading to scenarios where a single core benchmark might run at
>> 3.4 GHz when scheduled to the first cluster.
>>
>> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
>> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
>> others are set to 1024. This ensures that the two cores can be
>> prioritized for scheduling. The value 1200 is derived from approximately
>> `1024/3.4*4.0`.
>>
>> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
>> it was designed to differentiate between performance and efficient
>> cores, not for core boosting. According to its definition, DMIPS/MHz
>> actually decreases with higher frequencies. However, since the CPU does
>> not support AMU, and no elegant solution was found, this approach is
>> used as a workaround.
> Are you sure?
>
> [ 0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11
>
> So activity monitors are available. Not that what you have here is not
> useful, but this comment seems a bit... surprising.
Sorry for the false claim, I was looking for AMU at /proc/cpuinfo, which
is not there. But it did not help the scheduling somehow. Let me have a
look at it.
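Since /proc/cpuinfo does not list AMU, the kernel log is the place to check. A minimal Python sketch that extracts the CPU list from such a line follows; the helper name `amu_cpus` is hypothetical, and the message format is assumed to match the line quoted above.

```python
import re

# Matches the feature-detection line quoted above; the CPU list may be a
# single number, a range such as "0-11", or a comma-separated mix of both.
AMU_LINE = re.compile(r"Activity Monitors Unit \(AMU\) on CPU([\d,\-]+)")


def amu_cpus(log_text):
    """Return the sorted CPU numbers on which the log reports AMU."""
    cpus = set()
    for match in AMU_LINE.finditer(log_text):
        for part in match.group(1).split(","):
            if not part:
                continue
            first, _, last = part.partition("-")
            cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


sample = ("[ 0.570323] CPU features: detected: "
          "Activity Monitors Unit (AMU) on CPU0-11")
print(amu_cpus(sample))
```

In practice the text would come from the kernel log (for example the output of `dmesg`) rather than a hard-coded string.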
Best regards,
Jiajie Chen
>
> Thanks,
>
> M.
>
* Re: [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock
2024-10-25 3:12 [PATCH] arm64: dts: qcom: x1e80100: Add performance hint for boost clock Jiajie Chen
2024-10-25 7:58 ` Marc Zyngier
@ 2024-10-25 11:04 ` Dmitry Baryshkov
1 sibling, 0 replies; 4+ messages in thread
From: Dmitry Baryshkov @ 2024-10-25 11:04 UTC (permalink / raw)
To: Jiajie Chen
Cc: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-arm-msm, devicetree, linux-kernel
On Fri, Oct 25, 2024 at 11:12:58AM +0800, Jiajie Chen wrote:
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
>
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
>
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.
>
> With this patch, we observe two cores running at full 4.0 GHz without
> core binding. The single core score of Geekbench 6 increases from 2452
> to 2892, both without core binding. Tested on Surface Laptop 7.
I think this is a nice hack, but I'd prefer to see the scheduler being
improved instead. From my (ignorant) point of view this should be close
to SMT-based scheduling. We should split the jobs between the clusters,
if that provides better power utilisation.
>
> Signed-off-by: Jiajie Chen <c@jia.je>
> ---
> arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
--
With best wishes
Dmitry