public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
@ 2024-05-23  6:16 Xiaojian Du
  2024-05-23  6:16 ` [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Xiaojian Du @ 2024-05-23  6:16 UTC (permalink / raw)
  To: linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello,
	Perry Yuan, Xiaojian Du

From: Perry Yuan <perry.yuan@amd.com>

Some AMD Zen 4 processors support a new feature FAST CPPC which
allows for a faster CPPC loop due to internal architectual
enhancements. The goal of this faster loop is higher performance
at the same power consumption.

Reference:
See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..6c128d463a14 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
 #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..9c273c231f56 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
+	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
@ 2024-05-23  6:16 ` Xiaojian Du
  2024-05-23  6:17   ` Xiaojian Du
  2024-05-23  6:16 ` [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Xiaojian Du @ 2024-05-23  6:16 UTC (permalink / raw)
  To: linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello,
	Xiaojian Du, Perry Yuan

Some of AMD ZEN4 APU/CPU have support for adjusting the CPU core
clock more quickly and presicely according to CPU work loading.
This is advertised by the Fast CPPC x86 feature.
This change will only be effective in the *passive mode* of
AMD pstate driver. From the test results of different
transition delay values, 600us is chosen to make a balance
between performance and power consumption.

Some test results on AMD Ryzen 7840HS(Phoenix) APU:

1. Tbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Tbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1              2               4              8              12             16              32
               Energy/Joules       2010           2804            8768           17171          16170          15132           15027
               Throughput/(MB/s)   114            259             1041           3010           3135           4851            4605
               PPW                 0.0567         0.0923          0.1187         0.1752         0.1938         0.3205          0.3064
 600us         Clients             1              2               4              8              12             16              32
               Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716 (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
               Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003  (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
               PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)  0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
============= =================== ============== ================ ============= =============== ============== =============== ===============

2.Dbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Dbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1             2               4              8               12             16              32
               Energy/Joules       4890          3779            3567           5157            5611           6500            8163
               Throughput/(MB/s)   327           167             220            577             775            938             1397
               PPW                 0.0668        0.0441          0.0616         0.1118          0.1381         0.1443          0.1711
 600us         Clients             1             2               4              8               12             16              32
               Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
               Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
               PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
============= =================== ============== =============== ============== =============== ============== =============== ===============

3.Hackbench(less time is better)
============= =========================== ==========================
  hackbench     governor:schedutil
============= =========================== ==========================
  Trans Delay   Process Mode Ave time(s)  Thread Mode Ave time(s)
  1000us        14.484                      14.484
  600us         14.418(-0.46%)              15.41(+6.39%)
============= =========================== ==========================

4.Perf_sched_bench(less time is better)
============= =================== ============== ============== ============== =============== =============== =============
 Trans Delay  perf_sched_bench    governor:schedutil
============= =================== ============== ============== ============== =============== =============== =============
  1000us        Groups             1             2              4              8               12              24
                AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
  600us         Groups             1             2              4              8               12              24
                AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
============= ================== ============== ============== ============== =============== =============== ==============

5.Sysbench(higher is better)
============= ================== ============== ================= ============== ================ =============== =================
  Sysbench    governor:schedutil
============= ================== ============== ================= ============== ================ =============== =================
  1000us      Thread             1               2                4              8                12               24
              Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
  600us       Thread             1               2                4              8                12               24
              Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
============= ================== ============== ================= ============== ================ =============== =================

In conclusion, a shorter transition delay
of cpu clock will make a quite positive effect to improve PPW
on Dbench test, in the meanwhile, keep stable performance
on Tbench, Hackbench, Perf_sched_bench and Sysbench.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 6a342b0c0140..aa157c2b8ba2 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,7 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
 #define CPPC_HIGHEST_PERF_PERFORMANCE	196
 #define CPPC_HIGHEST_PERF_DEFAULT	166
 
@@ -817,8 +818,12 @@ static u32 amd_pstate_get_transition_delay_us(unsigned int cpu)
 	u32 transition_delay_ns;
 
 	transition_delay_ns = cppc_get_transition_latency(cpu);
-	if (transition_delay_ns == CPUFREQ_ETERNAL)
-		return AMD_PSTATE_TRANSITION_DELAY;
+	if (transition_delay_ns == CPUFREQ_ETERNAL) {
+		if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
+			return AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
+		else
+			return AMD_PSTATE_TRANSITION_DELAY;
+	}
 
 	return transition_delay_ns / NSEC_PER_USEC;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
  2024-05-23  6:16 ` [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
@ 2024-05-23  6:16 ` Xiaojian Du
  2024-05-23 16:31 ` Dave Hansen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Xiaojian Du @ 2024-05-23  6:16 UTC (permalink / raw)
  To: linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello,
	Perry Yuan, Xiaojian Du

From: Perry Yuan <perry.yuan@amd.com>

Some AMD Zen 4 processors support a new feature FAST CPPC which
allows for a faster CPPC loop due to internal architectual
enhancements. The goal of this faster loop is higher performance
at the same power consumption.

Reference:
See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..6c128d463a14 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
 #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..9c273c231f56 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
+	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-05-23  6:16 ` [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
@ 2024-05-23  6:17   ` Xiaojian Du
  0 siblings, 0 replies; 9+ messages in thread
From: Xiaojian Du @ 2024-05-23  6:17 UTC (permalink / raw)
  To: linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello,
	Xiaojian Du, Perry Yuan

Some of AMD ZEN4 APU/CPU have support for adjusting the CPU core
clock more quickly and presicely according to CPU work loading.
This is advertised by the Fast CPPC x86 feature.
This change will only be effective in the *passive mode* of
AMD pstate driver. From the test results of different
transition delay values, 600us is chosen to make a balance
between performance and power consumption.

Some test results on AMD Ryzen 7840HS(Phoenix) APU:

1. Tbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Tbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1              2               4              8              12             16              32
               Energy/Joules       2010           2804            8768           17171          16170          15132           15027
               Throughput/(MB/s)   114            259             1041           3010           3135           4851            4605
               PPW                 0.0567         0.0923          0.1187         0.1752         0.1938         0.3205          0.3064
 600us         Clients             1              2               4              8              12             16              32
               Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716 (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
               Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003  (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
               PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)  0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
============= =================== ============== ================ ============= =============== ============== =============== ===============

2.Dbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Dbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1             2               4              8               12             16              32
               Energy/Joules       4890          3779            3567           5157            5611           6500            8163
               Throughput/(MB/s)   327           167             220            577             775            938             1397
               PPW                 0.0668        0.0441          0.0616         0.1118          0.1381         0.1443          0.1711
 600us         Clients             1             2               4              8               12             16              32
               Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
               Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
               PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
============= =================== ============== =============== ============== =============== ============== =============== ===============

3.Hackbench(less time is better)
============= =========================== ==========================
  hackbench     governor:schedutil
============= =========================== ==========================
  Trans Delay   Process Mode Ave time(s)  Thread Mode Ave time(s)
  1000us        14.484                      14.484
  600us         14.418(-0.46%)              15.41(+6.39%)
============= =========================== ==========================

4.Perf_sched_bench(less time is better)
============= =================== ============== ============== ============== =============== =============== =============
 Trans Delay  perf_sched_bench    governor:schedutil
============= =================== ============== ============== ============== =============== =============== =============
  1000us        Groups             1             2              4              8               12              24
                AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
  600us         Groups             1             2              4              8               12              24
                AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
============= ================== ============== ============== ============== =============== =============== ==============

5.Sysbench(higher is better)
============= ================== ============== ================= ============== ================ =============== =================
  Sysbench    governor:schedutil
============= ================== ============== ================= ============== ================ =============== =================
  1000us      Thread             1               2                4              8                12               24
              Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
  600us       Thread             1               2                4              8                12               24
              Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
============= ================== ============== ================= ============== ================ =============== =================

In conclusion, a shorter transition delay
of cpu clock will make a quite positive effect to improve PPW
on Dbench test, in the meanwhile, keep stable performance
on Tbench, Hackbench, Perf_sched_bench and Sysbench.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 6a342b0c0140..aa157c2b8ba2 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,7 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
 #define CPPC_HIGHEST_PERF_PERFORMANCE	196
 #define CPPC_HIGHEST_PERF_DEFAULT	166
 
@@ -817,8 +818,12 @@ static u32 amd_pstate_get_transition_delay_us(unsigned int cpu)
 	u32 transition_delay_ns;
 
 	transition_delay_ns = cppc_get_transition_latency(cpu);
-	if (transition_delay_ns == CPUFREQ_ETERNAL)
-		return AMD_PSTATE_TRANSITION_DELAY;
+	if (transition_delay_ns == CPUFREQ_ETERNAL) {
+		if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
+			return AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
+		else
+			return AMD_PSTATE_TRANSITION_DELAY;
+	}
 
 	return transition_delay_ns / NSEC_PER_USEC;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
  2024-05-23  6:16 ` [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
  2024-05-23  6:16 ` [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
@ 2024-05-23 16:31 ` Dave Hansen
  2024-05-27 14:36   ` Xiaojian Du
  2024-05-23 22:24 ` Pawan Gupta
  2024-05-23 22:35 ` Pawan Gupta
  4 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2024-05-23 16:31 UTC (permalink / raw)
  To: Xiaojian Du, linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello

On 5/22/24 23:16, Xiaojian Du wrote:
>  #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>  #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>  #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */

It'd be nice to expand the CPPC acronym at least _once_.

Also, this is used _once_ at boot and not exposed in /proc/cpuinfo.  Is
it even worth defining an X86_FEATURE_ for it?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
                   ` (2 preceding siblings ...)
  2024-05-23 16:31 ` Dave Hansen
@ 2024-05-23 22:24 ` Pawan Gupta
  2024-05-23 22:35 ` Pawan Gupta
  4 siblings, 0 replies; 9+ messages in thread
From: Pawan Gupta @ 2024-05-23 22:24 UTC (permalink / raw)
  To: Xiaojian Du
  Cc: linux-kernel, x86, linux-pm, tglx, mingo, bp, dave.hansen, hpa,
	daniel.sneddon, jpoimboe, sandipan.das, kai.huang, ray.huang,
	rafael, Perry.Yuan, gautham.shenoy, Borislav.Petkov,
	mario.limonciello

On Thu, May 23, 2024 at 02:16:59PM +0800, Xiaojian Du wrote:
> From: Perry Yuan <perry.yuan@amd.com>
> 
> Some AMD Zen 4 processors support a new feature FAST CPPC which
> allows for a faster CPPC loop due to internal architectual

s/architectual/architectural/

> enhancements. The goal of this faster loop is higher performance
> at the same power consumption.
> 
> Reference:
> See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713
> 
> Signed-off-by: Perry Yuan <perry.yuan@amd.com>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661..6c128d463a14 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -470,6 +470,7 @@
>  #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>  #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>  #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
>  
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index af5aa2c754c2..9c273c231f56 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
> +	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
                                                  ^
						  Extra space here.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
                   ` (3 preceding siblings ...)
  2024-05-23 22:24 ` Pawan Gupta
@ 2024-05-23 22:35 ` Pawan Gupta
  2024-05-27 14:40   ` Xiaojian Du
  4 siblings, 1 reply; 9+ messages in thread
From: Pawan Gupta @ 2024-05-23 22:35 UTC (permalink / raw)
  To: Xiaojian Du
  Cc: linux-kernel, x86, linux-pm, tglx, mingo, bp, dave.hansen, hpa,
	daniel.sneddon, jpoimboe, sandipan.das, kai.huang, ray.huang,
	rafael, Perry.Yuan, gautham.shenoy, Borislav.Petkov,
	mario.limonciello

On Thu, May 23, 2024 at 02:16:59PM +0800, Xiaojian Du wrote:
> From: Perry Yuan <perry.yuan@amd.com>
> 
> Some AMD Zen 4 processors support a new feature FAST CPPC which
> allows for a faster CPPC loop due to internal architectual
> enhancements. The goal of this faster loop is higher performance
> at the same power consumption.
> 
> Reference:
> See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713
> 
> Signed-off-by: Perry Yuan <perry.yuan@amd.com>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661..6c128d463a14 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -470,6 +470,7 @@
>  #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>  #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>  #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
>  
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index af5aa2c754c2..9c273c231f56 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
> +	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },

This list is sorted by the leaf level, so position of this entry should be
higher:

diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..09e0e40dce6c 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
+	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX, 15, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },

>  	{ 0, 0, 0, 0, 0 }
>  };
>  
> -- 
> 2.34.1
> 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23 16:31 ` Dave Hansen
@ 2024-05-27 14:36   ` Xiaojian Du
  0 siblings, 0 replies; 9+ messages in thread
From: Xiaojian Du @ 2024-05-27 14:36 UTC (permalink / raw)
  To: Dave Hansen, Xiaojian Du, linux-kernel, x86, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, ray.huang, rafael,
	Perry.Yuan, gautham.shenoy, Borislav.Petkov, mario.limonciello

Thanks Dave.

Because "CPPC" has existed in the "cpufeatures.h", thinking it will add 
duplicated line so I don't expand it.
Making this new flag hidden is to avoid causing user confusion about two 
"CPPC" flags.
This new feature flag is added to choose code branch, if not, it has to 
choose a ugly way and use CPU model ID.

Thanks,
Xiaojian


On 2024/5/24 0:31, Dave Hansen wrote:
> On 5/22/24 23:16, Xiaojian Du wrote:
>>   #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>>   #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>>   #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
>> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
> It'd be nice to expand the CPPC acronym at least _once_.
>
> Also, this is used _once_ at boot and not exposed in /proc/cpuinfo.  Is
> it even worth defining an X86_FEATURE_ for it?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-05-23 22:35 ` Pawan Gupta
@ 2024-05-27 14:40   ` Xiaojian Du
  0 siblings, 0 replies; 9+ messages in thread
From: Xiaojian Du @ 2024-05-27 14:40 UTC (permalink / raw)
  To: Pawan Gupta, Xiaojian Du
  Cc: linux-kernel, x86, linux-pm, tglx, mingo, bp, dave.hansen, hpa,
	daniel.sneddon, jpoimboe, sandipan.das, kai.huang, ray.huang,
	rafael, Perry.Yuan, gautham.shenoy, Borislav.Petkov,
	mario.limonciello

Thanks Pawan.

On 2024/5/24 6:35, Pawan Gupta wrote:
> On Thu, May 23, 2024 at 02:16:59PM +0800, Xiaojian Du wrote:
>> From: Perry Yuan <perry.yuan@amd.com>
>>
>> Some AMD Zen 4 processors support a new feature FAST CPPC which
>> allows for a faster CPPC loop due to internal architectual
>> enhancements. The goal of this faster loop is higher performance
>> at the same power consumption.
>>
>> Reference:
>> See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713
>>
>> Signed-off-by: Perry Yuan <perry.yuan@amd.com>
>> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
>> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
>> ---
>>   arch/x86/include/asm/cpufeatures.h | 1 +
>>   arch/x86/kernel/cpu/scattered.c    | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 3c7434329661..6c128d463a14 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -470,6 +470,7 @@
>>   #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>>   #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>>   #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
>> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
>>   
>>   /*
>>    * BUG word(s)
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index af5aa2c754c2..9c273c231f56 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>   	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>>   	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>>   	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
>> +	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
> This list is sorted by the leaf level, so position of this entry should be
> higher:
>
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index af5aa2c754c2..09e0e40dce6c 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>   	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
>   	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
>   	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> +	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX, 15, 0x80000007, 0 },
>   	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>   	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
>   	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
>
>>   	{ 0, 0, 0, 0, 0 }
>>   };
>>   
>> -- 
>> 2.34.1

Changed this position and sent V4 patch for review.

Thanks,

Xiaojian





^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-05-27 14:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23  6:16 [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
2024-05-23  6:16 ` [PATCH v3 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
2024-05-23  6:17   ` Xiaojian Du
2024-05-23  6:16 ` [PATCH v3 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
2024-05-23 16:31 ` Dave Hansen
2024-05-27 14:36   ` Xiaojian Du
2024-05-23 22:24 ` Pawan Gupta
2024-05-23 22:35 ` Pawan Gupta
2024-05-27 14:40   ` Xiaojian Du

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox