* [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver
@ 2025-08-22 10:52 Penny Zheng
2025-08-22 10:52 ` [PATCH v7 01/13] tools: drop "has_num" condition check for cppc mode Penny Zheng
` (12 more replies)
0 siblings, 13 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
amd-cppc is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism in Xen for modern AMD APU and CPU
series. The new mechanism is based on Collaborative Processor Performance
Control (CPPC), which provides finer-grained frequency management than
legacy ACPI hardware P-States. Current AMD CPU/APU platforms use the
ACPI P-states driver to manage CPU frequency and clocks, switching
between only 3 P-states. CPPC replaces the ACPI P-states controls and
provides a flexible, low-latency interface for Xen to communicate
performance hints directly to hardware.
The amd-cppc driver has two operating modes: autonomous (active) mode
and non-autonomous (passive) mode. We register a different cpufreq driver
for each mode: "amd-cppc" for passive mode and "amd-cppc-epp" for active
mode.
Passive mode leverages common governors such as *ondemand* and
*performance* to manage performance tuning, while active mode uses EPP
(Energy Performance Preference) to hint to the hardware whether software
wants to bias toward performance (0x0) or energy efficiency (0xff). The
CPPC power algorithm in hardware then automatically calculates the runtime
workload and adjusts the real-time CPU core frequency according to the
power supply, thermal, core voltage and other hardware conditions.
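As a rough illustration of how an EPP hint relates to the presets, here is a minimal sketch. Only the two endpoints (0x00 for performance, 0xff for energy efficiency) come from the description above; the enum names, the helper `epp_from_preset()` and the 0x80 "balanced" midpoint are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical preset names, loosely mirroring the xenpm presets. */
enum epp_preset {
    EPP_PRESET_PERFORMANCE,
    EPP_PRESET_BALANCED,
    EPP_PRESET_POWERSAVE,
};

/* Map a preset to the 8-bit EPP hint handed to hardware (sketch only). */
static uint8_t epp_from_preset(enum epp_preset preset)
{
    switch ( preset )
    {
    case EPP_PRESET_PERFORMANCE:
        return 0x00;            /* bias fully toward performance */

    case EPP_PRESET_POWERSAVE:
        return 0xff;            /* bias fully toward energy efficiency */

    case EPP_PRESET_BALANCED:
    default:
        return 0x80;            /* assumed midpoint, not from the source */
    }
}
```

Lower EPP values express a stronger preference for performance; the hardware remains free to choose the actual frequency.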
amd-cppc is enabled in passive mode with the top-level `cpufreq=amd-cppc`
option; users add the extra `active` flag to select active mode.
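A minimal sketch of how the option value could be interpreted, assuming a simplified grammar of `amd-cppc` optionally followed by `,active`. The function `parse_amd_cppc()` and the enum are hypothetical; the real setup_cpufreq_option() splits its argument with strpbrk(str, ",:;") and handles more cases than shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of parsing the value of "cpufreq=". */
enum cppc_mode { CPPC_MODE_NONE, CPPC_MODE_PASSIVE, CPPC_MODE_ACTIVE };

/*
 * Parse "amd-cppc" (passive) or "amd-cppc,active" (active).
 * Anything else yields CPPC_MODE_NONE. Only ',' is handled as a separator.
 */
static enum cppc_mode parse_amd_cppc(const char *s)
{
    static const char name[] = "amd-cppc";
    size_t len = sizeof(name) - 1;

    if ( strncmp(s, name, len) != 0 )
        return CPPC_MODE_NONE;

    s += len;
    if ( *s == '\0' )
        return CPPC_MODE_PASSIVE;   /* no flag: passive mode by default */
    if ( *s != ',' )
        return CPPC_MODE_NONE;      /* e.g. "amd-cppc-epp" is not an option */

    return strcmp(s + 1, "active") == 0 ? CPPC_MODE_ACTIVE : CPPC_MODE_NONE;
}
```

For example, `parse_amd_cppc("amd-cppc,active")` selects active mode, while a bare `"amd-cppc"` stays in passive mode.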
With `cpufreq=amd-cppc,active`, we ran a 60s sampling test to observe CPU
frequency changes while switching the energy performance preference from
`xenpm set-cpufreq-cppc powersave` to `xenpm set-cpufreq-cppc performance`.
The output is as follows:
```
Setting CPU in powersave mode
Sampling and Outputs:
Avg freq 580000 KHz
Avg freq 580000 KHz
Avg freq 580000 KHz
Setting CPU in performance mode
Sampling and Outputs:
Avg freq 4640000 KHz
Avg freq 4220000 KHz
Avg freq 4640000 KHz
```
Penny Zheng (13):
tools: drop "has_num" condition check for cppc mode
cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para"
tools: fix help info for "xenpm set-cpufreq-cppc"
xen/cpufreq: add missing default: case for x86 vendor
xen/cpufreq: refactor cmdline "cpufreq=xxx"
xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc
driver
xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode
xen/cpufreq: get performance policy from governor set via xenpm
tools/cpufreq: extract CPPC para from cpufreq para
xen/cpufreq: bypass governor-related para for amd-cppc-epp
xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc
driver
docs/misc/xen-command-line.pandoc | 14 +-
tools/include/xenctrl.h | 5 +-
tools/libs/ctrl/xc_pm.c | 69 +-
tools/misc/xenpm.c | 104 ++-
xen/arch/x86/acpi/cpufreq/Makefile | 1 +
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 774 ++++++++++++++++++++++
xen/arch/x86/acpi/cpufreq/cpufreq.c | 76 ++-
xen/arch/x86/acpi/cpufreq/hwp.c | 2 +-
xen/arch/x86/cpu/amd.c | 8 +-
xen/arch/x86/include/asm/amd.h | 2 +
xen/arch/x86/include/asm/msr-index.h | 6 +
xen/arch/x86/platform_hypercall.c | 19 +
xen/arch/x86/x86_64/cpufreq.c | 19 +
xen/arch/x86/x86_64/platform_hypercall.c | 3 +
xen/drivers/acpi/pm-op.c | 48 +-
xen/drivers/acpi/pmstat.c | 4 +
xen/drivers/cpufreq/cpufreq.c | 204 +++++-
xen/drivers/cpufreq/utility.c | 15 +
xen/include/acpi/cpufreq/cpufreq.h | 28 +-
xen/include/acpi/cpufreq/processor_perf.h | 14 +-
xen/include/public/platform.h | 26 +
xen/include/public/sysctl.h | 15 +-
xen/include/xen/pmstat.h | 5 +
xen/include/xlat.lst | 1 +
24 files changed, 1368 insertions(+), 94 deletions(-)
create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
--
2.34.1
* [PATCH v7 01/13] tools: drop "has_num" condition check for cppc mode
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-22 10:52 ` [PATCH v7 02/13] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
` (11 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Jan Beulich
In `xenpm get-cpufreq-para <cpuid>`, the ->freq_num and ->cpu_num checks
are tied together via the variable "has_num", even though ->freq_num only
has a non-zero value when the cpufreq driver is in legacy P-states mode.
So we drop the "has_num" condition check, and mirror the ->gov_num check
for both ->freq_num and ->cpu_num in xc_get_cpufreq_para().
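The idea of validating each count/buffer pair independently can be sketched as below. `struct para` is a hypothetical, trimmed-down stand-in for struct xc_get_cpufreq_para, and `check_para()` only mirrors the shape of the new condition in the diff; it is not the libxc code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct xc_get_cpufreq_para. */
struct para {
    uint32_t cpu_num, freq_num, gov_num;
    uint32_t *affected_cpus;
    uint32_t *scaling_available_frequencies;
    char *scaling_available_governors;
};

/*
 * Each pair is checked on its own: a non-zero count requires a buffer,
 * while a zero count (e.g. freq_num in CPPC mode) needs no buffer at all.
 */
static int check_para(const struct para *p)
{
    if ( (p->cpu_num && !p->affected_cpus) ||
         (p->freq_num && !p->scaling_available_frequencies) ||
         (p->gov_num && !p->scaling_available_governors) )
    {
        errno = EINVAL;
        return -1;
    }

    return 0;
}
```

With this shape, a caller in CPPC mode can pass cpu_num with a buffer and freq_num == 0 without tripping the check, which the old combined "has_num" test did not allow.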
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
---
v3 -> v4:
- drop the "has_num" condition check
---
v4 -> v5:
- refactor title and commit
- make all three pieces (xc_hypercall_bounce_pre()) be as similar as possible
---
v5 -> v6:
- move set_xen_guest_handle() up to the bottom of the identical conditional
---
tools/libs/ctrl/xc_pm.c | 41 +++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 1f2430cac2..6fda973f1f 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -210,33 +210,36 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
DECLARE_NAMED_HYPERCALL_BOUNCE(scaling_available_governors,
user_para->scaling_available_governors,
user_para->gov_num * CPUFREQ_NAME_LEN * sizeof(char), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
- bool has_num = user_para->cpu_num && user_para->freq_num;
- if ( has_num )
+ if ( (user_para->cpu_num && !user_para->affected_cpus) ||
+ (user_para->freq_num && !user_para->scaling_available_frequencies) ||
+ (user_para->gov_num && !user_para->scaling_available_governors) )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+ if ( user_para->cpu_num )
{
- if ( (!user_para->affected_cpus) ||
- (!user_para->scaling_available_frequencies) ||
- (user_para->gov_num && !user_para->scaling_available_governors) )
- {
- errno = EINVAL;
- return -1;
- }
ret = xc_hypercall_bounce_pre(xch, affected_cpus);
if ( ret )
return ret;
+ set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
+ }
+ if ( user_para->freq_num )
+ {
ret = xc_hypercall_bounce_pre(xch, scaling_available_frequencies);
if ( ret )
goto unlock_2;
- if ( user_para->gov_num )
- ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
+ set_xen_guest_handle(sys_para->scaling_available_frequencies,
+ scaling_available_frequencies);
+ }
+ if ( user_para->gov_num )
+ {
+ ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
if ( ret )
goto unlock_3;
-
- set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
- set_xen_guest_handle(sys_para->scaling_available_frequencies, scaling_available_frequencies);
- if ( user_para->gov_num )
- set_xen_guest_handle(sys_para->scaling_available_governors,
- scaling_available_governors);
+ set_xen_guest_handle(sys_para->scaling_available_governors,
+ scaling_available_governors);
}
sysctl.cmd = XEN_SYSCTL_pm_op;
@@ -256,9 +259,7 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
user_para->gov_num = sys_para->gov_num;
}
- if ( has_num )
- goto unlock_4;
- return ret;
+ goto unlock_4;
}
else
{
--
2.34.1
* [PATCH v7 02/13] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para"
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-08-22 10:52 ` [PATCH v7 01/13] tools: drop "has_num" condition check for cppc mode Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-22 10:52 ` [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc" Penny Zheng
` (10 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
As we are going to add "struct xen_cppc_para" to "struct xen_sysctl_pm_op"
later as a new xenpm sub-op dealing specifically with CPPC info, we need to
follow the naming pattern and rename the existing struct to
"xen_get_cppc_para", which is more suitable than "xen_cppc_para".
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
---
v5 -> v6:
- new commit
---
tools/include/xenctrl.h | 2 +-
xen/arch/x86/acpi/cpufreq/hwp.c | 2 +-
xen/include/acpi/cpufreq/cpufreq.h | 2 +-
xen/include/public/sysctl.h | 6 +++---
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4955981231..965d3b585a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1906,7 +1906,7 @@ int xc_smt_disable(xc_interface *xch);
*/
typedef struct xen_userspace xc_userspace_t;
typedef struct xen_ondemand xc_ondemand_t;
-typedef struct xen_cppc_para xc_cppc_para_t;
+typedef struct xen_get_cppc_para xc_cppc_para_t;
struct xc_get_cpufreq_para {
/* IN/OUT variable */
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 36ecb0ed9d..7d14a4abd5 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -529,7 +529,7 @@ hwp_cpufreq_driver = {
#ifdef CONFIG_PM_OP
int get_hwp_para(unsigned int cpu,
- struct xen_cppc_para *cppc_para)
+ struct xen_get_cppc_para *cppc_para)
{
const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 0742aa9f44..fd530632b4 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -266,7 +266,7 @@ static inline bool hwp_active(void) { return false; }
#endif
int get_hwp_para(unsigned int cpu,
- struct xen_cppc_para *cppc_para);
+ struct xen_get_cppc_para *cppc_para);
int set_hwp_para(struct cpufreq_policy *policy,
struct xen_set_cppc_para *set_cppc);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b7325b0f72..aafa7fcf2b 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -336,7 +336,7 @@ struct xen_ondemand {
uint32_t up_threshold;
};
-struct xen_cppc_para {
+struct xen_get_cppc_para {
/* OUT */
/* activity_window supported if set */
#define XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW (1 << 0)
@@ -442,7 +442,7 @@ struct xen_set_cppc_para {
XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
/* IN/OUT */
uint32_t set_params; /* bitflags for valid values */
- /* See comments in struct xen_cppc_para. */
+ /* See comments in struct xen_get_cppc_para. */
/* IN */
uint32_t minimum;
uint32_t maximum;
@@ -490,7 +490,7 @@ struct xen_get_cpufreq_para {
struct xen_ondemand ondemand;
} u;
} s;
- struct xen_cppc_para cppc_para;
+ struct xen_get_cppc_para cppc_para;
} u;
int32_t turbo_enabled;
--
2.34.1
* [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc"
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-08-22 10:52 ` [PATCH v7 01/13] tools: drop "has_num" condition check for cppc mode Penny Zheng
2025-08-22 10:52 ` [PATCH v7 02/13] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 14:30 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor Penny Zheng
` (9 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD
Change "balance" to "ondemand" in the help info for "xenpm set-cpufreq-cppc".
Fixes: 81ce87fc5e36 (xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND")
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v6 -> v7:
- new commit
---
tools/misc/xenpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 6ba7cb2302..6b054b10a4 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -69,7 +69,7 @@ void show_help(void)
" set-max-cstate <num>|'unlimited' [<num2>|'unlimited']\n"
" set the C-State limitation (<num> >= 0) and\n"
" optionally the C-sub-state limitation (<num2> >= 0)\n"
- " set-cpufreq-cppc [cpuid] [balance|performance|powersave] <param:val>*\n"
+ " set-cpufreq-cppc [cpuid] [ondemand|performance|powersave] <param:val>*\n"
" set Hardware P-State (HWP) parameters\n"
" on CPU <cpuid> or all if omitted.\n"
" optionally a preset of one of:\n"
--
2.34.1
* [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (2 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc" Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 14:43 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
` (8 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné
Since the x86 vendor switch is missing a default case, it is possible
(e.g. when a new vendor is introduced) that we return successfully while
skipping the whole cpufreq driver initialization process.
Move "ret = -ENOENT" before the switch so that it also covers the default
case for x86 vendors, and add an error log.
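The pattern of presetting the error code before the switch can be sketched as follows. The vendor enum and the `register_*()` stubs are hypothetical stand-ins for boot_cpu_data.x86_vendor and the real registration routines, and fprintf() stands in for Xen's printk().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical vendor IDs standing in for X86_VENDOR_*. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Stubs for the per-vendor driver registration routines. */
static int register_intel(void) { return 0; }
static int register_amd(void) { return 0; }

static int driver_init(enum vendor v)
{
    int ret = -ENOENT;   /* preset so any unhandled path reports failure */

    switch ( v )
    {
    case VENDOR_INTEL:
        ret = register_intel();
        break;

    case VENDOR_AMD:
        ret = register_amd();
        break;

    default:
        fprintf(stderr, "cpufreq: unsupported x86 vendor\n");
        break;
    }

    return ret;
}
```

An unknown vendor now falls through to `-ENOENT` instead of the success value the missing default case allowed.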
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v6 -> v7:
- new commit
---
xen/arch/x86/acpi/cpufreq/cpufreq.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index d18735c7ae..e227376bab 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -131,11 +131,11 @@ static int __init cf_check cpufreq_driver_init(void)
if ( cpufreq_controller == FREQCTL_xen )
{
+ ret = -ENOENT;
+
switch ( boot_cpu_data.x86_vendor )
{
case X86_VENDOR_INTEL:
- ret = -ENOENT;
-
for ( unsigned int i = 0; i < cpufreq_xen_cnt; i++ )
{
switch ( cpufreq_xen_opts[i] )
@@ -162,6 +162,10 @@ static int __init cf_check cpufreq_driver_init(void)
case X86_VENDOR_HYGON:
ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
break;
+
+ default:
+ printk(XENLOG_ERR "Cpufreq: unsupported x86 vendor\n");
+ break;
}
}
--
2.34.1
* [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (3 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 14:45 ` Jan Beulich
2025-08-26 7:38 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
` (7 subsequent siblings)
12 siblings, 2 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
A helper function handle_cpufreq_cmdline() is introduced to tidy up the
different handling paths.
We also add a new helper cpufreq_opts_contain() to ignore redundant
settings, like "cpufreq=hwp;hwp;xen".
As only slot 0 of cpufreq_xen_opts[] needs explicit initialization with
the non-zero CPUFREQ_xen, dropping the full array initializer avoids
touching the initializer every time the array grows.
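Both ideas, skipping duplicate options and relying on C's zero-initialization of array elements that have no explicit initializer, can be shown in a small sketch. The names `opts`, `opts_contain()` and `add_opt()` are hypothetical, and it assumes (as the patch does for CPUFREQ_none) that the "none" value is 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option IDs; assumes the "none" value is 0. */
enum opt { OPT_NONE = 0, OPT_XEN, OPT_HWP };

#define MAX_OPTS 2

/* Only slot 0 is spelled out; C zero-initializes the remaining elements. */
static enum opt opts[MAX_OPTS] = { OPT_XEN };
static unsigned int opts_cnt = 1;

/* Walk the recorded options backwards, like cpufreq_opts_contain(). */
static bool opts_contain(enum opt o)
{
    unsigned int count = opts_cnt;

    while ( count-- )
        if ( opts[count] == o )
            return true;

    return false;
}

/* Record an option once; repeats such as "hwp;hwp" are silently ignored. */
static int add_opt(enum opt o)
{
    if ( opts_contain(o) )
        return 0;
    if ( opts_cnt >= MAX_OPTS )
        return -1;              /* no room left */
    opts[opts_cnt++] = o;
    return 0;
}
```

Adding OPT_HWP twice leaves only one entry, and slot 1 reads as OPT_NONE until something is stored there.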
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v2 -> v3:
- new commit
---
v3 -> v4:
- add one single helper to do the tidy work
- ignore and warn user redundant setting
---
v4 -> v5:
- make "cpufreq_opts_str" static and the string literals end up in
.init.rodata.
- use "CPUFREQ_xxx" as array slot index
- blank line between non-fall-through case blocks
---
v5 -> v6:
- change to "while ( count-- )"
- remove unnecessary warning
- add an assertion to ensure not overruning the array
- add ASSERT_UNREACHABLE()
- check ret of handle_cpufreq_cmdline() and error out
---
v6 -> v7:
- remove the overrun assertion, as we already have a check in setup_cpufreq_option()
- drop full array initializer
---
xen/drivers/cpufreq/cpufreq.c | 55 +++++++++++++++++++++++++++--------
1 file changed, 43 insertions(+), 12 deletions(-)
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index efba141418..de17e53708 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -64,12 +64,49 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
/* set xen as default cpufreq */
enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
-enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
- CPUFREQ_none };
+enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen };
unsigned int __initdata cpufreq_xen_cnt = 1;
static int __init cpufreq_cmdline_parse(const char *s, const char *e);
+static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
+{
+ unsigned int count = cpufreq_xen_cnt;
+
+ while ( count-- )
+ {
+ if ( cpufreq_xen_opts[count] == option )
+ return true;
+ }
+
+ return false;
+}
+
+static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
+{
+ int ret = 0;
+
+ if ( cpufreq_opts_contain(option) )
+ return 0;
+
+ cpufreq_controller = FREQCTL_xen;
+ cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
+ switch ( option )
+ {
+ case CPUFREQ_hwp:
+ case CPUFREQ_xen:
+ xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
+ break;
+
+ default:
+ ASSERT_UNREACHABLE();
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
static int __init cf_check setup_cpufreq_option(const char *str)
{
const char *arg = strpbrk(str, ",:;");
@@ -113,21 +150,15 @@ static int __init cf_check setup_cpufreq_option(const char *str)
if ( choice > 0 || !cmdline_strcmp(str, "xen") )
{
- xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
- cpufreq_controller = FREQCTL_xen;
- cpufreq_xen_opts[cpufreq_xen_cnt++] = CPUFREQ_xen;
- ret = 0;
- if ( arg[0] && arg[1] )
+ ret = handle_cpufreq_cmdline(CPUFREQ_xen);
+ if ( !ret && arg[0] && arg[1] )
ret = cpufreq_cmdline_parse(arg + 1, end);
}
else if ( IS_ENABLED(CONFIG_INTEL) && choice < 0 &&
!cmdline_strcmp(str, "hwp") )
{
- xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
- cpufreq_controller = FREQCTL_xen;
- cpufreq_xen_opts[cpufreq_xen_cnt++] = CPUFREQ_hwp;
- ret = 0;
- if ( arg[0] && arg[1] )
+ ret = handle_cpufreq_cmdline(CPUFREQ_hwp);
+ if ( !ret && arg[0] && arg[1] )
ret = hwp_cmdline_parse(arg + 1, end);
}
else
--
2.34.1
* [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (4 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:01 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
` (6 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini
In order to provide backward compatibility with existing governors
that represent performance as frequency, like ondemand, the _CPC
table can optionally provide processor frequency range values (Lowest
Frequency and Nominal Frequency). These let the OS use Lowest
Frequency/Performance and Nominal Frequency/Performance as anchor points
to create a linear mapping of CPPC performance to CPU frequency.
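The linear mapping described above can be sketched as a straight line through the two anchor points. The field names follow the `cpc` sub-struct of struct xen_processor_cppc in this patch, but `struct cpc_anchor` and `cppc_perf_to_mhz()` are hypothetical helpers, not code from the series; clamping below lowest_perf is also a choice made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the _CPC values carried in struct xen_processor_cppc. */
struct cpc_anchor {
    uint32_t lowest_perf;
    uint32_t nominal_perf;
    uint32_t lowest_mhz;
    uint32_t nominal_mhz;
};

/*
 * Map an abstract CPPC performance level to MHz along the line through
 * (lowest_perf, lowest_mhz) and (nominal_perf, nominal_mhz).
 * Returns 0 when the optional frequencies are missing or inconsistent.
 */
static uint32_t cppc_perf_to_mhz(const struct cpc_anchor *c, uint32_t perf)
{
    uint64_t dperf, dmhz;

    if ( !c->lowest_mhz || !c->nominal_mhz ||
         c->lowest_mhz > c->nominal_mhz ||
         c->nominal_perf <= c->lowest_perf )
        return 0;

    if ( perf < c->lowest_perf )
        perf = c->lowest_perf;      /* clamp rather than extrapolate below */

    dperf = c->nominal_perf - c->lowest_perf;
    dmhz = c->nominal_mhz - c->lowest_mhz;

    /* Levels above nominal_perf (boost range) extend along the same line. */
    return c->lowest_mhz + (uint32_t)((perf - c->lowest_perf) * dmhz / dperf);
}
```

With made-up anchors lowest_perf=20 at 400 MHz and nominal_perf=120 at 2900 MHz, a performance level of 70 maps to 1650 MHz.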
As Xen is incapable of parsing the ACPI dynamic table, we'd like to
introduce a new sub-hypercall "XEN_PM_CPPC" to propagate the required CPPC
data from the dom0 kernel to Xen.
In the corresponding handler set_cppc_pminfo(), we do _CPC and _PSD
sanitization checks, as both _PSD and _CPC info are necessary for correctly
initializing cpufreq cores in CPPC mode.
Users should be warned that if we fail at this point, there is no fallback
scheme, such as legacy P-states, to switch to.
A new flag "XEN_CPPC_INIT" is also introduced for cpufreq cores initialised
in CPPC mode, and all .init flag checks are updated to consider
"XEN_CPPC_INIT" too.
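The updated .init checks all reduce to testing whether either bit is set. A minimal sketch, reusing the flag values this patch defines in processor_perf.h; the helper name `pminfo_initialised()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in processor_perf.h by this patch. */
#define XEN_CPPC_INIT 0x40000000U
#define XEN_PX_INIT   0x80000000U

/*
 * A CPU is usable by the cpufreq core if it was initialised in either
 * legacy Px mode or CPPC mode; masking with both bits covers both cases.
 */
static bool pminfo_initialised(uint32_t init)
{
    return init & (XEN_PX_INIT | XEN_CPPC_INIT);
}
```

This is the same shape as the `!(... ->init & (XEN_PX_INIT | XEN_CPPC_INIT))` tests in cpufreq_add_cpu() and cpufreq_del_cpu() below.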
We want to bypass construction of the Px statistic info in
cpufreq_statistic_init() for CPPC mode, while deliberately not bypassing
the cpufreq_statistic_lock initialization. The same check is unnecessary
for cpufreq_statistic_exit(), since it is already covered by the check on
the Px statistic variable "cpufreq_statistic_data".
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Remove unnecessary figure braces
- Pointer-to-const for print_CPPC and set_cppc_pminfo
- Structure allocation shall use xvzalloc()
- Unnecessary memcpy(), and change it to a (type safe) structure assignment
- Add comment for struct xen_processor_cppc, and keep the chosen fields
in the order _CPC has them
- Obey to alphabetic sorting, and prefix compat structures with ? instead
of !
---
v2 -> v3:
- Trim too long line
- Re-place set_cppc_pminfo() past set_px_pminfo()
- Fix Misra violations: Declaration and definition ought to agree
in parameter names
- Introduce a new flag XEN_PM_CPPC to reflect processor initialised in CPPC
mode
---
v3 -> v4:
- Refactor commit message
- make "acpi_id" unsigned int
- Add warning message when cpufreq_cpu_init() failed only under debug mode
- Expand "struct xen_processor_cppc" to include _PSD and shared type
- add sanity check for ACPI CPPC data
---
v4 -> v5:
- remove the ordering check between lowest_nonlinear_perf and lowest_perf
- use printk_once() for cppc perf value warning
- complement comment for cppc perf value check
- remove redundant check and pointless parenthesizing
- use dprintk() for warning under #ifndef NDEBUG
- refactor warning message when having non-zero ret of cpufreq_cpu_init()
- With introduction of "struct xen_psd_package" in "struct xen_processor_cppc",
use ! and the respective XLAT_* macro(s) to wrap.
---
v5 -> v6:
- remove unnecessary input parameter check
- use print_once() instead of dprintk() and reword the log message
- adhere to designated comment style
- relative ordering shall be consistent between different declaration groups
- add alphabetically in xlat.lst
- in get_cpufreq_para(), add must-zero check for ->perf.state_count in CPPC mode
---
v6 -> v7:
- move XEN_PX_INIT to its own line to be neater
- change to "Set CPU%d (ACPI ID %u) CPPC state info:\n"
- convert two-if()s into a single one
- make the extra indentation a proper level (i.e. 4 blanks)
- the check is only meaningful when both lowest_mhz and nominal_mhz are provided
- as cpuid is int, so print with %d
- only need to make sure the maximum value highest_perf is smaller than
UINT8_MAX
- make XEN_CPPC_* definition show how it works
- add code comment
---
xen/arch/x86/platform_hypercall.c | 5 +
xen/arch/x86/x86_64/cpufreq.c | 19 ++++
xen/arch/x86/x86_64/platform_hypercall.c | 3 +
xen/drivers/acpi/pm-op.c | 6 +-
xen/drivers/acpi/pmstat.c | 4 +
xen/drivers/cpufreq/cpufreq.c | 124 +++++++++++++++++++++-
xen/include/acpi/cpufreq/processor_perf.h | 4 +-
xen/include/public/platform.h | 26 +++++
xen/include/xen/pmstat.h | 5 +
xen/include/xlat.lst | 1 +
10 files changed, 192 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 3eba791889..42b3b8b95a 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -577,6 +577,11 @@ ret_t do_platform_op(
break;
}
+ case XEN_PM_CPPC:
+ ret = set_cppc_pminfo(op->u.set_pminfo.id,
+ &op->u.set_pminfo.u.cppc_data);
+ break;
+
default:
ret = -EINVAL;
break;
diff --git a/xen/arch/x86/x86_64/cpufreq.c b/xen/arch/x86/x86_64/cpufreq.c
index e4f3d5b436..8d57f67c2e 100644
--- a/xen/arch/x86/x86_64/cpufreq.c
+++ b/xen/arch/x86/x86_64/cpufreq.c
@@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
return set_px_pminfo(acpi_id, xen_perf);
}
+
+int compat_set_cppc_pminfo(unsigned int acpi_id,
+ const struct compat_processor_cppc *cppc_data)
+
+{
+ struct xen_processor_cppc *xen_cppc;
+ unsigned long xlat_page_current;
+
+ xlat_malloc_init(xlat_page_current);
+
+ xen_cppc = xlat_malloc_array(xlat_page_current,
+ struct xen_processor_cppc, 1);
+ if ( unlikely(xen_cppc == NULL) )
+ return -EFAULT;
+
+ XLAT_processor_cppc(xen_cppc, cppc_data);
+
+ return set_cppc_pminfo(acpi_id, xen_cppc);
+}
diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c
index 9ab631c17f..0288f68df9 100644
--- a/xen/arch/x86/x86_64/platform_hypercall.c
+++ b/xen/arch/x86/x86_64/platform_hypercall.c
@@ -14,6 +14,9 @@ EMIT_FILE;
#define efi_get_info efi_compat_get_info
#define efi_runtime_call(x) efi_compat_runtime_call(x)
+#define xen_processor_cppc compat_processor_cppc
+#define set_cppc_pminfo compat_set_cppc_pminfo
+
#define xen_processor_performance compat_processor_performance
#define set_px_pminfo compat_set_px_pminfo
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 9a1970df34..eab64bb46e 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -91,7 +91,9 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
pmpt = processor_pminfo[op->cpuid];
policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
- if ( !pmpt || !pmpt->perf.states ||
+ if ( !pmpt ||
+ ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
+ ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
!policy || !policy->governor )
return -EINVAL;
@@ -351,7 +353,7 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
case CPUFREQ_PARA:
if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
return -ENODEV;
- if ( !pmpt || !(pmpt->init & XEN_PX_INIT) )
+ if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
break;
}
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 4fae0c14af..0f31736df2 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -108,6 +108,10 @@ int cpufreq_statistic_init(unsigned int cpu)
if ( !pmpt )
return -EINVAL;
+ /* Only need to initialize in Px mode */
+ if ( !(pmpt->init & XEN_PX_INIT) )
+ return 0;
+
spin_lock(cpufreq_statistic_lock);
pxpt = per_cpu(cpufreq_statistic_data, cpu);
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index de17e53708..046a366d7f 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -40,6 +40,7 @@
#include <xen/domain.h>
#include <xen/cpu.h>
#include <xen/pmstat.h>
+#include <xen/xvmalloc.h>
#include <asm/io.h>
#include <asm/processor.h>
@@ -234,6 +235,11 @@ static int get_psd_info(unsigned int cpu, unsigned int *shared_type,
*domain_info = &processor_pminfo[cpu]->perf.domain_info;
break;
+ case XEN_CPPC_INIT:
+ *shared_type = processor_pminfo[cpu]->cppc_data.shared_type;
+ *domain_info = &processor_pminfo[cpu]->cppc_data.domain_info;
+ break;
+
default:
ret = -EINVAL;
break;
@@ -259,7 +265,7 @@ int cpufreq_add_cpu(unsigned int cpu)
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+ if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
if (!cpufreq_driver.init)
@@ -434,7 +440,7 @@ int cpufreq_del_cpu(unsigned int cpu)
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+ if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
if (!per_cpu(cpufreq_cpu_policy, cpu))
@@ -693,6 +699,120 @@ int acpi_set_pdc_bits(unsigned int acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
return ret;
}
+static void print_CPPC(const struct xen_processor_cppc *cppc_data)
+{
+ printk("\t_CPC: highest_perf=%u, lowest_perf=%u, "
+ "nominal_perf=%u, lowest_nonlinear_perf=%u, "
+ "nominal_mhz=%uMHz, lowest_mhz=%uMHz\n",
+ cppc_data->cpc.highest_perf, cppc_data->cpc.lowest_perf,
+ cppc_data->cpc.nominal_perf, cppc_data->cpc.lowest_nonlinear_perf,
+ cppc_data->cpc.nominal_mhz, cppc_data->cpc.lowest_mhz);
+}
+
+int set_cppc_pminfo(unsigned int acpi_id,
+ const struct xen_processor_cppc *cppc_data)
+{
+ int ret = 0, cpuid;
+ struct processor_pminfo *pm_info;
+
+ cpuid = get_cpu_id(acpi_id);
+ if ( cpuid < 0 )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cppc_data->pad[0] || cppc_data->pad[1] || cppc_data->pad[2] )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cpufreq_verbose )
+ printk("Set CPU%d (ACPI ID %u) CPPC state info:\n",
+ cpuid, acpi_id);
+
+ pm_info = processor_pminfo[cpuid];
+ if ( !pm_info )
+ {
+ pm_info = xvzalloc(struct processor_pminfo);
+ if ( !pm_info )
+ {
+ ret = -ENOMEM;
+ goto out;
+ }
+ processor_pminfo[cpuid] = pm_info;
+ }
+ pm_info->acpi_id = acpi_id;
+ pm_info->id = cpuid;
+ pm_info->cppc_data = *cppc_data;
+
+ if ( (cppc_data->flags & XEN_CPPC_PSD) &&
+ !check_psd_pminfo(cppc_data->shared_type) )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cppc_data->flags & XEN_CPPC_CPC )
+ {
+ if ( cppc_data->cpc.highest_perf == 0 ||
+ cppc_data->cpc.highest_perf > UINT8_MAX ||
+ cppc_data->cpc.nominal_perf == 0 ||
+ cppc_data->cpc.lowest_nonlinear_perf == 0 ||
+ cppc_data->cpc.lowest_perf == 0 ||
+ cppc_data->cpc.lowest_perf >
+ cppc_data->cpc.lowest_nonlinear_perf ||
+ cppc_data->cpc.lowest_nonlinear_perf >
+ cppc_data->cpc.nominal_perf ||
+ cppc_data->cpc.nominal_perf > cppc_data->cpc.highest_perf )
+ /*
+ * Right now, Xen doesn't actually use highest_perf/nominal_perf/
+ * lowest_nonlinear_perf/lowest_perf values read from ACPI _CPC
+ * table. Xen reads CPPC capability MSR to get these four values.
+ * So warning is enough.
+ */
+ printk_once(XENLOG_WARNING
+ "Broken CPPC perf values: lowest(%u), nonlinear_lowest(%u), nominal(%u), highest(%u)\n",
+ cppc_data->cpc.lowest_perf,
+ cppc_data->cpc.lowest_nonlinear_perf,
+ cppc_data->cpc.nominal_perf,
+ cppc_data->cpc.highest_perf);
+
+ /* lowest_mhz and nominal_mhz are optional value */
+ if ( cppc_data->cpc.nominal_mhz &&
+ cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
+ {
+ printk_once(XENLOG_WARNING
+ "Broken CPPC freq values: lowest(%u), nominal(%u)\n",
+ cppc_data->cpc.lowest_mhz,
+ cppc_data->cpc.nominal_mhz);
+ /* Re-set with zero values, instead of keeping invalid values */
+ pm_info->cppc_data.cpc.nominal_mhz = 0;
+ pm_info->cppc_data.cpc.lowest_mhz = 0;
+ }
+ }
+
+ if ( cppc_data->flags == (XEN_CPPC_PSD | XEN_CPPC_CPC) )
+ {
+ if ( cpufreq_verbose )
+ {
+ print_PSD(&pm_info->cppc_data.domain_info);
+ print_CPPC(&pm_info->cppc_data);
+ }
+
+ pm_info->init = XEN_CPPC_INIT;
+ ret = cpufreq_cpu_init(cpuid);
+ if ( ret )
+ printk_once(XENLOG_WARNING
"CPU%d failed amd-cppc mode init; use \"cpufreq=xen\" instead\n",
+ cpuid);
+ }
+
+ out:
+ return ret;
+}
+
static void cpufreq_cmdline_common_para(struct cpufreq_policy *new_policy)
{
if (usr_max_freq)
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index 4e045da983..e6576314f0 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -5,7 +5,8 @@
#include <public/sysctl.h>
#include <xen/acpi.h>
-#define XEN_PX_INIT 0x80000000U
+#define XEN_CPPC_INIT 0x40000000U
+#define XEN_PX_INIT 0x80000000U
unsigned int powernow_register_driver(void);
unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
@@ -43,6 +44,7 @@ struct processor_pminfo {
uint32_t acpi_id;
uint32_t id;
struct processor_performance perf;
+ struct xen_processor_cppc cppc_data;
uint32_t init;
};
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 2725b8d104..94349fc5f5 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -363,6 +363,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
#define XEN_PM_PX 1
#define XEN_PM_TX 2
#define XEN_PM_PDC 3
+#define XEN_PM_CPPC 4
/* Px sub info type */
#define XEN_PX_PCT 1
@@ -370,6 +371,10 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
#define XEN_PX_PPC 4
#define XEN_PX_PSD 8
+/* CPPC sub info type */
+#define XEN_CPPC_PSD (1U << 0)
+#define XEN_CPPC_CPC (1U << 1)
+
struct xen_power_register {
uint32_t space_id;
uint32_t bit_width;
@@ -457,6 +462,26 @@ struct xen_processor_performance {
typedef struct xen_processor_performance xen_processor_performance_t;
DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
+struct xen_processor_cppc {
+ uint8_t flags; /* IN: XEN_CPPC_xxx */
+ uint8_t pad[3];
+ /*
+ * IN: Subset _CPC fields useful for CPPC-compatible cpufreq
+ * driver's initialization
+ */
+ struct {
+ uint32_t highest_perf;
+ uint32_t nominal_perf;
+ uint32_t lowest_nonlinear_perf;
+ uint32_t lowest_perf;
+ uint32_t lowest_mhz;
+ uint32_t nominal_mhz;
+ } cpc;
+ uint32_t shared_type; /* IN: XEN_CPUPERF_SHARED_TYPE_xxx */
+ struct xen_psd_package domain_info; /* IN: _PSD */
+};
+typedef struct xen_processor_cppc xen_processor_cppc_t;
+
struct xenpf_set_processor_pminfo {
/* IN variables */
uint32_t id; /* ACPI CPU ID */
@@ -465,6 +490,7 @@ struct xenpf_set_processor_pminfo {
struct xen_processor_power power;/* Cx: _CST/_CSD */
struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */
XEN_GUEST_HANDLE(uint32) pdc; /* _PDC */
+ xen_processor_cppc_t cppc_data; /* CPPC: _CPC and _PSD */
} u;
};
typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
diff --git a/xen/include/xen/pmstat.h b/xen/include/xen/pmstat.h
index 8350403e95..6096560d3c 100644
--- a/xen/include/xen/pmstat.h
+++ b/xen/include/xen/pmstat.h
@@ -7,12 +7,17 @@
int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf);
long set_cx_pminfo(uint32_t acpi_id, struct xen_processor_power *power);
+int set_cppc_pminfo(unsigned int acpi_id,
+ const struct xen_processor_cppc *cppc_data);
#ifdef CONFIG_COMPAT
struct compat_processor_performance;
int compat_set_px_pminfo(uint32_t acpi_id, struct compat_processor_performance *perf);
struct compat_processor_power;
long compat_set_cx_pminfo(uint32_t acpi_id, struct compat_processor_power *power);
+struct compat_processor_cppc;
+int compat_set_cppc_pminfo(unsigned int acpi_id,
+ const struct compat_processor_cppc *cppc_data);
#endif
uint32_t pmstat_get_cx_nr(unsigned int cpu);
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 6d6c6cfab2..9d08dcc4bb 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -160,6 +160,7 @@
! pct_register platform.h
! power_register platform.h
+! processor_cppc platform.h
? processor_csd platform.h
! processor_cx platform.h
! processor_flags platform.h
--
2.34.1
* [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (5 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:07 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
` (5 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Users need to set "cpufreq=amd-cppc" on the Xen cmdline to enable the amd-cppc
driver, which selects ACPI Collaborative Performance and Power Control (CPPC)
on supported AMD hardware to provide a finer-grained frequency control
mechanism. The `verbose` option can also be included to enable verbose printing.
When users set "cpufreq=amd-cppc", a new amd-cppc driver shall be registered
and used. All hooks of the amd-cppc driver are still missing at this point, so
registration temporarily fails with -EOPNOTSUPP here. This will be fixed along
with the implementation.
A new internal xen-pm flag, XEN_PROCESSOR_PM_CPPC, is introduced to indicate a
cpufreq driver in CPPC mode. We define XEN_PROCESSOR_PM_CPPC as 0x100, as it is
the next value available after the 8-bit-wide public xen-pm options, and add a
compile-time sanity check. All XEN_PROCESSOR_PM_xxx checks shall be updated to
consider XEN_PROCESSOR_PM_CPPC too.
XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX are initially set when Xen parses
the corresponding driver name from the Xen cmdline, and become mutually
exclusive after cpufreq driver registration. This is because a platform cannot
support both modes, or a mix of them (CPPC & legacy Px), and only one cpufreq
driver can be registered in Xen at a time; on AMD, for example, it is either
the amd-cppc or the legacy P-states driver.
Xen relies on the XEN_PROCESSOR_PM_CPPC flag to tell whether the current
cpufreq driver is in CPPC mode and, if so, accepts the corresponding hypercall.
In that mode, Px requests are ignored and success is returned.
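To make the flag handling above concrete, here is a minimal, standalone C model of it. It is a sketch, not Xen code: the constant values are mirrored as I understand them from the public headers (XEN_PROCESSOR_PM_PX as 2, SIF_PM_MASK as 0xFF << 8) and should be treated as assumptions, while `enum drv`, `prune_pmbits()` and `accepts_cppc_data()` are hypothetical names standing in for the logic in cpufreq_driver_init() and do_platform_op().

```c
#include <assert.h>
#include <errno.h>

/* Mirrored for illustration only (assumed values, see lead-in). */
#define XEN_PROCESSOR_PM_PX   0x2      /* public xen-pm option */
#define XEN_PROCESSOR_PM_CPPC 0x100    /* internal: first bit above the public byte */
#define SIF_PM_MASK           (0xFFU << 8)
#define MASK_EXTR(v, m)       (((v) & (m)) / ((m) & -(m)))

/* Same idea as the patch's build-time guard: 0x100 must not overlap bits 7:0. */
#if XEN_PROCESSOR_PM_CPPC & MASK_EXTR(~0, SIF_PM_MASK)
# error "XEN_PROCESSOR_PM_CPPC overlaps public xen-pm options"
#endif

/* Hypothetical subset of the cpufreq driver choices. */
enum drv { DRV_none, DRV_xen, DRV_hwp, DRV_amd_cppc };

/*
 * Models the pruning after registration: on success the registered driver's
 * mode wins; -EBUSY leaves the flags untouched; any other error clears both.
 */
static unsigned int prune_pmbits(unsigned int bits, int ret, enum drv chosen)
{
    if ( !ret )
    {
        switch ( chosen )
        {
        case DRV_amd_cppc:
            bits &= ~XEN_PROCESSOR_PM_PX;
            break;
        case DRV_hwp:
        case DRV_xen:
            bits &= ~XEN_PROCESSOR_PM_CPPC;
            break;
        default:
            break;
        }
    }
    else if ( ret != -EBUSY )
        bits &= ~(XEN_PROCESSOR_PM_CPPC | XEN_PROCESSOR_PM_PX);

    return bits;
}

/* Models the XEN_PM_CPPC hypercall case: 1 = consume the data, 0 = ignore. */
static int accepts_cppc_data(unsigned int bits)
{
    if ( !(bits & XEN_PROCESSOR_PM_CPPC) )
        return 0;                           /* not in CPPC mode: succeed silently */
    assert(!(bits & XEN_PROCESSOR_PM_PX));  /* mixed mode is never allowed */
    return 1;
}
```

Keeping the internal flag at 0x100 means it cannot collide with a bit passed in through the public 8-bit xen-pm field, which is exactly what the `#error` guard enforces at build time.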
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Obey alphabetic sorting and also restrict it to CONFIG_AMD
- Remove unnecessary empty comment line
- Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
- Make new switch-case code apply to Hygon CPUs too
- Change ENOSYS with EOPNOTSUPP
- Blanks around binary operator
- Change all amd_/-pstate defined values to amd_/-cppc
---
v2 -> v3
- refactor too long lines
- Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC are incompatible flags
after cpufreq driver registration
---
v3 -> v4:
- introduce XEN_PROCESSOR_PM_CPPC in xen internal header
- complement "Hygon" in log message
- remove unnecessary if()
- grow cpufreq_xen_opts[] array
---
v4 -> v5:
- remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
- prefer ! over "== 0" in purely boolean contexts
- Blank line between non-fall-through case blocks
- add build-time checking between internal and public XEN_PROCESSOR_PM_*
values
- define XEN_PROCESSOR_PM_CPPC with 0x100, as it is the next value to use
after public interface, and public mask SIF_PM_MASK is 8 bits wide.
- as Dom0 will send the CPPC/Px data whenever it can, the return value shall
be 0 instead of -ENOSYS/-EOPNOTSUPP when the platform doesn't require these data.
---
v5 -> v6:
- do not register the driver when all hooks are NULL
- refactor the subject and commit message
- move pruning of xen_processor_pmbits into generic space
- add comment and build-time check for XEN_PROCESSOR_PM_CPPC
---
v6 -> v7:
- remove pointless initializer for "unsigned int i"
- move closing brace into its own line
- insertion always at the bottom
- change to use #ifdef CONFIG_AMD code wrapping
---
docs/misc/xen-command-line.pandoc | 7 ++-
xen/arch/x86/acpi/cpufreq/Makefile | 1 +
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 59 ++++++++++++++++++++
xen/arch/x86/acpi/cpufreq/cpufreq.c | 68 ++++++++++++++++++++++-
xen/arch/x86/platform_hypercall.c | 14 +++++
xen/drivers/acpi/pm-op.c | 3 +-
xen/drivers/cpufreq/cpufreq.c | 13 ++++-
xen/include/acpi/cpufreq/cpufreq.h | 6 +-
xen/include/acpi/cpufreq/processor_perf.h | 10 ++++
9 files changed, 174 insertions(+), 7 deletions(-)
create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a75b6c9301..3916cc81f6 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
available support.
### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
> Default: `xen`
@@ -526,7 +526,7 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `<maxfreq>` and `<minfreq>` are integers which represent max and min processor frequencies
respectively.
* `verbose` option can be included as a string or also as `verbose=<integer>`
- for `xen`. It is a boolean for `hwp`.
+ for `xen`. It is a boolean for `hwp` and `amd-cppc`.
* `hwp` selects Hardware-Controlled Performance States (HWP) on supported Intel
hardware. HWP is a Skylake+ feature which provides better CPU power
management. The default is disabled. If `hwp` is selected, but hardware
@@ -534,6 +534,9 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC). HDC enables the
processor to autonomously force physical package components into idle state.
The default is enabled, but the option only applies when `hwp` is enabled.
+* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
+ on supported AMD hardware to provide a finer-grained frequency control
+ mechanism. The default is disabled.
There is also support for `;`-separated fallback options:
`cpufreq=hwp;xen,verbose`. This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index e7dbe434a8..a2ba34bda0 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_INTEL) += acpi.o
+obj-$(CONFIG_AMD) += amd-cppc.o
obj-y += cpufreq.o
obj-$(CONFIG_INTEL) += hwp.o
obj-$(CONFIG_AMD) += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
new file mode 100644
index 0000000000..3377783f7e
--- /dev/null
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * amd-cppc.c - AMD Processor CPPC Frequency Driver
+ *
+ * Copyright (C) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Author: Penny Zheng <penny.zheng@amd.com>
+ *
+ * AMD CPPC cpufreq driver introduces a new CPU performance scaling design
+ * for AMD processors using the ACPI Collaborative Performance and Power
+ * Control (CPPC) feature which provides finer grained frequency control range.
+ */
+
+#include <xen/domain.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <acpi/cpufreq/cpufreq.h>
+
+static bool __init amd_cppc_handle_option(const char *s, const char *end)
+{
+ int ret;
+
+ ret = parse_boolean("verbose", s, end);
+ if ( ret >= 0 )
+ {
+ cpufreq_verbose = ret;
+ return true;
+ }
+
+ return false;
+}
+
+int __init amd_cppc_cmdline_parse(const char *s, const char *e)
+{
+ do {
+ const char *end = strpbrk(s, ",;");
+
+ if ( !amd_cppc_handle_option(s, end) )
+ {
+ printk(XENLOG_WARNING
+ "cpufreq/amd-cppc: option '%.*s' not recognized\n",
+ (int)((end ?: e) - s), s);
+
+ return -EINVAL;
+ }
+
+ s = end ? end + 1 : NULL;
+ } while ( s && s < e );
+
+ return 0;
+}
+
+int __init amd_cppc_register_driver(void)
+{
+ if ( !cpu_has_cppc )
+ return -ENODEV;
+
+ return -EOPNOTSUPP;
+}
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index e227376bab..6a0d9b1092 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -131,12 +131,13 @@ static int __init cf_check cpufreq_driver_init(void)
if ( cpufreq_controller == FREQCTL_xen )
{
+ unsigned int i;
ret = -ENOENT;
switch ( boot_cpu_data.x86_vendor )
{
case X86_VENDOR_INTEL:
- for ( unsigned int i = 0; i < cpufreq_xen_cnt; i++ )
+ for ( i = 0; i < cpufreq_xen_cnt; i++ )
{
switch ( cpufreq_xen_opts[i] )
{
@@ -151,6 +152,11 @@ static int __init cf_check cpufreq_driver_init(void)
case CPUFREQ_none:
ret = 0;
break;
+
+ default:
+ printk(XENLOG_WARNING
+ "Unsupported cpufreq driver for vendor Intel\n");
+ break;
}
if ( !ret || ret == -EBUSY )
@@ -160,13 +166,71 @@ static int __init cf_check cpufreq_driver_init(void)
case X86_VENDOR_AMD:
case X86_VENDOR_HYGON:
- ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
+#ifdef CONFIG_AMD
+ for ( i = 0; i < cpufreq_xen_cnt; i++ )
+ {
+ switch ( cpufreq_xen_opts[i] )
+ {
+ case CPUFREQ_xen:
+ ret = powernow_register_driver();
+ break;
+
+ case CPUFREQ_amd_cppc:
+ ret = amd_cppc_register_driver();
+ break;
+
+ case CPUFREQ_none:
+ ret = 0;
+ break;
+
+ default:
+ printk(XENLOG_WARNING
+ "Unsupported cpufreq driver for vendor AMD or Hygon\n");
+ break;
+ }
+
+ if ( !ret || ret == -EBUSY )
+ break;
+ }
+#else
+ ret = -ENODEV;
+#endif /* CONFIG_AMD */
break;
default:
printk(XENLOG_ERR "Cpufreq: unsupported x86 vendor\n");
break;
}
+
+ /*
+ * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
+ * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
+ */
+ if ( !ret )
+ {
+ ASSERT(i < cpufreq_xen_cnt);
+ switch ( cpufreq_xen_opts[i] )
+ {
+ case CPUFREQ_amd_cppc:
+ xen_processor_pmbits &= ~XEN_PROCESSOR_PM_PX;
+ break;
+
+ case CPUFREQ_hwp:
+ case CPUFREQ_xen:
+ xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
+ break;
+
+ default:
+ break;
+ }
+ }
+ else if ( ret != -EBUSY )
+ /*
+ * No cpufreq driver gets registered, clear both
+ * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
+ */
+ xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
+ XEN_PROCESSOR_PM_PX);
}
return ret;
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 42b3b8b95a..cf64b8a622 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -546,6 +546,8 @@ ret_t do_platform_op(
ret = 0;
break;
}
+ /* Xen doesn't support mixed mode */
+ ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC));
ret = set_px_pminfo(op->u.set_pminfo.id, &op->u.set_pminfo.u.perf);
break;
@@ -578,6 +580,18 @@ ret_t do_platform_op(
}
case XEN_PM_CPPC:
+ if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC) )
+ {
+ /*
+ * Ignore CPPC info when the registered cpufreq driver
+ * isn't in CPPC mode
+ */
+ ret = 0;
+ break;
+ }
+ /* Xen doesn't support mixed mode */
+ ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_PX));
+
ret = set_cppc_pminfo(op->u.set_pminfo.id,
&op->u.set_pminfo.u.cppc_data);
break;
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index eab64bb46e..427656c48c 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -351,7 +351,8 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
switch ( op->cmd & PM_PARA_CATEGORY_MASK )
{
case CPUFREQ_PARA:
- if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
+ if ( !(xen_processor_pmbits & (XEN_PROCESSOR_PM_PX |
+ XEN_PROCESSOR_PM_CPPC)) )
return -ENODEV;
if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 046a366d7f..41e0da3b77 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -65,7 +65,7 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
/* set xen as default cpufreq */
enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
-enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen };
+enum cpufreq_xen_opt __initdata cpufreq_xen_opts[3] = { CPUFREQ_xen };
unsigned int __initdata cpufreq_xen_cnt = 1;
static int __init cpufreq_cmdline_parse(const char *s, const char *e);
@@ -99,6 +99,10 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
break;
+ case CPUFREQ_amd_cppc:
+ xen_processor_pmbits |= XEN_PROCESSOR_PM_CPPC;
+ break;
+
default:
ASSERT_UNREACHABLE();
ret = -EINVAL;
@@ -162,6 +166,13 @@ static int __init cf_check setup_cpufreq_option(const char *str)
if ( !ret && arg[0] && arg[1] )
ret = hwp_cmdline_parse(arg + 1, end);
}
+ else if ( IS_ENABLED(CONFIG_AMD) && choice < 0 &&
+ !cmdline_strcmp(str, "amd-cppc") )
+ {
+ ret = handle_cpufreq_cmdline(CPUFREQ_amd_cppc);
+ if ( !ret && arg[0] && arg[1] )
+ ret = amd_cppc_cmdline_parse(arg + 1, end);
+ }
else
ret = -EINVAL;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index fd530632b4..5d4881eea8 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -26,8 +26,9 @@ enum cpufreq_xen_opt {
CPUFREQ_none,
CPUFREQ_xen,
CPUFREQ_hwp,
+ CPUFREQ_amd_cppc,
};
-extern enum cpufreq_xen_opt cpufreq_xen_opts[2];
+extern enum cpufreq_xen_opt cpufreq_xen_opts[3];
extern unsigned int cpufreq_xen_cnt;
struct cpufreq_governor;
@@ -272,4 +273,7 @@ int set_hwp_para(struct cpufreq_policy *policy,
int acpi_cpufreq_register(void);
+int amd_cppc_cmdline_parse(const char *s, const char *e);
+int amd_cppc_register_driver(void);
+
#endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index e6576314f0..0a87bc0384 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -5,6 +5,16 @@
#include <public/sysctl.h>
#include <xen/acpi.h>
+/*
+ * Internal xen-pm options
+ * They extend the public xen-pm options (XEN_PROCESSOR_PM_xxx) defined
+ * in public/platform.h, whose bits are guarded by SIF_PM_MASK.
+ */
+#define XEN_PROCESSOR_PM_CPPC 0x100
+#if XEN_PROCESSOR_PM_CPPC & MASK_EXTR(~0, SIF_PM_MASK)
+# error "XEN_PROCESSOR_PM_CPPC shall not occupy bits reserved for public xen-pm options"
+#endif
+
#define XEN_CPPC_INIT 0x40000000U
#define XEN_PX_INIT 0x80000000U
--
2.34.1
* [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (6 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:14 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
` (4 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini
amd-cppc is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism. The new mechanism is based on
Collaborative Processor Performance Control (CPPC), which provides finer-grained
frequency management than legacy ACPI hardware P-states.
Current AMD CPU platforms use the ACPI P-states driver to manage CPU
frequency and clocks, switching among only 3 P-states, while the new
amd-cppc driver offers a more flexible, low-latency interface for Xen
to communicate performance hints directly to hardware.
The "amd-cppc" driver implements CPPC in passive mode, which still
leverages Xen governors such as *ondemand*, *performance*, etc., to
calculate the performance hints. In the future, we will introduce an advanced
active mode to enable autonomous performance level selection.
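In passive mode the governor still asks for a target frequency, and the driver has to turn it into a CPPC perf level. The sketch below models that conversion as done by amd_cppc_khz_to_perf() in this patch: a straight line through (lowest_mhz, lowest_perf) and (nominal_mhz, nominal_perf) when _CPC provides both frequencies, otherwise proportional scaling anchored at highest_perf and the PstateDef maximum frequency. `struct cppc_caps` and `khz_to_perf()` are hypothetical names; unlike the patch, this sketch clamps to 255 silently instead of logging a warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bundle of the inputs the driver keeps per CPU. */
struct cppc_caps {
    unsigned int lowest_perf, nominal_perf, highest_perf; /* from CAP1 MSR */
    unsigned int lowest_mhz, nominal_mhz;   /* optional _CPC values, 0 if absent */
    unsigned int pxfreq_mhz;                /* fallback anchor from PstateDef */
};

/* Map a target frequency in kHz to a CPPC perf level (sketch). */
static int khz_to_perf(const struct cppc_caps *c, unsigned int khz,
                       uint8_t *perf)
{
    unsigned int mul, div;
    int offset = 0, res;

    if ( c->lowest_mhz && c->nominal_perf > c->lowest_perf &&
         c->nominal_mhz > c->lowest_mhz )
    {
        /* Slope mul/div in perf per MHz; offset is the line's intercept. */
        mul = c->nominal_perf - c->lowest_perf;
        div = c->nominal_mhz - c->lowest_mhz;
        offset = (int)c->nominal_perf - (int)(mul * c->nominal_mhz / div);
    }
    else
    {
        /* Proportional scaling: highest_perf corresponds to pxfreq_mhz. */
        mul = c->highest_perf;
        div = c->pxfreq_mhz;
        if ( !div )
            return -EOPNOTSUPP;
    }

    /* khz is in kHz, so scale the MHz-based divisor by 1000. */
    res = offset + (int)(mul * khz / (div * 1000));
    if ( res <= 0 )
        return -ERANGE;                 /* perf 0 is rejected as invalid */
    *perf = res > UINT8_MAX ? UINT8_MAX : (uint8_t)res;
    return 0;
}
```

For example, with perf 20 at 600 MHz and perf 120 at 2600 MHz the line is perf = MHz / 20 - 10, so a 1.6 GHz target maps to perf 70.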
The epp field (energy performance preference) only has meaning when active
mode is enabled, which will be introduced in detail later, so in passive mode
we keep the pre-defined BIOS value for it.
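Whatever the mode, the hints end up in the 64-bit CPPC request MSR. The sketch below packs the fields with explicit shifts instead of bitfields. The bit positions (max_perf 7:0, min_perf 15:8, des_perf 23:16, epp 31:24) are my reading of the `req` bitfields in this patch as GCC lays them out on x86, so treat them as an assumption. `cppc_req_pack()` and `passive_request()` are hypothetical helpers; the latter mirrors amd_cppc_cpufreq_target() plus amd_cppc_write_request(): the floor is lowest_nonlinear_perf, the ceiling is highest_perf, epp keeps its BIOS value, and the write is skipped when nothing changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions in the CPPC request MSR (see lead-in). */
#define CPPC_REQ_MAX_PERF_SHIFT 0
#define CPPC_REQ_MIN_PERF_SHIFT 8
#define CPPC_REQ_DES_PERF_SHIFT 16
#define CPPC_REQ_EPP_SHIFT      24

/* Argument order follows amd_cppc_write_request(): min, des, max, epp. */
static uint64_t cppc_req_pack(uint8_t min_perf, uint8_t des_perf,
                              uint8_t max_perf, uint8_t epp)
{
    return ((uint64_t)max_perf << CPPC_REQ_MAX_PERF_SHIFT) |
           ((uint64_t)min_perf << CPPC_REQ_MIN_PERF_SHIFT) |
           ((uint64_t)des_perf << CPPC_REQ_DES_PERF_SHIFT) |
           ((uint64_t)epp << CPPC_REQ_EPP_SHIFT);
}

/*
 * Passive-mode request: returns true when the cached value changed, i.e.
 * when the caller would need to write the MSR on the target CPU.
 */
static bool passive_request(uint64_t *cached, uint8_t lowest_nonlinear,
                            uint8_t des, uint8_t highest, uint8_t epp_bios)
{
    /* Bits 63:32 are preserved, like the unnamed :32 bitfield in the patch. */
    uint64_t next = (*cached & ~0xffffffffULL) |
                    cppc_req_pack(lowest_nonlinear, des, highest, epp_bios);

    if ( next == *cached )
        return false;               /* unchanged: skip the MSR write */
    *cached = next;
    return true;
}
```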
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- re-construct union caps and req to have anonymous struct instead
- avoid "else" when the earlier if() ends in an unconditional control flow statement
- Add check to avoid chopping off set bits from cast
- make pointers pointer-to-const wherever possible
- remove noisy log
- exclude families before 0x17 before CPPC-feature MSR op
- remove useless variable helpers
- use xvzalloc and XVFREE
- refactor error handling as ENABLE bit can only be cleared by reset
---
v2 -> v3:
- Move all MSR definitions to msr-index.h and follow the required style
- Refactor opening figure braces for struct/union
- Sort overlong lines throughout the series
- Make offset/res int covering underflow scenario
- Error out when amd_max_freq_mhz isn't set
- Introduce amd_get_freq(name) macro to decrease redundancy
- Supported CPU family checked ahead of smp-function
- Nominal freq shall be checked between the [min, max]
- Use APERF/MPERF to calculate current frequency
- Use amd_cppc_cpufreq_cpu_exit() to tidy error path
---
v3 -> v4:
- verbose print shall come with a CPU number
- deal with res <= 0 in amd_cppc_khz_to_perf()
- introduce a single helper amd_get_lowest_or_nominal_freq() to cover both
lowest and nominal scenario
- reduce abuse of wrmsr_safe()/rdmsr_safe() with wrmsrl()/rdmsrl()
- move cf_check from amd_cppc_write_request() to amd_cppc_write_request_msrs()
- add comment to explain why setting non_linear_lowest in passive mode
- add check to ensure perf values in
lowest <= non_linear_lowest <= nominal <= highest
- refactor comment for "data->err != 0" scenario
- use "data->err" instead of -ENODEV
- add U suffixes for all msr macros
---
v4 -> v5:
- all freq-values shall be unsigned int type
- remove shortcuts as it is rarely taken
- checking cpc.nominal_mhz and cpc.lowest_mhz are non-zero values is enough
- drop the explicit type cast
- null pointer check is in no need for internal functions
- change amd_get_lowest_or_nominal_freq() to amd_get_cpc_freq()
- clarifying function-wide that the calculated frequency result is to be in kHz
- use array notation
- with cpu_has_cppc check, no need to do cpu family check
---
v5 -> v6
- replace "AMD_CPPC" with "AMD-CPPC" in message
- add equation(mul,div) non-zero check
- replace -EINVAL with -EOPNOTSUPP
- refactor comment
---
v6 -> v7
- used > in place of !=, to not only serve a doc aspect, but also allow to
drop one part
- unify with UINT8_MAX
- return -ERANGE as we reject perf values of 0 as invalid
- replace uint32_t with unsigned int
- Move some epp introduction here, otherwise we will mis-handle this field here
by always clearing it
---
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 418 ++++++++++++++++++++++++++-
xen/arch/x86/cpu/amd.c | 8 +-
xen/arch/x86/include/asm/amd.h | 2 +
xen/arch/x86/include/asm/msr-index.h | 6 +
xen/include/public/sysctl.h | 1 +
5 files changed, 430 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 3377783f7e..df9ed52237 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -14,7 +14,98 @@
#include <xen/domain.h>
#include <xen/init.h>
#include <xen/param.h>
+#include <xen/percpu.h>
+#include <xen/xvmalloc.h>
#include <acpi/cpufreq/cpufreq.h>
+#include <asm/amd.h>
+#include <asm/msr.h>
+
+#define amd_cppc_err(cpu, fmt, args...) \
+ printk(XENLOG_ERR "AMD-CPPC: CPU%u error: " fmt, cpu, ## args)
+#define amd_cppc_warn(cpu, fmt, args...) \
+ printk(XENLOG_WARNING "AMD-CPPC: CPU%u warning: " fmt, cpu, ## args)
+#define amd_cppc_verbose(cpu, fmt, args...) \
+({ \
+ if ( cpufreq_verbose ) \
+ printk(XENLOG_DEBUG "AMD-CPPC: CPU%u " fmt, cpu, ## args); \
+})
+
+/*
+ * Fields highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
+ * contain the values read from the CPPC capability MSR. They represent the
+ * limits of the managed performance range as well as the dynamic capability,
+ * which may change during processor operation.
+ * Field highest_perf represents highest performance, which is the absolute
+ * maximum performance an individual processor may reach, assuming ideal
+ * conditions. This performance level may not be sustainable for long
+ * durations and may only be achievable if other platform components
+ * are in a specific state; for example, it may require other processors be
+ * in an idle state. This would be equivalent to the highest frequencies
+ * supported by the processor.
+ * Field nominal_perf represents maximum sustained performance level of the
+ * processor, assuming ideal operating conditions. All cores/processors are
+ * expected to be able to sustain their nominal performance state
+ * simultaneously.
+ * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
+ * is the lowest performance level at which nonlinear power savings are
+ * achieved. Above this threshold, lower performance levels should be
+ * generally more energy efficient than higher performance levels. So in
+ * traditional terms, this represents the P-state range of performance levels.
+ * Field lowest_perf represents the absolute lowest performance level of the
+ * platform. Selecting it may cause an efficiency penalty but should reduce
+ * the instantaneous power consumption of the processor. So in traditional
+ * terms, this represents the T-state range of performance levels.
+ *
+ * Fields max_perf, min_perf and des_perf store the values for the CPPC request
+ * MSR. Software passes performance goals through these fields.
+ * Field max_perf conveys the maximum performance level at which the platform
+ * may run. It may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive.
+ * Field min_perf conveys the minimum performance level at which the platform
+ * may run. It may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive, but must be less than or equal to
+ * max_perf.
+ * Field des_perf conveys the performance level the Xen governor is requesting.
+ * It may be set to any performance value in the range [min_perf, max_perf],
+ * inclusive.
+ * Field epp represents energy performance preference, which only has meaning
+ * when active mode is enabled.
+ */
+struct amd_cppc_drv_data
+{
+ const struct xen_processor_cppc *cppc_data;
+ union {
+ uint64_t raw;
+ struct {
+ unsigned int lowest_perf:8;
+ unsigned int lowest_nonlinear_perf:8;
+ unsigned int nominal_perf:8;
+ unsigned int highest_perf:8;
+ unsigned int :32;
+ };
+ } caps;
+ union {
+ uint64_t raw;
+ struct {
+ unsigned int max_perf:8;
+ unsigned int min_perf:8;
+ unsigned int des_perf:8;
+ unsigned int epp:8;
+ unsigned int :32;
+ };
+ } req;
+
+ int err;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
+ amd_cppc_drv_data);
+/*
+ * Core max frequency read from PstateDef as anchor point
+ * for freq-to-perf transition
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
+static DEFINE_PER_CPU_READ_MOSTLY(uint8_t, epp_init);
static bool __init amd_cppc_handle_option(const char *s, const char *end)
{
@@ -50,10 +141,335 @@ int __init amd_cppc_cmdline_parse(const char *s, const char *e)
return 0;
}
+/*
+ * If CPPC lowest_freq and nominal_freq registers are exposed then we can
+ * use them to convert perf to freq and vice versa. The conversion is
+ * extrapolated as a linear function passing through the 2 points:
+ * - (Low perf, Low freq)
+ * - (Nominal perf, Nominal freq)
+ * Parameter freq is always in kHz.
+ */
+static int amd_cppc_khz_to_perf(const struct amd_cppc_drv_data *data,
+ unsigned int freq, uint8_t *perf)
+{
+ const struct xen_processor_cppc *cppc_data = data->cppc_data;
+ unsigned int mul, div;
+ int offset = 0, res;
+
+ if ( cppc_data->cpc.lowest_mhz &&
+ data->caps.nominal_perf > data->caps.lowest_perf &&
+ cppc_data->cpc.nominal_mhz > cppc_data->cpc.lowest_mhz )
+ {
+ mul = data->caps.nominal_perf - data->caps.lowest_perf;
+ div = cppc_data->cpc.nominal_mhz - cppc_data->cpc.lowest_mhz;
+
+ /*
+ * We don't need to convert to kHz for computing offset and can
+ * directly use nominal_mhz and lowest_mhz as the division
+ * will remove the frequency unit.
+ */
+ offset = data->caps.nominal_perf -
+ (mul * cppc_data->cpc.nominal_mhz) / div;
+ }
+ else
+ {
+ /* Read Processor Max Speed(MHz) as anchor point */
+ mul = data->caps.highest_perf;
+ div = this_cpu(pxfreq_mhz);
+ if ( !div )
+ return -EOPNOTSUPP;
+ }
+
+ res = offset + (mul * freq) / (div * 1000);
+ if ( res > UINT8_MAX )
+ {
+ printk_once(XENLOG_WARNING
+ "Perf value exceeds maximum value 255: %d\n", res);
+ *perf = UINT8_MAX;
+ return 0;
+ }
+ if ( res <= 0 )
+ {
+ printk_once(XENLOG_WARNING
+ "Perf value smaller than minimum value 0: %d\n", res);
+ return -ERANGE;
+ }
+ *perf = res;
+
+ return 0;
+}
+
+/*
+ * _CPC may define nominal frequency and lowest frequency, if not, use
+ * Processor Max Speed as anchor point to calculate.
+ * Output freq stores cpc frequency in kHz
+ */
+static int amd_get_cpc_freq(const struct amd_cppc_drv_data *data,
+ unsigned int cpc_mhz, uint8_t perf,
+ unsigned int *freq)
+{
+ unsigned int mul, div, res;
+
+ if ( cpc_mhz )
+ {
+ /* Switch to kHz */
+ *freq = cpc_mhz * 1000;
+ return 0;
+ }
+
+ /* Read Processor Max Speed(MHz) as anchor point */
+ mul = this_cpu(pxfreq_mhz);
+ if ( !mul )
+ return -EOPNOTSUPP;
+ div = data->caps.highest_perf;
+ res = (mul * perf * 1000) / div;
+ if ( unlikely(!res) )
+ return -EOPNOTSUPP;
+ *freq = res;
+ return 0;
+}
+
+/* Output max_freq stores calculated maximum frequency in kHz */
+static int amd_get_max_freq(const struct amd_cppc_drv_data *data,
+ unsigned int *max_freq)
+{
+ unsigned int nom_freq = 0;
+ int res;
+
+ res = amd_get_cpc_freq(data, data->cppc_data->cpc.nominal_mhz,
+ data->caps.nominal_perf, &nom_freq);
+ if ( res )
+ return res;
+
+ *max_freq = (data->caps.highest_perf * nom_freq) / data->caps.nominal_perf;
+
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_verify(struct cpufreq_policy *policy)
+{
+ cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);
+
+ return 0;
+}
+
+static void cf_check amd_cppc_write_request_msrs(void *info)
+{
+ const struct amd_cppc_drv_data *data = info;
+
+ wrmsrl(MSR_AMD_CPPC_REQ, data->req.raw);
+}
+
+static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
+ uint8_t des_perf, uint8_t max_perf,
+ uint8_t epp)
+{
+ struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint64_t prev = data->req.raw;
+
+ data->req.min_perf = min_perf;
+ data->req.max_perf = max_perf;
+ data->req.des_perf = des_perf;
+ data->req.epp = epp;
+
+ if ( prev == data->req.raw )
+ return;
+
+ on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, data, 1);
+}
+
+static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int cpu = policy->cpu;
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint8_t des_perf;
+ int res;
+
+ if ( unlikely(!target_freq) )
+ return 0;
+
+ res = amd_cppc_khz_to_perf(data, target_freq, &des_perf);
+ if ( res )
+ return res;
+
+ /*
+ * A performance level below the lowest nonlinear performance level,
+ * i.e. lowest_perf <= perf < lowest_nonlinear_perf, may actually cause
+ * an efficiency penalty. So when deciding the min_perf value, we prefer
+ * lowest nonlinear performance over lowest performance.
+ */
+ amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
+ des_perf, data->caps.highest_perf,
+ /* Pre-defined BIOS value for passive mode */
+ per_cpu(epp_init, policy->cpu));
+ return 0;
+}
+
+static void cf_check amd_cppc_init_msrs(void *info)
+{
+ struct cpufreq_policy *policy = info;
+ struct amd_cppc_drv_data *data = this_cpu(amd_cppc_drv_data);
+ uint64_t val;
+ unsigned int min_freq = 0, nominal_freq = 0, max_freq;
+
+ /* Package level MSR */
+ rdmsrl(MSR_AMD_CPPC_ENABLE, val);
+ /*
+ * Only when Enable bit is on, the hardware will calculate the processor’s
+ * performance capabilities and initialize the performance level fields in
+ * the CPPC capability registers.
+ */
+ if ( !(val & AMD_CPPC_ENABLE) )
+ {
+ val |= AMD_CPPC_ENABLE;
+ wrmsrl(MSR_AMD_CPPC_ENABLE, val);
+ }
+
+ rdmsrl(MSR_AMD_CPPC_CAP1, data->caps.raw);
+
+ if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 ||
+ data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf == 0 ||
+ data->caps.lowest_perf > data->caps.lowest_nonlinear_perf ||
+ data->caps.lowest_nonlinear_perf > data->caps.nominal_perf ||
+ data->caps.nominal_perf > data->caps.highest_perf )
+ {
+ amd_cppc_err(policy->cpu,
+ "Out of range values: highest(%u), lowest(%u), nominal(%u), lowest_nonlinear(%u)\n",
+ data->caps.highest_perf, data->caps.lowest_perf,
+ data->caps.nominal_perf, data->caps.lowest_nonlinear_perf);
+ goto err;
+ }
+
+ amd_process_freq(&cpu_data[policy->cpu],
+ NULL, NULL, &this_cpu(pxfreq_mhz));
+
+ data->err = amd_get_cpc_freq(data, data->cppc_data->cpc.lowest_mhz,
+ data->caps.lowest_perf, &min_freq);
+ if ( data->err )
+ return;
+
+ data->err = amd_get_cpc_freq(data, data->cppc_data->cpc.nominal_mhz,
+ data->caps.nominal_perf, &nominal_freq);
+ if ( data->err )
+ return;
+
+ data->err = amd_get_max_freq(data, &max_freq);
+ if ( data->err )
+ return;
+
+ if ( min_freq > nominal_freq || nominal_freq > max_freq )
+ {
+ amd_cppc_err(policy->cpu,
+ "min(%u), or max(%u), or nominal(%u) freq value is incorrect\n",
+ min_freq, max_freq, nominal_freq);
+ goto err;
+ }
+
+ policy->min = min_freq;
+ policy->max = max_freq;
+
+ policy->cpuinfo.min_freq = min_freq;
+ policy->cpuinfo.max_freq = max_freq;
+ policy->cpuinfo.perf_freq = nominal_freq;
+ /*
+ * Set after policy->cpuinfo.perf_freq, as we are taking
+ * APERF/MPERF average frequency as current frequency.
+ */
+ policy->cur = cpufreq_driver_getavg(policy->cpu, GOV_GETAVG);
+
+ /* Store pre-defined BIOS value for passive mode */
+ rdmsrl(MSR_AMD_CPPC_REQ, val);
+ this_cpu(epp_init) = MASK_EXTR(val, AMD_CPPC_EPP_MASK);
+
+ return;
+
+ err:
+ /*
+ * No fallback scheme is available here; see the explanation at the call
+ * site in amd_cppc_cpufreq_cpu_init().
+ */
+ data->err = -EINVAL;
+}
+
+/*
+ * Unlike legacy ACPI hardware P-States, the AMD CPPC driver has a
+ * finer-grained frequency range between the highest and lowest
+ * frequency. The boost frequency is the frequency mapped to the
+ * highest performance ratio, while the legacy P0 frequency maps to the
+ * nominal performance ratio.
+ */
+static void amd_cppc_boost_init(struct cpufreq_policy *policy,
+ const struct amd_cppc_drv_data *data)
+{
+ if ( data->caps.highest_perf <= data->caps.nominal_perf )
+ return;
+
+ policy->turbo = CPUFREQ_TURBO_ENABLED;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ XVFREE(per_cpu(amd_cppc_drv_data, policy->cpu));
+
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int cpu = policy->cpu;
+ struct amd_cppc_drv_data *data;
+
+ data = xvzalloc(struct amd_cppc_drv_data);
+ if ( !data )
+ return -ENOMEM;
+
+ data->cppc_data = &processor_pminfo[cpu]->cppc_data;
+
+ per_cpu(amd_cppc_drv_data, cpu) = data;
+
+ on_selected_cpus(cpumask_of(cpu), amd_cppc_init_msrs, policy, 1);
+
+ /*
+ * The enable bit is sticky, and it must be set at the very beginning,
+ * before the CPPC capability values are sanity checked.
+ * If the error path is taken, not only does the amd-cppc cpufreq core fail
+ * to initialize, but we also cannot fall back to the legacy P-states
+ * driver, irrespective of the command line specifying a fallback option.
+ */
+ if ( data->err )
+ {
+ amd_cppc_err(cpu, "Could not initialize cpufreq core in CPPC mode\n");
+ amd_cppc_cpufreq_cpu_exit(policy);
+ return data->err;
+ }
+
+ policy->governor = cpufreq_opt_governor ? : CPUFREQ_DEFAULT_GOVERNOR;
+
+ amd_cppc_boost_init(policy, data);
+
+ amd_cppc_verbose(policy->cpu,
+ "CPU initialized with amd-cppc passive mode\n");
+
+ return 0;
+}
+
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_cpufreq_driver =
+{
+ .name = XEN_AMD_CPPC_DRIVER_NAME,
+ .verify = amd_cppc_cpufreq_verify,
+ .target = amd_cppc_cpufreq_target,
+ .init = amd_cppc_cpufreq_cpu_init,
+ .exit = amd_cppc_cpufreq_cpu_exit,
+};
+
int __init amd_cppc_register_driver(void)
{
if ( !cpu_has_cppc )
return -ENODEV;
- return -EOPNOTSUPP;
+ return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
}
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index eb428f284e..1b9af1270c 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -613,10 +613,10 @@ static unsigned int attr_const amd_parse_freq(unsigned int family,
return freq;
}
-static void amd_process_freq(const struct cpuinfo_x86 *c,
- unsigned int *low_mhz,
- unsigned int *nom_mhz,
- unsigned int *hi_mhz)
+void amd_process_freq(const struct cpuinfo_x86 *c,
+ unsigned int *low_mhz,
+ unsigned int *nom_mhz,
+ unsigned int *hi_mhz)
{
unsigned int idx = 0, h;
uint64_t hi, lo, val;
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 9c9599a622..72df42a6f6 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -173,5 +173,7 @@ extern bool amd_virt_spec_ctrl;
bool amd_setup_legacy_ssbd(void);
void amd_set_legacy_ssbd(bool enable);
void amd_set_cpuid_user_dis(bool enable);
+void amd_process_freq(const struct cpuinfo_x86 *c, unsigned int *low_mhz,
+ unsigned int *nom_mhz, unsigned int *hi_mhz);
#endif /* __AMD_H__ */
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 428d993ee8..6abf154887 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -241,6 +241,12 @@
#define MSR_AMD_CSTATE_CFG 0xc0010296U
+#define MSR_AMD_CPPC_CAP1 0xc00102b0U
+#define MSR_AMD_CPPC_ENABLE 0xc00102b1U
+#define AMD_CPPC_ENABLE (_AC(1, ULL) << 0)
+#define MSR_AMD_CPPC_REQ 0xc00102b3U
+#define AMD_CPPC_EPP_MASK (_AC(0xff, ULL) << 24)
+
/*
* Legacy MSR constants in need of cleanup. No new MSRs below this comment.
*/
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index aafa7fcf2b..aa29a5401c 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -453,6 +453,7 @@ struct xen_set_cppc_para {
uint32_t activity_window;
};
+#define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
#define XEN_HWP_DRIVER_NAME "hwp"
/*
--
2.34.1
* [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (7 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:19 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
` (3 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
amd-cppc has two operation modes: autonomous (active) mode and
non-autonomous (passive) mode.
In active mode, the Xen governor is not needed to calculate and tune the CPU
frequency; instead, the hardware's built-in CPPC power algorithm calculates the
runtime workload and adjusts core frequencies automatically according to the
power supply, thermal, core voltage and other hardware conditions.
In active mode, CPPC ignores requests made through the desired performance
field, and only takes into account the values set in the minimum performance,
maximum performance, and energy performance preference registers.
A new field, EPP (energy performance preference), is introduced in the CPPC
request register. It is used by the CCLK DPM controller to drive the frequency
at which a core operates during short periods of activity, called the minimum
active frequency. It can contain values from 0 to 0xff.
An EPP of zero sets the minimum active frequency to the maximum frequency,
while an EPP of 0xff sets it to approximately the idle frequency.
We implement a new AMD CPU frequency driver, `amd-cppc-epp`, for active mode.
Users must explicitly select active mode with the `active` flag on the Xen
command line.
The `amd-cppc-epp` driver hooks ->setpolicy() rather than ->target(), as it
does not depend on a Xen governor for performance tuning.
We also introduce a new field "policy" (CPUFREQ_POLICY_xxx) to represent the
performance policy. Right now, it supports three values:
CPUFREQ_POLICY_PERFORMANCE for maximum performance, CPUFREQ_POLICY_POWERSAVE
for the least power consumption, and CPUFREQ_POLICY_ONDEMAND for no preference.
These correspond to the "performance", "powersave" and "ondemand" Xen
governors, which lets users re-use the "governor" option on the Xen command
line to select the performance policy they want to apply.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Remove redundant epp_mode
- Remove pointless initializer
- Define sole caller read_epp_init_once and epp_init value to read
pre-defined BIOS epp value only once
- Combine the commit "xen/cpufreq: introduce policy type when
cpufreq_driver->setpolicy exists"
---
v2 -> v3:
- Combined with commit "x86/cpufreq: add "cpufreq=amd-cppc,active" para"
- Refactor doc about "active mode"
- Change opt_cpufreq_active to opt_active_mode
- Let caller pass epp_init when unspecified to allow the function parameter
to be of uint8_t
- Make epp_init per-cpu value
---
v3 -> v4:
- doc refinement
- use MASK_EXTR() to get epp value
- fix indentation
- replace if-else() with switch()
- combine successive comments and do refinement
- no need to introduce amd_cppc_epp_update_limit() as a wrapper
- rename cpufreq_parse_policy() with cpufreq_policy_from_governor()
- no need to use case-insensitive comparison
---
v4 -> v5:
- refine doc to state what the default is for "active" sub-option and it's of
boolean nature
- excess blank after << for AMD_CPPC_EPP_MASK
- set max_perf with lowest_perf to get utmost powersave
- refine commit message to include description about relation between "policy"
and "governor"
---
v5 -> v6:
- expand comment for "epp" field
- set min_perf to lowest_nonlinear_perf, not lowest_perf, to constrain
performance tuning in P-states range
- refactor doc and comments
- blank lines between non-fall-through case blocks
- introduce and add entry for "CPUFREQ_POLICY_ONDEMAND"
---
v6 -> v7:
- make opt_active_mode __initdata when NDEBUG=y
- add assertion check for must-zero des_perf in active mode
- use the local variables max_perf and min_perf
- read_epp_init() isn't worth a separate function
---
docs/misc/xen-command-line.pandoc | 9 +-
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 126 ++++++++++++++++++++++++++-
xen/drivers/cpufreq/utility.c | 15 ++++
xen/include/acpi/cpufreq/cpufreq.h | 18 ++++
xen/include/public/sysctl.h | 1 +
5 files changed, 164 insertions(+), 5 deletions(-)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 3916cc81f6..c029a6e053 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
available support.
### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[active][,verbose]]`
> Default: `xen`
@@ -537,6 +537,13 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
on supported AMD hardware to provide finer grained frequency control
mechanism. The default is disabled.
+* `active` is a boolean to enable the amd-cppc driver in active (autonomous)
+ mode. In this mode, users don't rely on a Xen governor for performance
+ monitoring and tuning. The hardware's built-in CPPC power algorithm calculates
+ the runtime workload and adjusts core frequencies automatically according to
+ the power supply, thermal, core voltage and other hardware conditions.
+ The default is disabled, and the option only applies when `amd-cppc` is
+ enabled.
There is also support for `;`-separated fallback options:
`cpufreq=hwp;xen,verbose`. This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index df9ed52237..1b4753a062 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -67,9 +67,14 @@
* max_perf.
* Field des_perf conveys performance level Xen governor is requesting. And it
* may be set to any performance value in the range [min_perf, max_perf],
- * inclusive.
+ * inclusive. In active mode, des_perf must be zero.
* Field epp represents energy performance preference, which only has meaning
- * when active mode is enabled.
+ * when active mode is enabled. The EPP is used in the CCLK DPM controller
+ * to drive the frequency at which a core operates during short periods
+ * of activity, called the minimum active frequency. It can contain a range
+ * of values from 0 to 0xff. An EPP of zero sets the min active frequency to
+ * the maximum frequency, while an EPP of 0xff sets the min active frequency
+ * to approximately the idle frequency.
*/
struct amd_cppc_drv_data
{
@@ -106,6 +111,12 @@ static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
static DEFINE_PER_CPU_READ_MOSTLY(uint8_t, epp_init);
+#ifndef NDEBUG
+static bool __ro_after_init opt_active_mode;
+#else
+static bool __initdata opt_active_mode;
+#endif
+
static bool __init amd_cppc_handle_option(const char *s, const char *end)
{
@@ -118,6 +129,13 @@ static bool __init amd_cppc_handle_option(const char *s, const char *end)
return true;
}
+ ret = parse_boolean("active", s, end);
+ if ( ret >= 0 )
+ {
+ opt_active_mode = ret;
+ return true;
+ }
+
return false;
}
@@ -270,6 +288,10 @@ static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
data->req.min_perf = min_perf;
data->req.max_perf = max_perf;
+#ifndef NDEBUG
+ if ( opt_active_mode )
+ ASSERT(!des_perf);
+#endif
data->req.des_perf = des_perf;
data->req.epp = epp;
@@ -417,7 +439,7 @@ static int cf_check amd_cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
return 0;
}
-static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+static int amd_cppc_cpufreq_init_perf(struct cpufreq_policy *policy)
{
unsigned int cpu = policy->cpu;
struct amd_cppc_drv_data *data;
@@ -450,12 +472,91 @@ static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
amd_cppc_boost_init(policy, data);
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ ret = amd_cppc_cpufreq_init_perf(policy);
+ if ( ret )
+ return ret;
+
amd_cppc_verbose(policy->cpu,
"CPU initialized with amd-cppc passive mode\n");
return 0;
}
+static int cf_check amd_cppc_epp_cpu_init(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ ret = amd_cppc_cpufreq_init_perf(policy);
+ if ( ret )
+ return ret;
+
+ policy->policy = cpufreq_policy_from_governor(policy->governor);
+
+ amd_cppc_verbose(policy->cpu,
+ "CPU initialized with amd-cppc active mode\n");
+
+ return 0;
+}
+
+static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
+{
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+ policy->cpu);
+ uint8_t max_perf, min_perf, epp;
+
+ /*
+ * By default, set min_perf to lowest_nonlinear_perf and max_perf to
+ * highest_perf, to keep performance scaling within the P-states range.
+ */
+ max_perf = data->caps.highest_perf;
+ min_perf = data->caps.lowest_nonlinear_perf;
+
+ /*
+ * In policy CPUFREQ_POLICY_PERFORMANCE, increase min_perf to
+ * highest_perf to achieve utmost performance.
+ * In policy CPUFREQ_POLICY_POWERSAVE, decrease max_perf to
+ * lowest_nonlinear_perf to achieve utmost power saving.
+ */
+ switch ( policy->policy )
+ {
+ case CPUFREQ_POLICY_PERFORMANCE:
+ /* Force the epp value to be zero for performance policy */
+ epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+ min_perf = max_perf;
+ break;
+
+ case CPUFREQ_POLICY_POWERSAVE:
+ /* Force the epp value to be 0xff for powersave policy */
+ epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+ max_perf = min_perf;
+ break;
+
+ case CPUFREQ_POLICY_ONDEMAND:
+ /*
+ * Set epp to a medium value to express no preference between
+ * performance and powersave
+ */
+ epp = CPPC_ENERGY_PERF_BALANCE;
+ break;
+
+ default:
+ epp = per_cpu(epp_init, policy->cpu);
+ break;
+ }
+
+ amd_cppc_write_request(policy->cpu, min_perf,
+ 0 /* no des_perf in active mode */,
+ max_perf, epp);
+ return 0;
+}
+
static const struct cpufreq_driver __initconst_cf_clobber
amd_cppc_cpufreq_driver =
{
@@ -466,10 +567,27 @@ amd_cppc_cpufreq_driver =
.exit = amd_cppc_cpufreq_cpu_exit,
};
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_epp_driver =
+{
+ .name = XEN_AMD_CPPC_EPP_DRIVER_NAME,
+ .verify = amd_cppc_cpufreq_verify,
+ .setpolicy = amd_cppc_epp_set_policy,
+ .init = amd_cppc_epp_cpu_init,
+ .exit = amd_cppc_cpufreq_cpu_exit,
+};
+
int __init amd_cppc_register_driver(void)
{
+ int ret;
+
if ( !cpu_has_cppc )
return -ENODEV;
- return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+ if ( opt_active_mode )
+ ret = cpufreq_register_driver(&amd_cppc_epp_driver);
+ else
+ ret = cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+
+ return ret;
}
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 987c3b5929..e2cc9ff2af 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -250,6 +250,7 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
data->min = policy->min;
data->max = policy->max;
data->limits = policy->limits;
+ data->policy = policy->policy;
if (cpufreq_driver.setpolicy)
return alternative_call(cpufreq_driver.setpolicy, data);
@@ -281,3 +282,17 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
return __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
}
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov)
+{
+ if ( !strncmp(gov->name, "performance", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_PERFORMANCE;
+
+ if ( !strncmp(gov->name, "powersave", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_POWERSAVE;
+
+ if ( !strncmp(gov->name, "ondemand", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_ONDEMAND;
+
+ return CPUFREQ_POLICY_UNKNOWN;
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 5d4881eea8..9ef7c4683a 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -81,6 +81,7 @@ struct cpufreq_policy {
int8_t turbo; /* tristate flag: 0 for unsupported
* -1 for disable, 1 for enabled
* See CPUFREQ_TURBO_* below for defines */
+ unsigned int policy; /* CPUFREQ_POLICY_* */
};
DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);
@@ -131,6 +132,23 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
extern struct cpufreq_governor *__find_governor(const char *governor);
#define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
+/*
+ * Performance Policy
+ * If cpufreq_driver->target() exists, the ->governor decides what frequency
+ * within the limits is used. If cpufreq_driver->setpolicy() exists, the
+ * following policies are available:
+ * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
+ * CPUFREQ_POLICY_POWERSAVE represents least power consumption
+ * CPUFREQ_POLICY_ONDEMAND represents no preference between performance
+ * and powersave
+ */
+#define CPUFREQ_POLICY_UNKNOWN 0
+#define CPUFREQ_POLICY_POWERSAVE 1
+#define CPUFREQ_POLICY_PERFORMANCE 2
+#define CPUFREQ_POLICY_ONDEMAND 3
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
+
/* pass a target to the cpufreq driver */
extern int __cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index aa29a5401c..eb3a23b038 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -454,6 +454,7 @@ struct xen_set_cppc_para {
};
#define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
+#define XEN_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"
#define XEN_HWP_DRIVER_NAME "hwp"
/*
--
2.34.1
* [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (8 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:23 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para Penny Zheng
` (2 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
Even though the Xen governor is not used in amd-cppc active mode, we can
deduce which performance policy (CPUFREQ_POLICY_xxx) the user wants to
apply from the governor they choose, for example:
If the user chooses the performance governor, they want maximum performance,
so the policy shall be CPUFREQ_POLICY_PERFORMANCE.
If the user chooses the powersave governor, they want the least power
consumption, so the policy shall be CPUFREQ_POLICY_POWERSAVE.
Function cpufreq_policy_from_governor() is responsible for the above
translation, and it shall also take effect when users set a new governor
through xenpm.
Governors that map to no policy, such as userspace, are forbidden choices. If
users specify one, we not only print a warning message suggesting
"xenpm set-cpufreq-cppc", but also error out.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v4 -> v5:
- new commit
---
v5 -> v6:
- refactor warning message
---
v6 -> v7:
- move policy->policy set where it firstly gets introduced
- refactor commit message
---
xen/drivers/acpi/pm-op.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 427656c48c..6991616c1d 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -206,6 +206,14 @@ static int set_cpufreq_gov(struct xen_sysctl_pm_op *op)
if ( new_policy.governor == NULL )
return -EINVAL;
+ new_policy.policy = cpufreq_policy_from_governor(new_policy.governor);
+ if ( new_policy.policy == CPUFREQ_POLICY_UNKNOWN )
+ {
+ printk("Failed to get performance policy from %s, Try \"xenpm set-cpufreq-cppc\"\n",
+ new_policy.governor->name);
+ return -EINVAL;
+ }
+
return __cpufreq_set_policy(old_policy, &new_policy);
}
--
2.34.1
* [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (9 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:36 ` Jan Beulich
2025-08-27 15:22 ` Anthony PERARD
2025-08-22 10:52 ` [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
2025-08-22 10:52 ` [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
12 siblings, 2 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
We extract CPPC info out of "struct xen_get_cpufreq_para", where it is a
member of a union and shares space with governor info.
That layout does not work in amd-cppc passive mode, in which governor info and
CPPC info co-exist, and both need to be printed together by the xenpm tool.
If we instead kept it in "struct xen_get_cpufreq_para" (e.g. just moved it
out of the union), the struct would grow so much that xen_sysctl.u would
exceed 128 bytes.
So we introduce a new sub-op, GET_CPUFREQ_CPPC, dedicated to acquiring
CPPC-related parameters, and make get-cpufreq-para invoke GET_CPUFREQ_CPPC
when available.
New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to
separate CPPC-related parameter processing from cpufreq para processing.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v4 -> v5:
- new commit
---
v5 -> v6:
- remove the changes for get-cpufreq-para
---
v6 -> v7:
- make get-cpufreq-para invoke GET_CPUFREQ_CPPC
---
tools/include/xenctrl.h | 3 +-
tools/libs/ctrl/xc_pm.c | 28 ++++++++++++-
tools/misc/xenpm.c | 78 ++++++++++++++++++++++++-------------
xen/drivers/acpi/pm-op.c | 19 +++++++--
xen/include/public/sysctl.h | 3 +-
5 files changed, 98 insertions(+), 33 deletions(-)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 965d3b585a..e5103453a9 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1938,7 +1938,6 @@ struct xc_get_cpufreq_para {
xc_ondemand_t ondemand;
} u;
} s;
- xc_cppc_para_t cppc_para;
} u;
int32_t turbo_enabled;
@@ -1953,6 +1952,8 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
int ctrl_type, int ctrl_value);
int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
xc_set_cppc_para_t *set_cppc);
+int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
+ xc_cppc_para_t *cppc_para);
int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 6fda973f1f..446ac0b911 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -288,7 +288,6 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
CHK_FIELD(s.scaling_min_freq);
CHK_FIELD(s.u.userspace);
CHK_FIELD(s.u.ondemand);
- CHK_FIELD(cppc_para);
#undef CHK_FIELD
@@ -366,6 +365,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
return ret;
}
+int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
+ xc_cppc_para_t *cppc_para)
+{
+ int ret;
+ struct xen_sysctl sysctl = {};
+ struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
+
+ if ( !xch || !cppc_para )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+
+ sysctl.cmd = XEN_SYSCTL_pm_op;
+ sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
+ sysctl.u.pm_op.cpuid = cpuid;
+
+ ret = xc_sysctl(xch, &sysctl);
+ if ( ret )
+ return ret;
+
+ BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
+ memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
+
+ return ret;
+}
+
int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
{
int ret = 0;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 6b054b10a4..8fc1d7cc65 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -801,6 +801,34 @@ static unsigned int calculate_activity_window(const xc_cppc_para_t *cppc,
return mantissa * multiplier;
}
+/* print out parameters about cpu cppc */
+static void print_cppc_para(unsigned int cpuid,
+ const xc_cppc_para_t *cppc)
+{
+ printf("cppc variables :\n");
+ printf(" hardware limits : lowest [%"PRIu32"] lowest nonlinear [%"PRIu32"]\n",
+ cppc->lowest, cppc->lowest_nonlinear);
+ printf(" : nominal [%"PRIu32"] highest [%"PRIu32"]\n",
+ cppc->nominal, cppc->highest);
+ printf(" configured limits : min [%"PRIu32"] max [%"PRIu32"] energy perf [%"PRIu32"]\n",
+ cppc->minimum, cppc->maximum, cppc->energy_perf);
+
+ if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
+ {
+ unsigned int activity_window;
+ const char *units;
+
+ activity_window = calculate_activity_window(cppc, &units);
+ printf(" : activity_window [%"PRIu32" %s]\n",
+ activity_window, units);
+ }
+
+ printf(" : desired [%"PRIu32"%s]\n",
+ cppc->desired,
+ cppc->desired ? "" : " hw autonomous");
+ printf("\n");
+}
+
/* print out parameters about cpu frequency */
static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
{
@@ -826,33 +854,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf("scaling_driver : %s\n", p_cpufreq->scaling_driver);
- if ( hwp )
- {
- const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
-
- printf("cppc variables :\n");
- printf(" hardware limits : lowest [%"PRIu32"] lowest nonlinear [%"PRIu32"]\n",
- cppc->lowest, cppc->lowest_nonlinear);
- printf(" : nominal [%"PRIu32"] highest [%"PRIu32"]\n",
- cppc->nominal, cppc->highest);
- printf(" configured limits : min [%"PRIu32"] max [%"PRIu32"] energy perf [%"PRIu32"]\n",
- cppc->minimum, cppc->maximum, cppc->energy_perf);
-
- if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
- {
- unsigned int activity_window;
- const char *units;
-
- activity_window = calculate_activity_window(cppc, &units);
- printf(" : activity_window [%"PRIu32" %s]\n",
- activity_window, units);
- }
-
- printf(" : desired [%"PRIu32"%s]\n",
- cppc->desired,
- cppc->desired ? "" : " hw autonomous");
- }
- else
+ if ( !hwp )
{
if ( p_cpufreq->gov_num )
printf("scaling_avail_gov : %s\n",
@@ -898,6 +900,23 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf("\n");
}
+/* show cpu cppc parameters information on CPU cpuid */
+static int show_cppc_para_by_cpuid(xc_interface *xc_handle, unsigned int cpuid)
+{
+ int ret;
+ xc_cppc_para_t cppc_para;
+
+ ret = xc_get_cppc_para(xc_handle, cpuid, &cppc_para);
+ if ( !ret )
+ print_cppc_para(cpuid, &cppc_para);
+ else if ( errno == ENODEV )
+ ret = 0; /* Ignore unsupported platform */
+ else
+ fprintf(stderr, "[CPU%u] failed to get cppc parameter\n", cpuid);
+
+ return ret;
+}
+
/* show cpu frequency parameters information on CPU cpuid */
static int show_cpufreq_para_by_cpuid(xc_interface *xc_handle, int cpuid)
{
@@ -957,7 +976,12 @@ static int show_cpufreq_para_by_cpuid(xc_interface *xc_handle, int cpuid)
} while ( ret && errno == EAGAIN );
if ( ret == 0 )
+ {
print_cpufreq_para(cpuid, p_cpufreq);
+
+ /* Show CPPC parameters if available */
+ ret = show_cppc_para_by_cpuid(xc_handle, cpuid);
+ }
else if ( errno == ENODEV )
{
ret = -ENODEV;
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 6991616c1d..bf4638927f 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -77,6 +77,17 @@ static int read_scaling_available_governors(char *scaling_available_governors,
return 0;
}
+static int get_cpufreq_cppc(unsigned int cpu,
+ struct xen_get_cppc_para *cppc_para)
+{
+ int ret = -ENODEV;
+
+ if ( hwp_active() )
+ ret = get_hwp_para(cpu, cppc_para);
+
+ return ret;
+}
+
static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
{
uint32_t ret = 0;
@@ -142,9 +153,7 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
else
strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
- if ( hwp_active() )
- ret = get_hwp_para(policy->cpu, &op->u.get_para.u.cppc_para);
- else
+ if ( !hwp_active() )
{
if ( !(scaling_available_governors =
xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
@@ -381,6 +390,10 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
ret = set_cpufreq_para(op);
break;
+ case GET_CPUFREQ_CPPC:
+ ret = get_cpufreq_cppc(op->cpuid, &op->u.get_cppc);
+ break;
+
case SET_CPUFREQ_CPPC:
ret = set_cpufreq_cppc(op);
break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index eb3a23b038..3f654f98ab 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -492,7 +492,6 @@ struct xen_get_cpufreq_para {
struct xen_ondemand ondemand;
} u;
} s;
- struct xen_get_cppc_para cppc_para;
} u;
int32_t turbo_enabled;
@@ -523,6 +522,7 @@ struct xen_sysctl_pm_op {
#define SET_CPUFREQ_PARA (CPUFREQ_PARA | 0x03)
#define GET_CPUFREQ_AVGFREQ (CPUFREQ_PARA | 0x04)
#define SET_CPUFREQ_CPPC (CPUFREQ_PARA | 0x05)
+ #define GET_CPUFREQ_CPPC (CPUFREQ_PARA | 0x06)
/* set/reset scheduler power saving option */
#define XEN_SYSCTL_pm_op_set_sched_opt_smt 0x21
@@ -547,6 +547,7 @@ struct xen_sysctl_pm_op {
uint32_t cpuid;
union {
struct xen_get_cpufreq_para get_para;
+ struct xen_get_cppc_para get_cppc;
struct xen_set_cpufreq_gov set_gov;
struct xen_set_cpufreq_para set_para;
struct xen_set_cppc_para set_cppc;
--
2.34.1
* [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (10 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 15:44 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
12 siblings, 1 reply; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD, Jan Beulich
HWP and amd-cppc-epp are both governor-less drivers, so we introduce the
"is_goverless" flag (in xenpm) and cpufreq_is_governorless() (in Xen) to
bypass governor-related info when dealing with cpufreq parameters.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v3 -> v4:
- Include validation check fix here
---
v4 -> v5:
- validation check has been moved to where XEN_PROCESSOR_PM_CPPC and
XEN_CPPC_INIT are first introduced
- adding "cpufreq_driver.setpolicy == NULL" check to exclude governor-related
para for amd-cppc-epp driver in get/set_cpufreq_para()
---
v5 -> v6:
- add helper cpufreq_is_governorless() to tell whether cpufreq driver is
governor-less
---
v6 -> v7:
- change "hw_auto" to "is_goverless"
- complement comment
- wrap around with PM_OP to avoid violating Misra rule 2.1
---
tools/misc/xenpm.c | 10 +++++++---
xen/drivers/acpi/pm-op.c | 4 ++--
xen/drivers/cpufreq/cpufreq.c | 14 ++++++++++++++
xen/include/acpi/cpufreq/cpufreq.h | 2 ++
4 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 8fc1d7cc65..02981c4583 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -832,9 +832,13 @@ static void print_cppc_para(unsigned int cpuid,
/* print out parameters about cpu frequency */
static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
{
- bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
+ bool is_goverless = false;
int i;
+ if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
+ !strcmp(p_cpufreq->scaling_driver, XEN_AMD_CPPC_EPP_DRIVER_NAME) )
+ is_goverless = true;
+
printf("cpu id : %d\n", cpuid);
printf("affected_cpus :");
@@ -842,7 +846,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf(" %d", p_cpufreq->affected_cpus[i]);
printf("\n");
- if ( hwp )
+ if ( is_goverless )
printf("cpuinfo frequency : base [%"PRIu32"] max [%"PRIu32"]\n",
p_cpufreq->cpuinfo_min_freq,
p_cpufreq->cpuinfo_max_freq);
@@ -854,7 +858,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf("scaling_driver : %s\n", p_cpufreq->scaling_driver);
- if ( !hwp )
+ if ( !is_goverless )
{
if ( p_cpufreq->gov_num )
printf("scaling_avail_gov : %s\n",
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index bf4638927f..2b4c8070aa 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -153,7 +153,7 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
else
strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
- if ( !hwp_active() )
+ if ( !cpufreq_is_governorless(op->cpuid) )
{
if ( !(scaling_available_governors =
xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
@@ -236,7 +236,7 @@ static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
if ( !policy || !policy->governor )
return -EINVAL;
- if ( hwp_active() )
+ if ( cpufreq_is_governorless(op->cpuid) )
return -EOPNOTSUPP;
switch( op->u.set_para.ctrl_type )
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 41e0da3b77..871fe33681 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -956,3 +956,17 @@ int __init cpufreq_register_driver(const struct cpufreq_driver *driver_data)
return 0;
}
+
+#ifdef CONFIG_PM_OP
+/*
+ * A governor-less cpufreq driver doesn't rely on a Xen governor for
+ * performance tuning: typically the hardware has a built-in algorithm
+ * that calculates the runtime workload and adjusts core frequency
+ * automatically, as with Intel HWP or AMD CPPC in active mode.
+ */
+bool cpufreq_is_governorless(unsigned int cpuid)
+{
+ return processor_pminfo[cpuid]->init && (hwp_active() ||
+ cpufreq_driver.setpolicy);
+}
+#endif /* CONFIG_PM_OP */
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 9ef7c4683a..babc4a1a2c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -294,4 +294,6 @@ int acpi_cpufreq_register(void);
int amd_cppc_cmdline_parse(const char *s, const char *e);
int amd_cppc_register_driver(void);
+bool cpufreq_is_governorless(unsigned int cpuid);
+
#endif /* __XEN_CPUFREQ_PM_H__ */
--
2.34.1
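As a rough illustration of the userspace half of this change, the following standalone sketch mirrors the name-based check that print_cpufreq_para() now performs. The literal driver names "hwp" and "amd-cppc-epp" are assumptions standing in for XEN_HWP_DRIVER_NAME and XEN_AMD_CPPC_EPP_DRIVER_NAME, and the function name is hypothetical. The hypervisor side instead uses `hwp_active() || cpufreq_driver.setpolicy`, which this sketch does not model.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumed driver-name strings; in the patch they come from
 * XEN_HWP_DRIVER_NAME and XEN_AMD_CPPC_EPP_DRIVER_NAME.
 */
#define HWP_DRIVER_NAME          "hwp"
#define AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"

/*
 * A driver is governor-less when the hardware picks the operating point
 * itself, so governor-related fields should be neither shown nor set.
 * (Hypothetical helper name.)
 */
static bool driver_is_governorless(const char *scaling_driver)
{
    return !strcmp(scaling_driver, HWP_DRIVER_NAME) ||
           !strcmp(scaling_driver, AMD_CPPC_EPP_DRIVER_NAME);
}
```

Under these assumptions, plain "amd-cppc" (passive mode) is not governor-less, so its governor information is still printed alongside the CPPC parameters.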
* [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (11 preceding siblings ...)
2025-08-22 10:52 ` [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
@ 2025-08-22 10:52 ` Penny Zheng
2025-08-25 16:02 ` Jan Beulich
2025-08-27 15:58 ` Anthony PERARD
12 siblings, 2 replies; 43+ messages in thread
From: Penny Zheng @ 2025-08-22 10:52 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Andrew Cooper,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Introduce helpers set_amd_cppc_para() and get_amd_cppc_para() to
set/get CPPC-related parameters for the amd-cppc/amd-cppc-epp drivers.
In get_cpufreq_cppc()/set_cpufreq_cppc(), we add a
"processor_pminfo[cpuid]->init & XEN_CPPC_INIT" condition check to dispatch
to the amd-cppc driver.
Also, a new field "policy" has been added to "struct xen_get_cppc_para"
to describe the performance policy in active mode. xenpm prints it along with
the other CPPC parameters when desired performance is 0 (hardware autonomous).
Move the manifest constants "CPUFREQ_POLICY_xxx" to the public header so that
user-space tools can use them.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Give the variable des_perf an initializer of 0
- Use the strncmp()s directly in the if()
---
v3 -> v4
- refactor comments
- remove double blank lines
- replace amd_cppc_in_use flag with XEN_PROCESSOR_PM_CPPC
---
v4 -> v5:
- add new field "policy" in "struct xen_cppc_para"
- add new performance policy XEN_CPUFREQ_POLICY_BALANCE
- drop string comparisons with "processor_pminfo[cpuid]->init & XEN_CPPC_INIT"
and "cpufreq.setpolicy == NULL"
- Blank line ahead of the main "return" of a function
- refactor comments, commit message and title
---
v5 -> v6:
- remove duplicated manifest constants, and just move it to public header
- use "else if" to avoid confusion that it looks as if both paths could be taken
- add check for legitimate perf values
- use "unknown" instead of "none"
- introduce "CPUFREQ_POLICY_END" for array overrun check in user space tools
---
v6 -> v7:
- use ARRAY_SIZE() instead
- avoid printing ->policy in passive mode, and print "unknown" in invalid
cases
- let cpufreq_is_governorless() be the variable's initializer
- refactor with the conditional operator to increase readability
- move duplicated definition ahead and use local variable
- avoid an "else" branch that would be "dead code" in Misra's nomenclature
- move the comment out of public header and into the respective internal
struct field
- wrap set{,get}_amd_cppc_para() with CONFIG_PM_OP
- add symmetry scenario for maximum check
---
tools/misc/xenpm.c | 16 +++
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 181 +++++++++++++++++++++++++++
xen/drivers/acpi/pm-op.c | 10 +-
xen/include/acpi/cpufreq/cpufreq.h | 32 ++---
xen/include/public/sysctl.h | 6 +
5 files changed, 226 insertions(+), 19 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 02981c4583..eedb745a46 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -38,6 +38,13 @@
static xc_interface *xc_handle;
static unsigned int max_cpu_nr;
+static const char cpufreq_policy_str[][12] = {
+ [CPUFREQ_POLICY_UNKNOWN] = "unknown",
+ [CPUFREQ_POLICY_POWERSAVE] = "powersave",
+ [CPUFREQ_POLICY_PERFORMANCE] = "performance",
+ [CPUFREQ_POLICY_ONDEMAND] = "ondemand",
+};
+
/* help message */
void show_help(void)
{
@@ -826,6 +833,15 @@ static void print_cppc_para(unsigned int cpuid,
printf(" : desired [%"PRIu32"%s]\n",
cppc->desired,
cppc->desired ? "" : " hw autonomous");
+
+ if ( !cppc->desired )
+ {
+ if ( cppc->policy < ARRAY_SIZE(cpufreq_policy_str) )
+ printf(" performance policy : %s\n",
+ cpufreq_policy_str[cppc->policy]);
+ else
+ printf(" performance policy : unknown\n");
+ }
printf("\n");
}
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 1b4753a062..493550bbb3 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -557,6 +557,187 @@ static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
return 0;
}
+#ifdef CONFIG_PM_OP
+int get_amd_cppc_para(const struct cpufreq_policy *policy,
+ struct xen_get_cppc_para *cppc_para)
+{
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+ policy->cpu);
+
+ if ( data == NULL )
+ return -ENODATA;
+
+ cppc_para->policy = policy->policy;
+ cppc_para->lowest = data->caps.lowest_perf;
+ cppc_para->lowest_nonlinear = data->caps.lowest_nonlinear_perf;
+ cppc_para->nominal = data->caps.nominal_perf;
+ cppc_para->highest = data->caps.highest_perf;
+ cppc_para->minimum = data->req.min_perf;
+ cppc_para->maximum = data->req.max_perf;
+ cppc_para->desired = data->req.des_perf;
+ cppc_para->energy_perf = data->req.epp;
+
+ return 0;
+}
+
+int set_amd_cppc_para(struct cpufreq_policy *policy,
+ const struct xen_set_cppc_para *set_cppc)
+{
+ unsigned int cpu = policy->cpu;
+ struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint8_t max_perf, min_perf, des_perf, epp;
+ bool active_mode = cpufreq_is_governorless(cpu);
+
+ if ( data == NULL )
+ return -ENOENT;
+
+ /* Return if there is nothing to do. */
+ if ( set_cppc->set_params == 0 )
+ return 0;
+
+ /* Only allow values if params bit is set. */
+ if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+ set_cppc->desired) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+ set_cppc->minimum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+ set_cppc->maximum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
+ set_cppc->energy_perf) )
+ return -EINVAL;
+
+ /*
+ * Validate all parameters
+ * Maximum performance may be set to any performance value in the range
+ * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
+ * be set to a value that is larger than or equal to minimum Performance.
+ */
+ if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+ (set_cppc->maximum > data->caps.highest_perf ||
+ set_cppc->maximum <
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
+ ? set_cppc->minimum
+ : data->req.min_perf)) )
+ return -EINVAL;
+ /*
+ * Minimum performance may be set to any performance value in the range
+ * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
+ * be set to a value that is less than or equal to Maximum Performance.
+ */
+ if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+ (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
+ (set_cppc->minimum >
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
+ ? set_cppc->maximum
+ : data->req.max_perf))) )
+ return -EINVAL;
+ /*
+ * Desired performance may be set to any performance value in the range
+ * [Minimum Performance, Maximum Performance], inclusive.
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+ {
+ if ( active_mode )
+ return -EOPNOTSUPP;
+
+ if ( (set_cppc->desired >
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
+ ? set_cppc->maximum
+ : data->req.max_perf)) ||
+ (set_cppc->desired <
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
+ ? set_cppc->minimum
+ : data->req.min_perf)) )
+ return -EINVAL;
+ }
+ /*
+ * Energy Performance Preference may be set with a range of values
+ * from 0 to 0xFF
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
+ {
+ if ( !active_mode )
+ return -EOPNOTSUPP;
+
+ if ( set_cppc->energy_perf > UINT8_MAX )
+ return -EINVAL;
+ }
+
+ /* Activity window not supported in MSR */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
+ return -EOPNOTSUPP;
+
+ epp = per_cpu(epp_init, cpu);
+ min_perf = data->caps.lowest_nonlinear_perf;
+ max_perf = data->caps.highest_perf;
+ des_perf = data->req.des_perf;
+ /*
+ * Apply presets:
+ * XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE/PERFORMANCE/ONDEMAND are
+ * only available when CPPC is in active mode
+ */
+ switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
+ {
+ case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_POWERSAVE;
+ /*
+ * Lower max_perf to nonlinear_lowest to achieve
+ * utmost power savings
+ */
+ max_perf = min_perf;
+ epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_PERFORMANCE;
+ /* Increase min_perf to highest to achieve utmost performance */
+ min_perf = max_perf;
+ epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_ONDEMAND;
+ /*
+ * Take medium value to show no preference over
+ * performance or powersave
+ */
+ epp = CPPC_ENERGY_PERF_BALANCE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
+ if ( active_mode )
+ policy->policy = CPUFREQ_POLICY_UNKNOWN;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ /* Further customize presets if needed */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM )
+ min_perf = set_cppc->minimum;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM )
+ max_perf = set_cppc->maximum;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
+ epp = set_cppc->energy_perf;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+ des_perf = set_cppc->desired;
+
+ amd_cppc_write_request(cpu, min_perf, des_perf, max_perf, epp);
+
+ return 0;
+}
+#endif /* CONFIG_PM_OP */
+
static const struct cpufreq_driver __initconst_cf_clobber
amd_cppc_cpufreq_driver =
{
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 2b4c8070aa..195dcb88b5 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -84,6 +84,8 @@ static int get_cpufreq_cppc(unsigned int cpu,
if ( hwp_active() )
ret = get_hwp_para(cpu, cppc_para);
+ else if ( processor_pminfo[cpu]->init & XEN_CPPC_INIT )
+ ret = get_amd_cppc_para(per_cpu(cpufreq_cpu_policy, cpu), cppc_para);
return ret;
}
@@ -317,10 +319,12 @@ static int set_cpufreq_cppc(struct xen_sysctl_pm_op *op)
if ( !policy || !policy->governor )
return -ENOENT;
- if ( !hwp_active() )
- return -EOPNOTSUPP;
+ if ( hwp_active() )
+ return set_hwp_para(policy, &op->u.set_cppc);
+ if ( processor_pminfo[op->cpuid]->init & XEN_CPPC_INIT )
+ return set_amd_cppc_para(policy, &op->u.set_cppc);
- return set_hwp_para(policy, &op->u.set_cppc);
+ return -EOPNOTSUPP;
}
int do_pm_op(struct xen_sysctl_pm_op *op)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index babc4a1a2c..6ca686c4b2 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -81,7 +81,18 @@ struct cpufreq_policy {
int8_t turbo; /* tristate flag: 0 for unsupported
* -1 for disable, 1 for enabled
* See CPUFREQ_TURBO_* below for defines */
- unsigned int policy; /* CPUFREQ_POLICY_* */
+ unsigned int policy; /* Performance Policy
+ * If cpufreq_driver->target() exists,
+ * the ->governor decides what frequency
+ * within the limits is used.
+ * If cpufreq_driver->setpolicy() exists, these
+ * following policies are available:
+ * CPUFREQ_POLICY_PERFORMANCE represents
+ * maximum performance
+ * CPUFREQ_POLICY_POWERSAVE represents least
+ * power consumption
+ * CPUFREQ_POLICY_ONDEMAND represents no
+ * preference over performance or powersave */
};
DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);
@@ -132,21 +143,6 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
extern struct cpufreq_governor *__find_governor(const char *governor);
#define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
-/*
- * Performance Policy
- * If cpufreq_driver->target() exists, the ->governor decides what frequency
- * within the limits is used. If cpufreq_driver->setpolicy() exists, these
- * following policies are available:
- * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
- * CPUFREQ_POLICY_POWERSAVE represents least power consumption
- * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
- * powersave
- */
-#define CPUFREQ_POLICY_UNKNOWN 0
-#define CPUFREQ_POLICY_POWERSAVE 1
-#define CPUFREQ_POLICY_PERFORMANCE 2
-#define CPUFREQ_POLICY_ONDEMAND 3
-
unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
/* pass a target to the cpufreq driver */
@@ -293,6 +289,10 @@ int acpi_cpufreq_register(void);
int amd_cppc_cmdline_parse(const char *s, const char *e);
int amd_cppc_register_driver(void);
+int get_amd_cppc_para(const struct cpufreq_policy *policy,
+ struct xen_get_cppc_para *cppc_para);
+int set_amd_cppc_para(struct cpufreq_policy *policy,
+ const struct xen_set_cppc_para *set_cppc);
bool cpufreq_is_governorless(unsigned int cpuid);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3f654f98ab..c50fa7bb3c 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -336,8 +336,14 @@ struct xen_ondemand {
uint32_t up_threshold;
};
+#define CPUFREQ_POLICY_UNKNOWN 0
+#define CPUFREQ_POLICY_POWERSAVE 1
+#define CPUFREQ_POLICY_PERFORMANCE 2
+#define CPUFREQ_POLICY_ONDEMAND 3
+
struct xen_get_cppc_para {
/* OUT */
+ uint32_t policy; /* CPUFREQ_POLICY_xxx */
/* activity_window supported if set */
#define XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW (1 << 0)
uint32_t features; /* bit flags for features */
--
2.34.1
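The range checks in set_amd_cppc_para() compare each new bound against either the value supplied in the same request or the currently programmed one. Below is a minimal standalone sketch of just that logic, assuming hypothetical flag names (SET_MINIMUM, SET_MAXIMUM, SET_DESIRED) and simplified structs in place of XEN_SYSCTL_CPPC_SET_* and struct amd_cppc_drv_data; the active-mode -EOPNOTSUPP handling, EPP, and presets are deliberately left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for XEN_SYSCTL_CPPC_SET_MINIMUM/MAXIMUM/DESIRED. */
#define SET_MINIMUM (1u << 0)
#define SET_MAXIMUM (1u << 1)
#define SET_DESIRED (1u << 2)

/* Simplified capability and current-request data (hypothetical). */
struct perf_caps { uint32_t lowest_nonlinear, highest; };
struct perf_req  { uint32_t min, max; };

/* Simplified request, loosely modelled on struct xen_set_cppc_para. */
struct perf_set  { uint32_t set_params, minimum, maximum, desired; };

static int validate_perf(const struct perf_caps *caps,
                         const struct perf_req *cur,
                         const struct perf_set *set)
{
    /* Effective bounds: the value in this request if set, else the current one. */
    uint32_t new_min = (set->set_params & SET_MINIMUM) ? set->minimum : cur->min;
    uint32_t new_max = (set->set_params & SET_MAXIMUM) ? set->maximum : cur->max;

    /* Maximum must not exceed highest perf nor drop below the minimum. */
    if ( (set->set_params & SET_MAXIMUM) &&
         (set->maximum > caps->highest || set->maximum < new_min) )
        return -EINVAL;

    /* Minimum must not go below lowest nonlinear perf nor exceed the maximum. */
    if ( (set->set_params & SET_MINIMUM) &&
         (set->minimum < caps->lowest_nonlinear || set->minimum > new_max) )
        return -EINVAL;

    /* Desired must lie within [minimum, maximum]. */
    if ( (set->set_params & SET_DESIRED) &&
         (set->desired < new_min || set->desired > new_max) )
        return -EINVAL;

    return 0;
}
```

With caps {lowest_nonlinear = 30, highest = 200} and a current request of [30, 200], setting maximum = 80 together with minimum = 100 is rejected by the maximum check, because the maximum is compared with the new minimum rather than the stale one.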
* Re: [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc"
2025-08-22 10:52 ` [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc" Penny Zheng
@ 2025-08-25 14:30 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 14:30 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> Change "balance" to "ondemand" in help info for "xenpm set-cpufreq-cppc"
>
> Fixes: 81ce87fc5e36 (xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND")
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor
2025-08-22 10:52 ` [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor Penny Zheng
@ 2025-08-25 14:43 ` Jan Beulich
2025-08-26 4:23 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 14:43 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Andrew Cooper, Roger Pau Monné, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> Since we are missing default case for x86 vendor, there is possibility (i.e.
> new vendor introduced) that we will return successfully while missing the
> whole cpufreq driver initialization process.
> Move "ret = -ENOENTRY" forward to cover default case for x86 vendor, and
> add error log
Requested-by: Jan Beulich <jbeulich@suse.com>
(or Suggested-by: if you like that better)
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
* Re: [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-08-22 10:52 ` [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
@ 2025-08-25 14:45 ` Jan Beulich
2025-08-26 7:38 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 14:45 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> A helper function handle_cpufreq_cmdline() is introduced to tidy different
> handling pathes.
> We also add a new helper cpufreq_opts_contain() to ignore redundant setting,
> like "cpufreq=hwp;hwp;xen"
> As only slot 0 of cpufreq_xen_opts[] needs explicit initializing with
> non-zero CPUFREQ_xen, dropping full array initializer could avoid touching
> initializer every time it grows
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-22 10:52 ` [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
@ 2025-08-25 15:01 ` Jan Beulich
2025-08-26 5:53 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:01 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> --- a/xen/arch/x86/x86_64/cpufreq.c
> +++ b/xen/arch/x86/x86_64/cpufreq.c
> @@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
>
> return set_px_pminfo(acpi_id, xen_perf);
> }
> +
> +int compat_set_cppc_pminfo(unsigned int acpi_id,
> + const struct compat_processor_cppc *cppc_data)
> +
> +{
> + struct xen_processor_cppc *xen_cppc;
> + unsigned long xlat_page_current;
> +
> + xlat_malloc_init(xlat_page_current);
> +
> + xen_cppc = xlat_malloc_array(xlat_page_current,
> + struct xen_processor_cppc, 1);
> + if ( unlikely(xen_cppc == NULL) )
> + return -EFAULT;
I think we want to avoid repeating the earlier mistake with using a wrong
error code. It's ENOMEM or ENOSPC or some such.
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -91,7 +91,9 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
> pmpt = processor_pminfo[op->cpuid];
> policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
>
> - if ( !pmpt || !pmpt->perf.states ||
> + if ( !pmpt ||
> + ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
> + ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
I fear I don't understand this: In the PX case we check whether necessary
data is lacking. In the CPPC case you check that some data was provided
that we don't want to use? Why not similarly check that data we need was
provided?
> @@ -693,6 +699,120 @@ int acpi_set_pdc_bits(unsigned int acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
> return ret;
> }
>
> +static void print_CPPC(const struct xen_processor_cppc *cppc_data)
> +{
> + printk("\t_CPC: highest_perf=%u, lowest_perf=%u, "
> + "nominal_perf=%u, lowest_nonlinear_perf=%u, "
> + "nominal_mhz=%uMHz, lowest_mhz=%uMHz\n",
> + cppc_data->cpc.highest_perf, cppc_data->cpc.lowest_perf,
> + cppc_data->cpc.nominal_perf, cppc_data->cpc.lowest_nonlinear_perf,
> + cppc_data->cpc.nominal_mhz, cppc_data->cpc.lowest_mhz);
> +}
> +
> +int set_cppc_pminfo(unsigned int acpi_id,
> + const struct xen_processor_cppc *cppc_data)
> +{
> + int ret = 0, cpuid;
> + struct processor_pminfo *pm_info;
> +
> + cpuid = get_cpu_id(acpi_id);
> + if ( cpuid < 0 )
> + {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if ( cppc_data->pad[0] || cppc_data->pad[1] || cppc_data->pad[2] )
> + {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if ( cpufreq_verbose )
> + printk("Set CPU%d (ACPI ID %u) CPPC state info:\n",
> + cpuid, acpi_id);
> +
> + pm_info = processor_pminfo[cpuid];
> + if ( !pm_info )
> + {
> + pm_info = xvzalloc(struct processor_pminfo);
> + if ( !pm_info )
> + {
> + ret = -ENOMEM;
> + goto out;
> + }
> + processor_pminfo[cpuid] = pm_info;
> + }
> + pm_info->acpi_id = acpi_id;
> + pm_info->id = cpuid;
> + pm_info->cppc_data = *cppc_data;
> +
> + if ( (cppc_data->flags & XEN_CPPC_PSD) &&
> + !check_psd_pminfo(cppc_data->shared_type) )
> + {
> + ret = -EINVAL;
> + goto out;
Indentation.
Jan
* Re: [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-22 10:52 ` [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
@ 2025-08-25 15:07 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:07 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> @@ -131,12 +131,13 @@ static int __init cf_check cpufreq_driver_init(void)
>
> if ( cpufreq_controller == FREQCTL_xen )
> {
> + unsigned int i;
> ret = -ENOENT;
Blank line between declaration(s) and statement(s) please.
Then:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
* Re: [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
2025-08-22 10:52 ` [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
@ 2025-08-25 15:14 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:14 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> @@ -50,10 +141,335 @@ int __init amd_cppc_cmdline_parse(const char *s, const char *e)
> return 0;
> }
>
> +/*
> + * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> + * use them to convert perf to freq and vice versa. The conversion is
> + * extrapolated as an linear function passing by the 2 points:
> + * - (Low perf, Low freq)
> + * - (Nominal perf, Nominal freq)
> + * Parameter freq is always in kHz.
> + */
> +static int amd_cppc_khz_to_perf(const struct amd_cppc_drv_data *data,
> + unsigned int freq, uint8_t *perf)
> +{
> + const struct xen_processor_cppc *cppc_data = data->cppc_data;
> + unsigned int mul, div;
> + int offset = 0, res;
> +
> + if ( cppc_data->cpc.lowest_mhz &&
> + data->caps.nominal_perf > data->caps.lowest_perf &&
> + cppc_data->cpc.nominal_mhz > cppc_data->cpc.lowest_mhz )
> + {
> + mul = data->caps.nominal_perf - data->caps.lowest_perf;
> + div = cppc_data->cpc.nominal_mhz - cppc_data->cpc.lowest_mhz;
> +
> + /*
> + * We don't need to convert to kHz for computing offset and can
> + * directly use nominal_mhz and lowest_mhz as the division
> + * will remove the frequency unit.
> + */
> + offset = data->caps.nominal_perf -
> + (mul * cppc_data->cpc.nominal_mhz) / div;
> + }
> + else
> + {
> + /* Read Processor Max Speed(MHz) as anchor point */
> + mul = data->caps.highest_perf;
> + div = this_cpu(pxfreq_mhz);
> + if ( !div )
> + return -EOPNOTSUPP;
> + }
> +
> + res = offset + (mul * freq) / (div * 1000);
> + if ( res > UINT8_MAX )
> + {
> + printk_once(XENLOG_WARNING
> + "Perf value exceeds maximum value 255: %d\n", res);
> + *perf = UINT8_MAX;
> + return 0;
> + }
> + if ( res <= 0 )
> + {
> + printk_once(XENLOG_WARNING
> + "Perf value smaller than minimum value 0: %d\n", res);
The message text doesn't fit the if() condition anymore. Perhaps simply
omit the "0" from the text? Then:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
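The quoted amd_cppc_khz_to_perf() comment describes a line through (lowest perf, lowest freq) and (nominal perf, nominal freq). Below is a standalone sketch of just that branch, using the same integer arithmetic. struct cppc_anchor and khz_to_perf() are hypothetical, the pxfreq_mhz fallback is not modelled, and because the quoted hunk is cut off inside the `res <= 0` branch, returning -ERANGE there is the sketch's own choice rather than the patch's behavior.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical inputs: the two anchor points taken from _CPC,
 * (lowest_perf, lowest_mhz) and (nominal_perf, nominal_mhz).
 */
struct cppc_anchor {
    unsigned int lowest_perf, nominal_perf;
    unsigned int lowest_mhz, nominal_mhz;
};

/*
 * Map a frequency in kHz onto the line through the two anchor points,
 * mirroring the quoted arithmetic:
 *   perf = offset + mul * freq / (div * 1000)
 */
static int khz_to_perf(const struct cppc_anchor *a, unsigned int freq_khz,
                       uint8_t *perf)
{
    unsigned int mul, div;
    int offset, res;

    /* The patch falls back to a max-speed anchor here; not modelled. */
    if ( !a->lowest_mhz || a->nominal_perf <= a->lowest_perf ||
         a->nominal_mhz <= a->lowest_mhz )
        return -EOPNOTSUPP;

    mul = a->nominal_perf - a->lowest_perf;
    div = a->nominal_mhz - a->lowest_mhz;
    /* MHz units cancel in the division, so no kHz conversion is needed. */
    offset = (int)a->nominal_perf - (int)(mul * a->nominal_mhz / div);

    res = offset + (int)(mul * freq_khz / (div * 1000));
    if ( res > UINT8_MAX )
        res = UINT8_MAX;           /* clamp high values, as the patch does */
    else if ( res <= 0 )
        return -ERANGE;            /* sketch's own choice, see lead-in */

    *perf = (uint8_t)res;
    return 0;
}
```

For example, with lowest (30, 550 MHz) and nominal (130, 2550 MHz), mul = 100, div = 2000 and offset = 130 - 127 = 3, so 2550000 kHz maps back to 130 and 550000 kHz to 30; the truncating divisions happen to cancel at both anchors in this case.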
* Re: [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode
2025-08-22 10:52 ` [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
@ 2025-08-25 15:19 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:19 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> @@ -270,6 +288,10 @@ static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
>
> data->req.min_perf = min_perf;
> data->req.max_perf = max_perf;
> +#ifndef NDEBUG
> + if ( opt_active_mode )
> + ASSERT(!des_perf);
> +#endif
Simply
ASSERT(!opt_active_mode || !des_perf);
(without any #ifndef)? Then once again:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
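The suggested one-liner is the usual "A implies B" rewrite, `!A || B`. A tiny standalone check that both forms agree (the function names are hypothetical). Since Xen's ASSERT() is already compiled out when NDEBUG is defined, the `#ifndef NDEBUG` wrapper adds nothing.

```c
#include <stdbool.h>

/* Form in the patch: only check when active mode is enabled. */
static bool nested_form_ok(bool active_mode, unsigned int des_perf)
{
    if ( active_mode )
        return !des_perf;
    return true;
}

/* Suggested form: "active_mode implies des_perf == 0". */
static bool implication_form_ok(bool active_mode, unsigned int des_perf)
{
    return !active_mode || !des_perf;
}
```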
* Re: [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm
2025-08-22 10:52 ` [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
@ 2025-08-25 15:23 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:23 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> Even if Xen governor is not used in amd-cppc active mode, we could
> somehow deduce which performance policy (CPUFREQ_POLICY_xxx) user wants to
> apply through which governor they choose, such as:
> If user chooses performance governor, they want maximum performance, then
> the policy shall be CPUFREQ_POLICY_PERFORMANCE
> If user chooses powersave governor, they want the least power consumption,
> then the policy shall be CPUFREQ_POLICY_POWERSAVE
> Function cpufreq_policy_from_governor() is responsible for above transition,
> and it shall be also effective when users setting new governor through xenpm.
>
> userspace are forbidden choices, and if users specify such options,
Odd use of plural here, when only one bad variant is named.
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -206,6 +206,14 @@ static int set_cpufreq_gov(struct xen_sysctl_pm_op *op)
> if ( new_policy.governor == NULL )
> return -EINVAL;
>
> + new_policy.policy = cpufreq_policy_from_governor(new_policy.governor);
> + if ( new_policy.policy == CPUFREQ_POLICY_UNKNOWN )
> + {
> + printk("Failed to get performance policy from %s, Try \"xenpm set-cpufreq-cppc\"\n",
> + new_policy.governor->name);
> + return -EINVAL;
> + }
Don't you also need to check for CPPC mode, or else you reject "userspace" for
other drivers as well?
Jan
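Jan's question can be made concrete with a small sketch of the governor-to-policy mapping and where the rejection might be gated. The function names and the cppc_active flag are hypothetical. The performance and powersave mappings come from the commit message, while mapping "ondemand" to CPUFREQ_POLICY_ONDEMAND is an assumption based on the policy comment added in patch 13/13.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Values as moved to the public sysctl.h in patch 13/13. */
#define CPUFREQ_POLICY_UNKNOWN     0
#define CPUFREQ_POLICY_POWERSAVE   1
#define CPUFREQ_POLICY_PERFORMANCE 2
#define CPUFREQ_POLICY_ONDEMAND    3

/* Hypothetical name-based mapping; the real helper takes a governor struct. */
static unsigned int policy_from_governor_name(const char *gov)
{
    if ( !strcmp(gov, "performance") )
        return CPUFREQ_POLICY_PERFORMANCE;
    if ( !strcmp(gov, "powersave") )
        return CPUFREQ_POLICY_POWERSAVE;
    if ( !strcmp(gov, "ondemand") )
        return CPUFREQ_POLICY_ONDEMAND;  /* assumption */
    return CPUFREQ_POLICY_UNKNOWN;       /* e.g. "userspace" */
}

/*
 * Only reject an unmappable governor when the driver actually consumes
 * the policy (cppc_active is a hypothetical flag for amd-cppc active mode).
 */
static int check_new_governor(const char *gov, bool cppc_active,
                              unsigned int *policy)
{
    *policy = policy_from_governor_name(gov);
    if ( cppc_active && *policy == CPUFREQ_POLICY_UNKNOWN )
        return -EINVAL;
    return 0;
}
```

Gating on the mode keeps "userspace" working for drivers that really use a Xen governor, which is the concern raised above.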
* Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-22 10:52 ` [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para Penny Zheng
@ 2025-08-25 15:36 ` Jan Beulich
2025-08-26 8:21 ` Penny, Zheng
2025-08-27 15:22 ` Anthony PERARD
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:36 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Juergen Gross, Andrew Cooper,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> We extract cppc info from "struct xen_get_cpufreq_para", where it acts as
> a member of union, and share the space with governor info.
> However, it may fail in amd-cppc passive mode, in which governor info and
> CPPC info could co-exist, and both need to be printed together via xenpm tool.
> If we tried to still put it in "struct xen_get_cpufreq_para" (e.g. just move
> out of union), "struct xen_get_cpufreq_para" will enlarge too much to further
> make xen_sysctl.u exceed 128 bytes.
>
> So we introduce a new sub-field GET_CPUFREQ_CPPC to dedicatedly acquire
> CPPC-related para, and make get-cpufreq-para invoke GET_CPUFREQ_CPPC
> if available.
> New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to
> extract CPPC-related parameters process from cpufreq para.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com> # hypervisor
> --- a/tools/libs/ctrl/xc_pm.c
> +++ b/tools/libs/ctrl/xc_pm.c
> @@ -288,7 +288,6 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
> CHK_FIELD(s.scaling_min_freq);
> CHK_FIELD(s.u.userspace);
> CHK_FIELD(s.u.ondemand);
> - CHK_FIELD(cppc_para);
>
> #undef CHK_FIELD
What is done here is already less than what could be done; I think ...
> @@ -366,6 +365,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
> return ret;
> }
>
> +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
> + xc_cppc_para_t *cppc_para)
> +{
> + int ret;
> + struct xen_sysctl sysctl = {};
> + struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
> +
> + if ( !xch || !cppc_para )
> + {
> + errno = EINVAL;
> + return -1;
> + }
> +
> + sysctl.cmd = XEN_SYSCTL_pm_op;
> + sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
> + sysctl.u.pm_op.cpuid = cpuid;
> +
> + ret = xc_sysctl(xch, &sysctl);
> + if ( ret )
> + return ret;
> +
> + BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
> + memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
... you minimally want to apply as much checking here.
Jan
* Re: [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-08-22 10:52 ` [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
@ 2025-08-25 15:44 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 15:44 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -832,9 +832,13 @@ static void print_cppc_para(unsigned int cpuid,
> /* print out parameters about cpu frequency */
> static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
> {
> - bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
> + bool is_goverless = false;
I first thought this would be a typo just in the description. Please make
this is_governor_less, or - if you absolutely want to have a shorter
identifier - is_govless (albeit then you might also consider dropping the
is_ prefix).
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -956,3 +956,17 @@ int __init cpufreq_register_driver(const struct cpufreq_driver *driver_data)
>
> return 0;
> }
> +
> +#ifdef CONFIG_PM_OP
> +/*
> + * Governor-less cpufreq driver indicates the driver doesn't rely on Xen
> + * governor to do performance tuning, mostly it has hardware built-in
> + * algorithm to calculate runtime workload and adjust cores frequency
> + * automatically. like Intel HWP, or CPPC in AMD.
Nit: Capital letter please at the start of a sentence.
> + */
> +bool cpufreq_is_governorless(unsigned int cpuid)
> +{
> + return processor_pminfo[cpuid]->init && (hwp_active() ||
> + cpufreq_driver.setpolicy);
> +}
> +#endif /* CONFIG_PM_OP */
Aiui the #ifdef is to please Misra, but that's not very nice to have. Does
any of the constituents stand in the way of this becoming an inline function
in a suitable header file?
Jan
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-22 10:52 ` [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
@ 2025-08-25 16:02 ` Jan Beulich
2025-08-28 4:06 ` Penny, Zheng
2025-08-27 15:58 ` Anthony PERARD
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-25 16:02 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Andrew Cooper, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> @@ -557,6 +557,187 @@ static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
> return 0;
> }
>
> +#ifdef CONFIG_PM_OP
> +int get_amd_cppc_para(const struct cpufreq_policy *policy,
> + struct xen_get_cppc_para *cppc_para)
amd_cppc_get_para() and ...
> +{
> + const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
> + policy->cpu);
> +
> + if ( data == NULL )
> + return -ENODATA;
> +
> + cppc_para->policy = policy->policy;
> + cppc_para->lowest = data->caps.lowest_perf;
> + cppc_para->lowest_nonlinear = data->caps.lowest_nonlinear_perf;
> + cppc_para->nominal = data->caps.nominal_perf;
> + cppc_para->highest = data->caps.highest_perf;
> + cppc_para->minimum = data->req.min_perf;
> + cppc_para->maximum = data->req.max_perf;
> + cppc_para->desired = data->req.des_perf;
> + cppc_para->energy_perf = data->req.epp;
> +
> + return 0;
> +}
> +
> +int set_amd_cppc_para(struct cpufreq_policy *policy,
> + const struct xen_set_cppc_para *set_cppc)
... amd_cppc_set_para() would imo be more consistent names, considering how
other functions are named.
> +{
> + unsigned int cpu = policy->cpu;
> + struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> + uint8_t max_perf, min_perf, des_perf, epp;
> + bool active_mode = cpufreq_is_governorless(cpu);
> +
> + if ( data == NULL )
> + return -ENOENT;
> +
> + /* Return if there is nothing to do. */
> + if ( set_cppc->set_params == 0 )
> + return 0;
That is ...
> + /* Only allow values if params bit is set. */
> + if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
> + set_cppc->desired) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
> + set_cppc->minimum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> + set_cppc->maximum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
> + set_cppc->energy_perf) )
> + return -EINVAL;
... all the errors checked here are to be ignored when no flag is set at
all?
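A minimal sketch of one possible answer, assuming the intent is that stray values are rejected even when no flag is set: run the flag/value consistency check before the "nothing to do" early return. The `SET_*` bit values and `struct set_cppc` are hypothetical stand-ins for the `XEN_SYSCTL_CPPC_SET_*` flags and `struct xen_set_cppc_para`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values; the real XEN_SYSCTL_CPPC_SET_* flags live in sysctl.h. */
#define SET_MINIMUM     (1u << 0)
#define SET_MAXIMUM     (1u << 1)
#define SET_DESIRED     (1u << 2)
#define SET_ENERGY_PERF (1u << 3)

struct set_cppc {
    uint32_t set_params;
    uint32_t minimum, maximum, desired, energy_perf;
};

static int check_params(const struct set_cppc *s)
{
    /* Reject stray values first, so set_params == 0 cannot bypass this. */
    if ( (!(s->set_params & SET_DESIRED) && s->desired) ||
         (!(s->set_params & SET_MINIMUM) && s->minimum) ||
         (!(s->set_params & SET_MAXIMUM) && s->maximum) ||
         (!(s->set_params & SET_ENERGY_PERF) && s->energy_perf) )
        return -EINVAL;

    /* Only now is "nothing to do" a genuine success. */
    if ( s->set_params == 0 )
        return 0;

    /* Range validation and the MSR update would follow here. */
    return 0;
}
```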
> + /*
> + * Validate all parameters
> + * Maximum performance may be set to any performance value in the range
> + * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> + * be set to a value that is larger than or equal to minimum Performance.
> + */
> + if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> + (set_cppc->maximum > data->caps.highest_perf ||
> + set_cppc->maximum <
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
> + ? set_cppc->minimum
> + : data->req.min_perf)) )
Too deep indentation (more of this throughout the function), and seeing ...
> + return -EINVAL;
> + /*
> + * Minimum performance may be set to any performance value in the range
> + * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> + * be set to a value that is less than or equal to Maximum Performance.
> + */
> + if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
> + (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
> + (set_cppc->minimum >
... this, one more pair of parentheses may also help there. (Recall:
symmetry where possible.)
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
> + ? set_cppc->maximum
> + : data->req.max_perf))) )
> + return -EINVAL;
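One way to address both the indentation depth and the symmetry remarks, sketched with hypothetical helper, struct, and flag names: compute the effective minimum and maximum once, so that each range check becomes a flat, symmetric comparison. This mirrors the quoted checks only for the minimum/maximum pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values standing in for XEN_SYSCTL_CPPC_SET_MINIMUM/MAXIMUM. */
#define SET_MINIMUM (1u << 0)
#define SET_MAXIMUM (1u << 1)

struct caps { uint8_t lowest_nonlinear, highest; };
struct req  { uint8_t min_perf, max_perf; };

static int check_min_max(uint32_t set_params, uint32_t minimum,
                         uint32_t maximum, const struct caps *caps,
                         const struct req *req)
{
    /* Effective bounds: the new value if supplied, else the current request. */
    uint32_t eff_min = (set_params & SET_MINIMUM) ? minimum : req->min_perf;
    uint32_t eff_max = (set_params & SET_MAXIMUM) ? maximum : req->max_perf;

    if ( (set_params & SET_MAXIMUM) &&
         ((maximum > caps->highest) || (maximum < eff_min)) )
        return -EINVAL;

    if ( (set_params & SET_MINIMUM) &&
         ((minimum < caps->lowest_nonlinear) || (minimum > eff_max)) )
        return -EINVAL;

    return 0;
}
```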
> + /*
> + * Desired performance may be set to any performance value in the range
> + * [Minimum Performance, Maximum Performance], inclusive.
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
> + {
> + if ( active_mode )
> + return -EOPNOTSUPP;
> +
> + if ( (set_cppc->desired >
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
> + ? set_cppc->maximum
> + : data->req.max_perf)) ||
> + (set_cppc->desired <
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
> + ? set_cppc->minimum
> + : data->req.min_perf)) )
> + return -EINVAL;
> + }
> + /*
> + * Energy Performance Preference may be set with a range of values
> + * from 0 to 0xFF
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
> + {
> + if ( !active_mode )
> + return -EOPNOTSUPP;
> +
> + if ( set_cppc->energy_perf > UINT8_MAX )
> + return -EINVAL;
> + }
> +
> + /* Activity window not supported in MSR */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
> + return -EOPNOTSUPP;
> +
> + epp = per_cpu(epp_init, cpu);
> + min_perf = data->caps.lowest_nonlinear_perf;
> + max_perf = data->caps.highest_perf;
> + des_perf = data->req.des_perf;
> + /*
> + * Apply presets:
> + * XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE/PERFORMANCE/ONDEMAND are
> + * only available when CPPC in active mode
> + */
> + switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
> + {
> + case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> + /*
> + * Lower max_perf to nonlinear_lowest to achieve
> + * ultmost power saviongs
> + */
> + max_perf = min_perf;
> + epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_PERFORMANCE;
> + /* Increase min_perf to highest to achieve ultmost performance */
> + min_perf = max_perf;
> + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_ONDEMAND;
> + /*
> + * Take medium value to show no preference over
> + * performance or powersave
> + */
> + epp = CPPC_ENERGY_PERF_BALANCE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
> + if ( active_mode )
> + policy->policy = CPUFREQ_POLICY_UNKNOWN;
> + break;
> +
> + default:
> + return -EINVAL;
> + }
Much of this looks very similar to what patch 09 introduces in
amd_cppc_epp_set_policy(). Is it not possible to reduce the redundancy?
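A sketch of how the redundancy might be reduced, assuming a hypothetical shared helper that both `amd_cppc_epp_set_policy()` and the sysctl path could call. The EPP endpoints follow the cover letter (0x0 biases toward performance, 0xff toward energy efficiency). The balanced value 0x80 is an assumption, not necessarily what `CPPC_ENERGY_PERF_BALANCE` is defined as.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { POLICY_UNKNOWN, POLICY_POWERSAVE, POLICY_PERFORMANCE, POLICY_ONDEMAND };

#define EPP_MAX_PERFORMANCE 0x00
#define EPP_MAX_POWERSAVE   0xff
#define EPP_BALANCE         0x80   /* hypothetical mid value */

struct perf_req { uint8_t min_perf, max_perf, epp; };

/* Hypothetical helper; returns false for policies it does not handle. */
static bool apply_policy_preset(unsigned int policy, uint8_t lowest_nonlinear,
                                uint8_t highest, struct perf_req *req)
{
    /* Baseline: the full [nonlinear lowest, highest] range. */
    req->min_perf = lowest_nonlinear;
    req->max_perf = highest;

    switch ( policy )
    {
    case POLICY_POWERSAVE:
        /* Lower max_perf to nonlinear lowest for the utmost power savings. */
        req->max_perf = lowest_nonlinear;
        req->epp = EPP_MAX_POWERSAVE;
        return true;

    case POLICY_PERFORMANCE:
        /* Raise min_perf to highest for the utmost performance. */
        req->min_perf = highest;
        req->epp = EPP_MAX_PERFORMANCE;
        return true;

    case POLICY_ONDEMAND:
        /* No preference between performance and powersave. */
        req->epp = EPP_BALANCE;
        return true;

    default:
        return false;
    }
}
```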
> --- a/xen/include/acpi/cpufreq/cpufreq.h
> +++ b/xen/include/acpi/cpufreq/cpufreq.h
> @@ -81,7 +81,18 @@ struct cpufreq_policy {
> int8_t turbo; /* tristate flag: 0 for unsupported
> * -1 for disable, 1 for enabled
> * See CPUFREQ_TURBO_* below for defines */
> - unsigned int policy; /* CPUFREQ_POLICY_* */
> + unsigned int policy; /* Performance Policy
> + * If cpufreq_driver->target() exists,
> + * the ->governor decides what frequency
> + * within the limits is used.
> + * If cpufreq_driver->setpolicy() exists, these
> + * following policies are available:
> + * CPUFREQ_POLICY_PERFORMANCE represents
> + * maximum performance
> + * CPUFREQ_POLICY_POWERSAVE represents least
> + * power consumption
> + * CPUFREQ_POLICY_ONDEMAND represents no
> + * preference over performance or powersave */
Besides not being a well-formed comment, this is close to unreadable in this
shape. This much text wants putting ahead of the field.
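A sketch of the layout suggested above, using a hypothetical struct name: the same information moved into a well-formed block comment placed ahead of the field instead of trailing it.

```c
#include <assert.h>

struct cpufreq_policy_sketch {
    /*
     * Performance policy (CPUFREQ_POLICY_*).
     *
     * If cpufreq_driver->target() exists, ->governor decides which
     * frequency within the limits is used.  If cpufreq_driver->setpolicy()
     * exists instead, the following policies are available:
     * - CPUFREQ_POLICY_PERFORMANCE: maximum performance
     * - CPUFREQ_POLICY_POWERSAVE: least power consumption
     * - CPUFREQ_POLICY_ONDEMAND: no preference between the two
     */
    unsigned int policy;
};
```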
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -336,8 +336,14 @@ struct xen_ondemand {
> uint32_t up_threshold;
> };
>
> +#define CPUFREQ_POLICY_UNKNOWN 0
> +#define CPUFREQ_POLICY_POWERSAVE 1
> +#define CPUFREQ_POLICY_PERFORMANCE 2
> +#define CPUFREQ_POLICY_ONDEMAND 3
Without XEN_ prefixes they shouldn't appear in a public header. But do
we need ...
> struct xen_get_cppc_para {
> /* OUT */
> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
... the new field at all? Can't you synthesize the kind-of-governor into
struct xen_get_cpufreq_para's respective field? You invoke both sub-ops
from xenpm now anyway ...
Jan
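A sketch of the alternative suggested above, with hypothetical names: for a governor-less driver, the hypervisor could report a synthesized governor name in the existing governor field of struct xen_get_cpufreq_para. This would also keep unprefixed CPUFREQ_POLICY_* names out of the public header. The 16-byte name length is assumed to match CPUFREQ_NAME_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16   /* assumption: same as CPUFREQ_NAME_LEN */

enum { POLICY_UNKNOWN, POLICY_POWERSAVE, POLICY_PERFORMANCE, POLICY_ONDEMAND };

/* Hypothetical: fill a governor-name buffer from the current policy. */
static void synth_governor_name(unsigned int policy, char *buf)
{
    static const char *const names[] = {
        [POLICY_POWERSAVE]   = "powersave",
        [POLICY_PERFORMANCE] = "performance",
        [POLICY_ONDEMAND]    = "ondemand",
    };
    const char *name = NULL;

    if ( policy < sizeof(names) / sizeof(names[0]) )
        name = names[policy];   /* NULL for POLICY_UNKNOWN */
    snprintf(buf, NAME_LEN, "%s", name ? name : "unknown");
}
```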
* RE: [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor
2025-08-25 14:43 ` Jan Beulich
@ 2025-08-26 4:23 ` Penny, Zheng
0 siblings, 0 replies; 43+ messages in thread
From: Penny, Zheng @ 2025-08-26 4:23 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 25, 2025 10:43 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86
> vendor
>
> On 22.08.2025 12:52, Penny Zheng wrote:
> > Since we are missing default case for x86 vendor, there is possibility (i.e.
> > new vendor introduced) that we will return successfully while missing
> > the whole cpufreq driver initialization process.
> > Move "ret = -ENOENTRY" forward to cover default case for x86 vendor,
Typo: -ENOENT
> > and add error log
>
> Requested-by: Jan Beulich <jbeulich@suse.com> (or Suggested-by: if you like that
> better)
Ack
>
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thx
>
> Jan
* RE: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-25 15:01 ` Jan Beulich
@ 2025-08-26 5:53 ` Penny, Zheng
2025-08-26 5:58 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Penny, Zheng @ 2025-08-26 5:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 25, 2025 11:02 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to
> propagate CPPC data
>
> On 22.08.2025 12:52, Penny Zheng wrote:
> > --- a/xen/arch/x86/x86_64/cpufreq.c
> > +++ b/xen/arch/x86/x86_64/cpufreq.c
> > @@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
> >
> > return set_px_pminfo(acpi_id, xen_perf);
> > }
> > +
> > +int compat_set_cppc_pminfo(unsigned int acpi_id,
> > + const struct compat_processor_cppc *cppc_data)
> > +
> > +{
> > + struct xen_processor_cppc *xen_cppc;
> > + unsigned long xlat_page_current;
> > +
> > + xlat_malloc_init(xlat_page_current);
> > +
> > + xen_cppc = xlat_malloc_array(xlat_page_current,
> > + struct xen_processor_cppc, 1);
> > + if ( unlikely(xen_cppc == NULL) )
> > + return -EFAULT;
>
> I think we want to avoid repeating the earlier mistake with using a wrong error code.
> It's ENOMEM or ENOSPC or some such.
>
Understood, I'll change it to -ENOMEM
> > --- a/xen/drivers/acpi/pm-op.c
> > +++ b/xen/drivers/acpi/pm-op.c
> > @@ -91,7 +91,9 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
> > pmpt = processor_pminfo[op->cpuid];
> > policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
> >
> > - if ( !pmpt || !pmpt->perf.states ||
> > + if ( !pmpt ||
> > + ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
> > + ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
>
> I fear I don't understand this: In the PX case we check whether necessary data is
> lacking. In the CPPC case you check that some data was provided that we don't
> want to use? Why not similarly check that data we need was provided?
>
We introduce the additional CPPC check to avoid a NULL deref of states[i]:
```
for ( i = 0; i < op->u.get_para.freq_num; i++ )
data[i] = pmpt->perf.states[i].core_frequency * 1000;
```
We want to ensure "op->u.get_para.freq_num" is always zero in CPPC mode, which is validated against pmpt->perf.state_count.
We have similar discussion in here https://old-list-archives.xen.org/archives/html/xen-devel/2025-06/msg01160.html
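A reduced sketch of why the `.state_count` check is enough to keep `states[]` from being dereferenced, with hypothetical struct and function names. The `-EAGAIN` on a count mismatch is assumed to mirror the existing size-negotiation handling in get_cpufreq_para(). In CPPC mode `state_count` is 0, so only `freq_num == 0` passes and the copy loop never runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct px_state { uint64_t core_frequency; };   /* MHz */

struct perf_info {
    unsigned int state_count;
    const struct px_state *states;   /* NULL when only CPPC data was provided */
};

/* Hypothetical reduction of the frequency copy loop in get_cpufreq_para(). */
static int copy_freqs(const struct perf_info *perf, unsigned int freq_num,
                      uint32_t *data)
{
    /* freq_num must match state_count, which is 0 in CPPC mode. */
    if ( freq_num != perf->state_count )
        return -EAGAIN;

    for ( unsigned int i = 0; i < freq_num; i++ )
        data[i] = perf->states[i].core_frequency * 1000;   /* MHz -> kHz */

    return 0;
}
```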
>
> Jan
* Re: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-26 5:53 ` Penny, Zheng
@ 2025-08-26 5:58 ` Jan Beulich
2025-08-26 6:38 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-26 5:58 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 26.08.2025 07:53, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, August 25, 2025 11:02 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
>> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
>> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
>> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to
>> propagate CPPC data
>>
>> On 22.08.2025 12:52, Penny Zheng wrote:
>>> --- a/xen/arch/x86/x86_64/cpufreq.c
>>> +++ b/xen/arch/x86/x86_64/cpufreq.c
>>> @@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
>>>
>>> return set_px_pminfo(acpi_id, xen_perf);
>>> }
>>> +
>>> +int compat_set_cppc_pminfo(unsigned int acpi_id,
>>> + const struct compat_processor_cppc *cppc_data)
>>> +
>>> +{
>>> + struct xen_processor_cppc *xen_cppc;
>>> + unsigned long xlat_page_current;
>>> +
>>> + xlat_malloc_init(xlat_page_current);
>>> +
>>> + xen_cppc = xlat_malloc_array(xlat_page_current,
>>> + struct xen_processor_cppc, 1);
>>> + if ( unlikely(xen_cppc == NULL) )
>>> + return -EFAULT;
>>
>> I think we want to avoid repeating the earlier mistake with using a wrong error code.
>> It's ENOMEM or ENOSPC or some such.
>>
>
> Understood, I'll change it to -ENOMEM
>
>>> --- a/xen/drivers/acpi/pm-op.c
>>> +++ b/xen/drivers/acpi/pm-op.c
>>> @@ -91,7 +91,9 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
>>> pmpt = processor_pminfo[op->cpuid];
>>> policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
>>>
>>> - if ( !pmpt || !pmpt->perf.states ||
>>> + if ( !pmpt ||
>>> + ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
>>> + ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
>>
>> I fear I don't understand this: In the PX case we check whether necessary data is
>> lacking. In the CPPC case you check that some data was provided that we don't
>> want to use? Why not similarly check that data we need was provided?
>>
>
> We introduce the additional CPPC check to avoid a NULL deref of states[i]:
> ```
> for ( i = 0; i < op->u.get_para.freq_num; i++ )
> data[i] = pmpt->perf.states[i].core_frequency * 1000;
> ```
> We want to ensure "op->u.get_para.freq_num" is always zero in CPPC mode, which is validated against pmpt->perf.state_count.
> We have similar discussion in here https://old-list-archives.xen.org/archives/html/xen-devel/2025-06/msg01160.html
Indeed I was thinking that we would have touched this before. As to your reply:
This explains the .state_count check (which imo wants a comment). It doesn't,
however, explain the absence of a "have we got the data we need" part. Unless
of course there simply isn't anything to check for.
Jan
* RE: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-26 5:58 ` Jan Beulich
@ 2025-08-26 6:38 ` Penny, Zheng
0 siblings, 0 replies; 43+ messages in thread
From: Penny, Zheng @ 2025-08-26 6:38 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, August 26, 2025 1:59 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to
> propagate CPPC data
>
> On 26.08.2025 07:53, Penny, Zheng wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, August 25, 2025 11:02 PM
> >> To: Penny, Zheng <penny.zheng@amd.com>
> >> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> >> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
> >> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano
> >> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH v7 06/13] xen/cpufreq: introduce new
> >> sub-hypercall to propagate CPPC data
> >>
> >> On 22.08.2025 12:52, Penny Zheng wrote:
> >>> --- a/xen/arch/x86/x86_64/cpufreq.c
> >>> +++ b/xen/arch/x86/x86_64/cpufreq.c
> >>> @@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
> >>>
> >>> return set_px_pminfo(acpi_id, xen_perf);
> >>> }
> >>> +
> >>> +int compat_set_cppc_pminfo(unsigned int acpi_id,
> >>> + const struct compat_processor_cppc *cppc_data)
> >>> +
> >>> +{
> >>> + struct xen_processor_cppc *xen_cppc;
> >>> + unsigned long xlat_page_current;
> >>> +
> >>> + xlat_malloc_init(xlat_page_current);
> >>> +
> >>> + xen_cppc = xlat_malloc_array(xlat_page_current,
> >>> + struct xen_processor_cppc, 1);
> >>> + if ( unlikely(xen_cppc == NULL) )
> >>> + return -EFAULT;
> >>
> >> I think we want to avoid repeating the earlier mistake with using a wrong error code.
> >> It's ENOMEM or ENOSPC or some such.
> >>
> >
> > Understood, I'll change it to -ENOMEM
> >
> >>> --- a/xen/drivers/acpi/pm-op.c
> >>> +++ b/xen/drivers/acpi/pm-op.c
> >>> @@ -91,7 +91,9 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
> >>> pmpt = processor_pminfo[op->cpuid];
> >>> policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
> >>>
> >>> - if ( !pmpt || !pmpt->perf.states ||
> >>> + if ( !pmpt ||
> >>> + ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
> >>> + ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
> >>
> >> I fear I don't understand this: In the PX case we check whether
> >> necessary data is lacking. In the CPPC case you check that some data
> >> was provided that we don't want to use? Why not similarly check that data we need was provided?
> >>
> >
> > We introduce the additional CPPC check to avoid a NULL deref of states[i]:
> > ```
> > for ( i = 0; i < op->u.get_para.freq_num; i++ )
> > data[i] = pmpt->perf.states[i].core_frequency * 1000;
> > ```
> > We want to ensure "op->u.get_para.freq_num" is always zero in CPPC
> > mode, which is validated against pmpt->perf.state_count.
> > We have similar discussion in here
> > https://old-list-archives.xen.org/archives/html/xen-devel/2025-06/msg01160.html
>
> Indeed I was thinking that we would have touched this before. As to your reply:
> This explains the .state_count check (which imo wants a comment). It doesn't,
Understood, I'll add one.
> however, explain the absence of a "have we got the data we need" part. Unless of
> course there simply isn't anything to check for.
>
Yes, imo, there isn't anything to check:
get_cpufreq_para() does not access any CPPC-specific data.
> Jan
* Re: [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-08-22 10:52 ` [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
2025-08-25 14:45 ` Jan Beulich
@ 2025-08-26 7:38 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-26 7:38 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 22.08.2025 12:52, Penny Zheng wrote:
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -64,12 +64,49 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> /* set xen as default cpufreq */
> enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
>
> -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
> - CPUFREQ_none };
> +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen };
The pre-push pipeline flagged a Misra rule 9.3 violation here: A dedicated
initializer is needed to have the intended effect. I've fixed this up for
you.
Jan
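One way to satisfy Misra C:2012 Rule 9.3 here, sketched with a hypothetical enum that mirrors enum cpufreq_xen_opt and assumes its "none" value is 0. The rule forbids partially initialized arrays, but an initializer made up only of designated initializers is an allowed exception. The element that is not named is still zero-initialized. This is not necessarily the exact fix-up that was committed.

```c
#include <assert.h>

/* Hypothetical mirror of enum cpufreq_xen_opt; assumes "none" is 0. */
enum xen_opt_sketch { OPT_none, OPT_xen, OPT_hwp };

/*
 * Only designated initializers: Rule 9.3 permits this sparse form, and
 * opts[1] is implicitly zero, i.e. OPT_none.
 */
static enum xen_opt_sketch opts[2] = { [0] = OPT_xen };
```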
* RE: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-25 15:36 ` Jan Beulich
@ 2025-08-26 8:21 ` Penny, Zheng
2025-08-26 8:32 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Penny, Zheng @ 2025-08-26 8:21 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Juergen Gross, Andrew Cooper,
Orzel, Michal, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 25, 2025 11:37 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Juergen Gross <jgross@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> Julien Grall <julien@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
>
> On 22.08.2025 12:52, Penny Zheng wrote:
> > We extract cppc info from "struct xen_get_cpufreq_para", where it acts
> > as a member of union, and share the space with governor info.
> > However, it may fail in amd-cppc passive mode, in which governor info
> > and CPPC info could co-exist, and both need to be printed together via xenpm tool.
> > If we tried to still put it in "struct xen_get_cpufreq_para" (e.g.
> > just move out of union), "struct xen_get_cpufreq_para" will enlarge
> > too much to further make xen_sysctl.u exceed 128 bytes.
> >
> > So we introduce a new sub-field GET_CPUFREQ_CPPC to dedicatedly
> > acquire CPPC-related para, and make get-cpufreq-para invoke
> > GET_CPUFREQ_CPPC if available.
> > New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to
> > extract CPPC-related parameters process from cpufreq para.
> >
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
> Acked-by: Jan Beulich <jbeulich@suse.com> # hypervisor
>
Thx
> > --- a/tools/libs/ctrl/xc_pm.c
> > +++ b/tools/libs/ctrl/xc_pm.c
> > @@ -288,7 +288,6 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
> > CHK_FIELD(s.scaling_min_freq);
> > CHK_FIELD(s.u.userspace);
> > CHK_FIELD(s.u.ondemand);
> > - CHK_FIELD(cppc_para);
> >
> > #undef CHK_FIELD
>
> What is done here is already less than what could be done; I think ...
>
Emm, maybe because we define two different cpufreq para structures for user space and sysctl, struct xc_get_cpufreq_para and struct xen_get_cpufreq_para.
But for cppc para, it is an alias:
typedef struct xen_get_cppc_para xc_cppc_para_t;
So ...
> > @@ -366,6 +365,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
> > return ret;
> > }
> >
> > +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
> > + xc_cppc_para_t *cppc_para)
> > +{
> > + int ret;
> > + struct xen_sysctl sysctl = {};
> > + struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
> > +
> > + if ( !xch || !cppc_para )
> > + {
> > + errno = EINVAL;
> > + return -1;
> > + }
> > +
> > + sysctl.cmd = XEN_SYSCTL_pm_op;
> > + sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
> > + sysctl.u.pm_op.cpuid = cpuid;
> > +
> > + ret = xc_sysctl(xch, &sysctl);
> > + if ( ret )
> > + return ret;
> > +
> > + BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
... maybe whole structure size checking is enough?
> > + memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
>
> ... you minimally want to apply as much checking here.
>
> Jan
* Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-26 8:21 ` Penny, Zheng
@ 2025-08-26 8:32 ` Jan Beulich
2025-08-26 9:12 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-26 8:32 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Juergen Gross, Andrew Cooper,
Orzel, Michal, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel@lists.xenproject.org
On 26.08.2025 10:21, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, August 25, 2025 11:37 PM
>>
>> On 22.08.2025 12:52, Penny Zheng wrote:
>>> --- a/tools/libs/ctrl/xc_pm.c
>>> +++ b/tools/libs/ctrl/xc_pm.c
>>> @@ -288,7 +288,6 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
>>> CHK_FIELD(s.scaling_min_freq);
>>> CHK_FIELD(s.u.userspace);
>>> CHK_FIELD(s.u.ondemand);
>>> - CHK_FIELD(cppc_para);
>>>
>>> #undef CHK_FIELD
>>
>> What is done here is already less than what could be done; I think ...
>>
>
> Emm, maybe because we define two different cpufreq para structures for user space and sysctl, struct xc_get_cpufreq_para and struct xen_get_cpufreq_para.
> But for cppc para, it is an alias:
> typedef struct xen_get_cppc_para xc_cppc_para_t;
Oh. Then ...
> So ...
>
>>> @@ -366,6 +365,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
>>> return ret;
>>> }
>>>
>>> +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
>>> + xc_cppc_para_t *cppc_para)
>>> +{
>>> + int ret;
>>> + struct xen_sysctl sysctl = {};
>>> + struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
>>> +
>>> + if ( !xch || !cppc_para )
>>> + {
>>> + errno = EINVAL;
>>> + return -1;
>>> + }
>>> +
>>> + sysctl.cmd = XEN_SYSCTL_pm_op;
>>> + sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
>>> + sysctl.u.pm_op.cpuid = cpuid;
>>> +
>>> + ret = xc_sysctl(xch, &sysctl);
>>> + if ( ret )
>>> + return ret;
>>> +
>>> + BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
... why is this here, when ...
>>> + memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
>>
>> ... you minimally want to apply as much checking here.
... a better effect can be had by
*cppc_para = *sys_cppc_para;
?
Jan
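A sketch of why plain structure assignment is the stronger check here, using a hypothetical trimmed-down struct in place of struct xen_get_cppc_para. Because the libxc type is only a typedef alias of the sysctl type, dereferencing both pointers and assigning copies the whole structure. The compiler also rejects any mismatch in type, not just in size, which BUILD_BUG_ON() plus memcpy() cannot do. Assigning the pointers themselves would copy nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct xen_get_cppc_para. */
struct get_cppc_para_sketch {
    uint32_t lowest, lowest_nonlinear, nominal, highest;
    uint32_t minimum, maximum, desired, energy_perf;
};

/* Alias, analogous to xc_cppc_para_t. */
typedef struct get_cppc_para_sketch cppc_para_sketch_t;

static void copy_out(cppc_para_sketch_t *dst,
                     const struct get_cppc_para_sketch *src)
{
    /* Whole-struct copy; a type mismatch here fails to compile. */
    *dst = *src;
}
```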
* RE: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-26 8:32 ` Jan Beulich
@ 2025-08-26 9:12 ` Penny, Zheng
0 siblings, 0 replies; 43+ messages in thread
From: Penny, Zheng @ 2025-08-26 9:12 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Juergen Gross, Andrew Cooper,
Orzel, Michal, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, August 26, 2025 4:33 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Juergen Gross <jgross@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> Julien Grall <julien@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
>
> On 26.08.2025 10:21, Penny, Zheng wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, August 25, 2025 11:37 PM
> >>
> >> On 22.08.2025 12:52, Penny Zheng wrote:
> >>> --- a/tools/libs/ctrl/xc_pm.c
> >>> +++ b/tools/libs/ctrl/xc_pm.c
> >>> @@ -288,7 +288,6 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
> >>> CHK_FIELD(s.scaling_min_freq);
> >>> CHK_FIELD(s.u.userspace);
> >>> CHK_FIELD(s.u.ondemand);
> >>> - CHK_FIELD(cppc_para);
> >>>
> >>> #undef CHK_FIELD
> >>
> >> What is done here is already less than what could be done; I think ...
> >>
> >
> > Emm, maybe because we define two different cpufreq para structures for user space and sysctl, struct xc_get_cpufreq_para and struct xen_get_cppc_para.
> > But for cppc para, it is an alias:
> > typedef struct xen_get_cppc_para xc_cppc_para_t;
>
> Oh. Then ...
>
> > So ...
> >
> >>> @@ -366,6 +365,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
> >>> return ret;
> >>> }
> >>>
> >>> +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
> >>> + xc_cppc_para_t *cppc_para) {
> >>> + int ret;
> >>> + struct xen_sysctl sysctl = {};
> >>> + struct xen_get_cppc_para *sys_cppc_para =
> >>> + &sysctl.u.pm_op.u.get_cppc;
> >>> +
> >>> + if ( !xch || !cppc_para )
> >>> + {
> >>> + errno = EINVAL;
> >>> + return -1;
> >>> + }
> >>> +
> >>> + sysctl.cmd = XEN_SYSCTL_pm_op;
> >>> + sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
> >>> + sysctl.u.pm_op.cpuid = cpuid;
> >>> +
> >>> + ret = xc_sysctl(xch, &sysctl);
> >>> + if ( ret )
> >>> + return ret;
> >>> +
> >>> + BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
>
> ... why is this here, when ...
>
> >>> + memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
> >>
> >> ... you minimally want to apply as much checking here.
>
> ... a better effect can be had by
>
> cppc_para = sys_cppc_para;
>
> ?
>
True, there is no need for a memory copy if it is an alias.
> Jan
* Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-22 10:52 ` [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para Penny Zheng
2025-08-25 15:36 ` Jan Beulich
@ 2025-08-27 15:22 ` Anthony PERARD
2025-08-28 4:14 ` Penny, Zheng
1 sibling, 1 reply; 43+ messages in thread
From: Anthony PERARD @ 2025-08-27 15:22 UTC (permalink / raw)
To: Penny Zheng
Cc: xen-devel, ray.huang, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
On Fri, Aug 22, 2025 at 06:52:16PM +0800, Penny Zheng wrote:
> diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> index 6b054b10a4..8fc1d7cc65 100644
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -898,6 +900,23 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
> printf("\n");
> }
>
> +/* show cpu cppc parameters information on CPU cpuid */
> +static int show_cppc_para_by_cpuid(xc_interface *xc_handle, unsigned int cpuid)
> +{
> + int ret;
> + xc_cppc_para_t cppc_para;
> +
> + ret = xc_get_cppc_para(xc_handle, cpuid, &cppc_para);
> + if ( !ret )
> + print_cppc_para(cpuid, &cppc_para);
> + else if ( errno == ENODEV )
> + ret = 0; /* Ignore unsupported platform */
> + else
> + fprintf(stderr, "[CPU%u] failed to get cppc parameter\n", cpuid);
You might want to add ": %s" with strerror(errno) to the printed error, which
could help figure out why we failed to get the parameters.
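A minimal sketch of the suggested message. The libxc call is replaced by a hypothetical stub (get_cppc_para_stub()) and a stand-in xc_cppc_para_t so the error path can be exercised in isolation. errno is captured right after the failing call, before any later library call can overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef struct { unsigned int energy_perf; } xc_cppc_para_t; /* stand-in type */

static int stub_errno = EACCES;

/* Hypothetical stand-in for xc_get_cppc_para(): always fails with stub_errno. */
static int get_cppc_para_stub(unsigned int cpuid, xc_cppc_para_t *para)
{
    (void)cpuid;
    (void)para;
    errno = stub_errno;
    return -1;
}

static int show_cppc_para_by_cpuid(unsigned int cpuid)
{
    xc_cppc_para_t cppc_para = { 0 };
    int ret = get_cppc_para_stub(cpuid, &cppc_para);
    int err = errno; /* save before any other call can clobber it */

    if ( !ret )
        printf("[CPU%u] energy_perf %u\n", cpuid, cppc_para.energy_perf);
    else if ( err == ENODEV )
        ret = 0; /* Ignore unsupported platform */
    else
        fprintf(stderr, "[CPU%u] failed to get cppc parameter: %s\n",
                cpuid, strerror(err));

    return ret;
}
```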
The rest of the tool side of the patch, with Jan's suggestion, looks good
to me, so Acked-by: Anthony PERARD <anthony.perard@vates.tech> for the
next round.
Thanks,
--
Anthony PERARD
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-22 10:52 ` [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
2025-08-25 16:02 ` Jan Beulich
@ 2025-08-27 15:58 ` Anthony PERARD
2025-08-27 16:08 ` Jan Beulich
1 sibling, 1 reply; 43+ messages in thread
From: Anthony PERARD @ 2025-08-27 15:58 UTC (permalink / raw)
To: Penny Zheng
Cc: xen-devel, ray.huang, Anthony PERARD, Andrew Cooper, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
On Fri, Aug 22, 2025 at 06:52:18PM +0800, Penny Zheng wrote:
> diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> index 02981c4583..eedb745a46 100644
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -38,6 +38,13 @@
> static xc_interface *xc_handle;
> static unsigned int max_cpu_nr;
>
> +static const char cpufreq_policy_str[][12] = {
Is it necessary to hard-code a hand-calculated size of the literal
strings? Can't we let the compiler do that for us? With this as type:
static const char *cpufreq_policy_str[] = {
The compiler might not detect an issue if we write "11" instead of "12",
for example.
> + [CPUFREQ_POLICY_UNKNOWN] = "unknown",
> + [CPUFREQ_POLICY_POWERSAVE] = "powersave",
> + [CPUFREQ_POLICY_PERFORMANCE] = "performance",
> + [CPUFREQ_POLICY_ONDEMAND] = "ondemand",
> +};
> +
> /* help message */
> void show_help(void)
> {
Otherwise the tool side of the patch looks fine to me,
so: Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Thanks,
--
Anthony PERARD
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-27 15:58 ` Anthony PERARD
@ 2025-08-27 16:08 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2025-08-27 16:08 UTC (permalink / raw)
To: Anthony PERARD
Cc: xen-devel, ray.huang, Anthony PERARD, Andrew Cooper, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
Penny Zheng
On 27.08.2025 17:58, Anthony PERARD wrote:
> On Fri, Aug 22, 2025 at 06:52:18PM +0800, Penny Zheng wrote:
>> diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
>> index 02981c4583..eedb745a46 100644
>> --- a/tools/misc/xenpm.c
>> +++ b/tools/misc/xenpm.c
>> @@ -38,6 +38,13 @@
>> static xc_interface *xc_handle;
>> static unsigned int max_cpu_nr;
>>
>> +static const char cpufreq_policy_str[][12] = {
>
> Is it necessary to hard-code an hand calculated size of the literal
> strings? Can't we let the compiler do that for us? With this as type:
>
> static const char *cpufreq_policy_str[] = {
I think it was me who requested this. Your approach has an extra level of
indirection (perhaps not a big problem here), and requires runtime
relocations when building as PIE (maybe also not a big problem here).
The 2nd const that's wanted is also, as can be seen, frequently
omitted. Overall I'm generally striving towards using more efficient
code also where efficiency isn't of primary concern, simply because
code is being copied, often without looking very closely.
What we may want to do is bump 12 to 16, adding some leeway and
making calculations a little easier.
Jan
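A sketch of the fixed-width table with the rows widened to 16. Rows of a two-dimensional char array are stored inline, so there is no pointer table and hence nothing to relocate under PIE. The C11 _Static_assert guards against the case raised above: C accepts a string literal that exactly fills a row without room for the terminating NUL, so a too-narrow row would not otherwise be diagnosed. The CPUFREQ_POLICY_* values are copied from the patch; policy_name() is a hypothetical lookup helper.

```c
#include <stddef.h>

/* Values as proposed in the patch. */
#define CPUFREQ_POLICY_UNKNOWN     0
#define CPUFREQ_POLICY_POWERSAVE   1
#define CPUFREQ_POLICY_PERFORMANCE 2
#define CPUFREQ_POLICY_ONDEMAND    3

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Each row is an inline 16-byte buffer: no pointers, no PIE relocations. */
static const char cpufreq_policy_str[][16] = {
    [CPUFREQ_POLICY_UNKNOWN]     = "unknown",
    [CPUFREQ_POLICY_POWERSAVE]   = "powersave",
    [CPUFREQ_POLICY_PERFORMANCE] = "performance",
    [CPUFREQ_POLICY_ONDEMAND]    = "ondemand",
};

/* The longest current name, including its NUL, must fit in one row. */
_Static_assert(sizeof("performance") <= sizeof(cpufreq_policy_str[0]),
               "cpufreq_policy_str rows too narrow");

/* Hypothetical helper: map a policy value to its name, defaulting to "unknown". */
static const char *policy_name(unsigned int policy)
{
    if ( policy >= ARRAY_SIZE(cpufreq_policy_str) )
        policy = CPUFREQ_POLICY_UNKNOWN;
    return cpufreq_policy_str[policy];
}
```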
* RE: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-25 16:02 ` Jan Beulich
@ 2025-08-28 4:06 ` Penny, Zheng
2025-08-28 6:35 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Penny, Zheng @ 2025-08-28 4:06 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, August 26, 2025 12:03 AM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> xen_sysctl_pm_op for amd-cppc driver
>
> On 22.08.2025 12:52, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > + /* Only allow values if params bit is set. */
> > + if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
> > + set_cppc->desired) ||
> > + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
> > + set_cppc->minimum) ||
> > + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> > + set_cppc->maximum) ||
> > + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
> > + set_cppc->energy_perf) )
> > + return -EINVAL;
>
> ... all the errors checked here are to be ignored when no flag is set at all?
>
Yes, values are only meaningful when the corresponding flag is set, as described in the comment for "struct xen_set_cppc_para".
> > + /*
> > + * Validate all parameters
> > + * Maximum performance may be set to any performance value in the range
> > + * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> > + * be set to a value that is larger than or equal to minimum Performance.
> > + */
> > + if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> > + (set_cppc->maximum > data->caps.highest_perf ||
> > + set_cppc->maximum <
> > + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
> > + ? set_cppc->minimum
> > + : data->req.min_perf)) )
>
> Too deep indentation (more of this throughout the function), and seeing ...
Maybe four-space indentation is more appropriate:
```
if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
(set_cppc->maximum > data->caps.highest_perf ||
(set_cppc->maximum <
(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
? set_cppc->minimum
: data->req.min_perf))) )
```
> > + case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
> > + if ( active_mode )
> > + policy->policy = CPUFREQ_POLICY_UNKNOWN;
> > + break;
> > +
> > + default:
> > + return -EINVAL;
> > + }
>
> Much of this looks very similar to what patch 09 introduces in
> amd_cppc_epp_set_policy(). Is it not possible to reduce the redundancy?
>
I'll add a new helper, amd_cppc_prepare_policy(), to extract the common code.
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -336,8 +336,14 @@ struct xen_ondemand {
> > uint32_t up_threshold;
> > };
> >
> > +#define CPUFREQ_POLICY_UNKNOWN 0
> > +#define CPUFREQ_POLICY_POWERSAVE 1
> > +#define CPUFREQ_POLICY_PERFORMANCE 2
> > +#define CPUFREQ_POLICY_ONDEMAND 3
>
> Without XEN_ prefixes they shouldn't appear in a public header. But do we
> need ...
>
> > struct xen_get_cppc_para {
> > /* OUT */
> > + uint32_t policy; /* CPUFREQ_POLICY_xxx */
>
> ... the new field at all? Can't you synthesize the kind-of-governor into struct
> xen_get_cpufreq_para's respective field? You invoke both sub-ops from xenpm
> now anyway ...
>
Maybe I could borrow the governor field to indicate policy info, as in the following change to print_cpufreq_para(); then we wouldn't need to add the new field "policy".
```
+ /* Translate governor info to policy info in CPPC active mode */
+ if ( is_cppc_active )
+ {
+ if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+ "ondemand", CPUFREQ_NAME_LEN) )
+ printf("cppc policy : ondemand\n");
+ else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+ "performance", CPUFREQ_NAME_LEN) )
+ printf("cppc policy : performance\n");
+
+ else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+ "powersave", CPUFREQ_NAME_LEN) )
+ printf("cppc policy : powersave\n");
+ else
+ printf("cppc policy : unknown\n");
+ }
+
```
> Jan
* RE: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
2025-08-27 15:22 ` Anthony PERARD
@ 2025-08-28 4:14 ` Penny, Zheng
0 siblings, 0 replies; 43+ messages in thread
From: Penny, Zheng @ 2025-08-28 4:14 UTC (permalink / raw)
To: Anthony PERARD
Cc: xen-devel@lists.xenproject.org, Huang, Ray, Anthony PERARD,
Juergen Gross, Andrew Cooper, Orzel, Michal, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
> -----Original Message-----
> From: Anthony PERARD <anthony@xenproject.org>
> Sent: Wednesday, August 27, 2025 11:22 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: xen-devel@lists.xenproject.org; Huang, Ray <Ray.Huang@amd.com>; Anthony
> PERARD <anthony.perard@vates.tech>; Juergen Gross <jgross@suse.com>;
> Andrew Cooper <andrew.cooper3@citrix.com>; Orzel, Michal
> <Michal.Orzel@amd.com>; Jan Beulich <jbeulich@suse.com>; Julien Grall
> <julien@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Subject: Re: [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para
>
> On Fri, Aug 22, 2025 at 06:52:16PM +0800, Penny Zheng wrote:
> > diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> > index 6b054b10a4..8fc1d7cc65 100644
> > --- a/tools/misc/xenpm.c
> > +++ b/tools/misc/xenpm.c
> > @@ -898,6 +900,23 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
> > printf("\n");
> > }
> >
> > +/* show cpu cppc parameters information on CPU cpuid */
> > +static int show_cppc_para_by_cpuid(xc_interface *xc_handle, unsigned int cpuid)
> > +{
> > + int ret;
> > + xc_cppc_para_t cppc_para;
> > +
> > + ret = xc_get_cppc_para(xc_handle, cpuid, &cppc_para);
> > + if ( !ret )
> > + print_cppc_para(cpuid, &cppc_para);
> > + else if ( errno == ENODEV )
> > + ret = 0; /* Ignore unsupported platform */
> > + else
> > + fprintf(stderr, "[CPU%u] failed to get cppc parameter\n",
> > + cpuid);
>
> You might want to add ": %s" strerror(errno) to the error printed, which could help
> figure out why we failed to get the parameters.
>
Ack
>
> The rest of the tool side of the patch, with Jan suggestion, looks good to me, so
> Acked-by: Anthony PERARD <anthony.perard@vates.tech> for the next round.
>
Thanks
> Thanks,
>
> --
> Anthony PERARD
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-28 4:06 ` Penny, Zheng
@ 2025-08-28 6:35 ` Jan Beulich
2025-08-28 6:37 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-28 6:35 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 28.08.2025 06:06, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, August 26, 2025 12:03 AM
>>
>> On 22.08.2025 12:52, Penny Zheng wrote:
>>> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
>>> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
>>> + /* Only allow values if params bit is set. */
>>> + if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
>>> + set_cppc->desired) ||
>>> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
>>> + set_cppc->minimum) ||
>>> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
>>> + set_cppc->maximum) ||
>>> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
>>> + set_cppc->energy_perf) )
>>> + return -EINVAL;
>>
>> ... all the errors checked here are to be ignored when no flag is set at all?
>
> Yes, values are only meaningful when according flag is properly set, which has been described in the comment for "struct xen_set_cppc_para"
Especially since you stripped the initial part of this comment of mine, it feels
as if you misunderstood my request. What it boils down to is the question whether
"if ( set_cppc->set_params == 0 )" shouldn't move after the if() you left in
context above.
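A sketch of the ordering being asked about: reject values supplied without their enabling flag first, and only afterwards treat an empty set_params as a no-op, so that a request carrying stray values but no flags still fails with -EINVAL. The SET_* flag values, the struct, and check_set_cppc() are hypothetical stand-ins for the XEN_SYSCTL_CPPC_SET_* constants, struct xen_set_cppc_para, and the driver's validation code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for XEN_SYSCTL_CPPC_SET_*. */
#define SET_DESIRED     (1u << 0)
#define SET_MINIMUM     (1u << 1)
#define SET_MAXIMUM     (1u << 2)
#define SET_ENERGY_PERF (1u << 3)

struct set_cppc_para {          /* stand-in for struct xen_set_cppc_para */
    uint32_t set_params;
    uint32_t desired, minimum, maximum, energy_perf;
};

/*
 * Returns -EINVAL for malformed input, 0 otherwise.  *noop reports whether
 * the request asked for no change at all.
 */
static int check_set_cppc(const struct set_cppc_para *p, bool *noop)
{
    /* 1. Only allow values whose params bit is set. */
    if ( (!(p->set_params & SET_DESIRED) && p->desired) ||
         (!(p->set_params & SET_MINIMUM) && p->minimum) ||
         (!(p->set_params & SET_MAXIMUM) && p->maximum) ||
         (!(p->set_params & SET_ENERGY_PERF) && p->energy_perf) )
        return -EINVAL;

    /* 2. Only now is "no flags at all" a harmless no-op. */
    *noop = (p->set_params == 0);
    return 0;
}
```

With the two checks the other way round, the first test case below would be accepted silently instead of being rejected.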
>>> + /*
>>> + * Validate all parameters
>>> + * Maximum performance may be set to any performance value in the range
>>> + * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
>>> + * be set to a value that is larger than or equal to minimum Performance.
>>> + */
>>> + if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
>>> + (set_cppc->maximum > data->caps.highest_perf ||
>>> + set_cppc->maximum <
>>> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
>>> + ? set_cppc->minimum
>>> + : data->req.min_perf)) )
>>
>> Too deep indentation (more of this throughout the function), and seeing ...
>
> Maybe four indention is more proper
> ```
> if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> (set_cppc->maximum > data->caps.highest_perf ||
> (set_cppc->maximum <
> (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
> ? set_cppc->minimum
> : data->req.min_perf))) )
> ```
No. In expressions you always want to indent according to pending open
parentheses:
    if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
         (set_cppc->maximum > data->caps.highest_perf ||
          (set_cppc->maximum <
           (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM
            ? set_cppc->minimum
            : data->req.min_perf))) )
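Another way to sidestep the deep nesting is to hoist the ternary into a named local. A minimal sketch, mirroring only the two checks the patch performs (maximum not above highest_perf, and not below the effective minimum). The struct names, flag values, and maximum_is_valid() are hypothetical stand-ins for the driver's data->caps / data->req and the XEN_SYSCTL_CPPC_SET_* bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define SET_MINIMUM (1u << 1)   /* hypothetical flag values */
#define SET_MAXIMUM (1u << 2)

struct cppc_caps { uint32_t highest_perf; };   /* stand-in for data->caps */
struct cppc_req  { uint32_t min_perf; };       /* stand-in for data->req  */
struct set_cppc_para { uint32_t set_params, minimum, maximum; };

static bool maximum_is_valid(const struct set_cppc_para *p,
                             const struct cppc_caps *caps,
                             const struct cppc_req *req)
{
    uint32_t min;

    if ( !(p->set_params & SET_MAXIMUM) )
        return true;                /* maximum not being changed */

    /* The new minimum if one is set alongside, else the current one. */
    min = (p->set_params & SET_MINIMUM) ? p->minimum : req->min_perf;

    return p->maximum <= caps->highest_perf && p->maximum >= min;
}
```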
>>> + case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
>>> + if ( active_mode )
>>> + policy->policy = CPUFREQ_POLICY_UNKNOWN;
>>> + break;
>>> +
>>> + default:
>>> + return -EINVAL;
>>> + }
>>
>> Much of this looks very similar to what patch 09 introduces in
>> amd_cppc_epp_set_policy(). Is it not possible to reduce the redundancy?
>>
>
> I'll add a new helper to amd_cppc_prepare_policy() to extract common
>
>>> --- a/xen/include/public/sysctl.h
>>> +++ b/xen/include/public/sysctl.h
>>> @@ -336,8 +336,14 @@ struct xen_ondemand {
>>> uint32_t up_threshold;
>>> };
>>>
>>> +#define CPUFREQ_POLICY_UNKNOWN 0
>>> +#define CPUFREQ_POLICY_POWERSAVE 1
>>> +#define CPUFREQ_POLICY_PERFORMANCE 2
>>> +#define CPUFREQ_POLICY_ONDEMAND 3
>>
>> Without XEN_ prefixes they shouldn't appear in a public header. But do we
>> need ...
>>
>>> struct xen_get_cppc_para {
>>> /* OUT */
>>> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
>>
>> ... the new field at all? Can't you synthesize the kind-of-governor into struct
>> xen_get_cpufreq_para's respective field? You invoke both sub-ops from xenpm
>> now anyway ...
>>
>
> Maybe I could borrow governor field to indicate policy info, like the following in print_cpufreq_para(), then we don't need to add the new filed "policy"
> ```
> + /* Translate governor info to policy info in CPPC active mode */
> + if ( is_cppc_active )
> + {
> + if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> + "ondemand", CPUFREQ_NAME_LEN) )
> + printf("cppc policy : ondemand\n");
> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> + "performance", CPUFREQ_NAME_LEN) )
> + printf("cppc policy : performance\n");
> +
> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> + "powersave", CPUFREQ_NAME_LEN) )
> + printf("cppc policy : powersave\n");
> + else
> + printf("cppc policy : unknown\n");
> + }
> +
> ```
Something like this is what I was thinking of, yes.
Jan
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-28 6:35 ` Jan Beulich
@ 2025-08-28 6:37 ` Jan Beulich
2025-08-28 6:54 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-28 6:37 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 28.08.2025 08:35, Jan Beulich wrote:
> On 28.08.2025 06:06, Penny, Zheng wrote:
>>> -----Original Message-----
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: Tuesday, August 26, 2025 12:03 AM
>>>
>>> On 22.08.2025 12:52, Penny Zheng wrote:
>>>> --- a/xen/include/public/sysctl.h
>>>> +++ b/xen/include/public/sysctl.h
>>>> @@ -336,8 +336,14 @@ struct xen_ondemand {
>>>> uint32_t up_threshold;
>>>> };
>>>>
>>>> +#define CPUFREQ_POLICY_UNKNOWN 0
>>>> +#define CPUFREQ_POLICY_POWERSAVE 1
>>>> +#define CPUFREQ_POLICY_PERFORMANCE 2
>>>> +#define CPUFREQ_POLICY_ONDEMAND 3
>>>
>>> Without XEN_ prefixes they shouldn't appear in a public header. But do we
>>> need ...
>>>
>>>> struct xen_get_cppc_para {
>>>> /* OUT */
>>>> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
>>>
>>> ... the new field at all? Can't you synthesize the kind-of-governor into struct
>>> xen_get_cpufreq_para's respective field? You invoke both sub-ops from xenpm
>>> now anyway ...
>>>
>>
>> Maybe I could borrow governor field to indicate policy info, like the following in print_cpufreq_para(), then we don't need to add the new filed "policy"
>> ```
>> + /* Translate governor info to policy info in CPPC active mode */
>> + if ( is_cppc_active )
>> + {
>> + if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>> + "ondemand", CPUFREQ_NAME_LEN) )
>> + printf("cppc policy : ondemand\n");
>> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>> + "performance", CPUFREQ_NAME_LEN) )
>> + printf("cppc policy : performance\n");
>> +
>> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>> + "powersave", CPUFREQ_NAME_LEN) )
>> + printf("cppc policy : powersave\n");
>> + else
>> + printf("cppc policy : unknown\n");
>> + }
>> +
>> ```
>
> Something like this is what I was thinking of, yes.
Albeit - why the complicated if/else sequence? Why not simply print
the field the hypercall returned?
Jan
* RE: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-28 6:37 ` Jan Beulich
@ 2025-08-28 6:54 ` Penny, Zheng
2025-08-28 7:09 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Penny, Zheng @ 2025-08-28 6:54 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 28, 2025 2:38 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> xen_sysctl_pm_op for amd-cppc driver
>
> On 28.08.2025 08:35, Jan Beulich wrote:
> > On 28.08.2025 06:06, Penny, Zheng wrote:
> >>> -----Original Message-----
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>> Sent: Tuesday, August 26, 2025 12:03 AM
> >>>
> >>> On 22.08.2025 12:52, Penny Zheng wrote:
> >>>> --- a/xen/include/public/sysctl.h
> >>>> +++ b/xen/include/public/sysctl.h
> >>>> @@ -336,8 +336,14 @@ struct xen_ondemand {
> >>>> uint32_t up_threshold;
> >>>> };
> >>>>
> >>>> +#define CPUFREQ_POLICY_UNKNOWN 0
> >>>> +#define CPUFREQ_POLICY_POWERSAVE 1
> >>>> +#define CPUFREQ_POLICY_PERFORMANCE 2
> >>>> +#define CPUFREQ_POLICY_ONDEMAND 3
> >>>
> >>> Without XEN_ prefixes they shouldn't appear in a public header. But
> >>> do we need ...
> >>>
> >>>> struct xen_get_cppc_para {
> >>>> /* OUT */
> >>>> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
> >>>
> >>> ... the new field at all? Can't you synthesize the kind-of-governor
> >>> into struct xen_get_cpufreq_para's respective field? You invoke both
> >>> sub-ops from xenpm now anyway ...
> >>>
> >>
> >> Maybe I could borrow governor field to indicate policy info, like the following in print_cpufreq_para(), then we don't need to add the new filed "policy"
> >> ```
> >> + /* Translate governor info to policy info in CPPC active mode */
> >> + if ( is_cppc_active )
> >> + {
> >> + if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >> + "ondemand", CPUFREQ_NAME_LEN) )
> >> + printf("cppc policy : ondemand\n");
> >> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >> + "performance", CPUFREQ_NAME_LEN) )
> >> + printf("cppc policy : performance\n");
> >> +
> >> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >> + "powersave", CPUFREQ_NAME_LEN) )
> >> + printf("cppc policy : powersave\n");
> >> + else
> >> + printf("cppc policy : unknown\n");
> >> + }
> >> +
> >> ```
> >
> > Something like this is what I was thinking of, yes.
>
> Albeit - why the complicated if/else sequence? Why not simply print the field the
> hypercall returned?
>
The userspace governor has no corresponding policy. I could simplify it to:
```
if ( !strncmp(p_cpufreq->u.s.scaling_governor,
"userspace", CPUFREQ_NAME_LEN) )
printf("policy : unknown\n");
else
printf("policy : %s\n",
p_cpufreq->u.s.scaling_governor);
```
> Jan
* Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-28 6:54 ` Penny, Zheng
@ 2025-08-28 7:09 ` Jan Beulich
2025-08-28 7:52 ` Penny, Zheng
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2025-08-28 7:09 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 28.08.2025 08:54, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, August 28, 2025 2:38 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
>> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>> xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
>> xen_sysctl_pm_op for amd-cppc driver
>>
>> On 28.08.2025 08:35, Jan Beulich wrote:
>>> On 28.08.2025 06:06, Penny, Zheng wrote:
>>>>> -----Original Message-----
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: Tuesday, August 26, 2025 12:03 AM
>>>>>
>>>>> On 22.08.2025 12:52, Penny Zheng wrote:
>>>>>> --- a/xen/include/public/sysctl.h
>>>>>> +++ b/xen/include/public/sysctl.h
>>>>>> @@ -336,8 +336,14 @@ struct xen_ondemand {
>>>>>> uint32_t up_threshold;
>>>>>> };
>>>>>>
>>>>>> +#define CPUFREQ_POLICY_UNKNOWN 0
>>>>>> +#define CPUFREQ_POLICY_POWERSAVE 1
>>>>>> +#define CPUFREQ_POLICY_PERFORMANCE 2
>>>>>> +#define CPUFREQ_POLICY_ONDEMAND 3
>>>>>
>>>>> Without XEN_ prefixes they shouldn't appear in a public header. But
>>>>> do we need ...
>>>>>
>>>>>> struct xen_get_cppc_para {
>>>>>> /* OUT */
>>>>>> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
>>>>>
>>>>> ... the new field at all? Can't you synthesize the kind-of-governor
>>>>> into struct xen_get_cpufreq_para's respective field? You invoke both
>>>>> sub-ops from xenpm now anyway ...
>>>>>
>>>>
>>>> Maybe I could borrow governor field to indicate policy info, like the following in print_cpufreq_para(), then we don't need to add the new filed "policy"
>>>> ```
>>>> + /* Translate governor info to policy info in CPPC active mode */
>>>> + if ( is_cppc_active )
>>>> + {
>>>> + if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>>>> + "ondemand", CPUFREQ_NAME_LEN) )
>>>> + printf("cppc policy : ondemand\n");
>>>> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>>>> + "performance", CPUFREQ_NAME_LEN) )
>>>> + printf("cppc policy : performance\n");
>>>> +
>>>> + else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
>>>> + "powersave", CPUFREQ_NAME_LEN) )
>>>> + printf("cppc policy : powersave\n");
>>>> + else
>>>> + printf("cppc policy : unknown\n");
>>>> + }
>>>> +
>>>> ```
>>>
>>> Something like this is what I was thinking of, yes.
>>
>> Albeit - why the complicated if/else sequence? Why not simply print the field the
>> hypercall returned?
>
> userspace governor doesn't have according policy. I could simplify it to
> ```
> if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> "userspace", CPUFREQ_NAME_LEN) )
> printf("policy : unknown\n");
> else
> printf("policy : %s\n",
> p_cpufreq->u.s.scaling_governor);
> ```
But the hypervisor shouldn't report back "userspace" when the CPPC driver
is in use. And I think the tool is okay to trust the hypervisor.
Jan
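Printing the returned field directly, as suggested, can be as small as the sketch below. It assumes CPUFREQ_NAME_LEN is 16 and uses a reduced stand-in for struct xc_get_cpufreq_para; format_cppc_policy() is a hypothetical helper. The %.*s precision bounds the read to the field's size, so a name that fills the buffer without a terminating NUL is still handled safely.

```c
#include <stdio.h>

#define CPUFREQ_NAME_LEN 16   /* assumed to match the public header */

/* Reduced stand-in for struct xc_get_cpufreq_para. */
struct cpufreq_para_stub {
    char scaling_governor[CPUFREQ_NAME_LEN];
};

/* Hypothetical helper: format the policy line from the governor name. */
static int format_cppc_policy(char *buf, size_t len,
                              const struct cpufreq_para_stub *p)
{
    /* Trust the hypervisor's answer; just bound the read to the field. */
    return snprintf(buf, len, "cppc policy : %.*s\n",
                    (int)sizeof(p->scaling_governor), p->scaling_governor);
}
```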
* RE: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-28 7:09 ` Jan Beulich
@ 2025-08-28 7:52 ` Penny, Zheng
0 siblings, 0 replies; 43+ messages in thread
From: Penny, Zheng @ 2025-08-28 7:52 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 28, 2025 3:09 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> xen_sysctl_pm_op for amd-cppc driver
>
> On 28.08.2025 08:54, Penny, Zheng wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Thursday, August 28, 2025 2:38 PM
> >> To: Penny, Zheng <penny.zheng@amd.com>
> >> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> >> <anthony.perard@vates.tech>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> >> Julien Grall <julien@xen.org>; Roger Pau Monné
> >> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> >> xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> >> xen_sysctl_pm_op for amd-cppc driver
> >>
> >> On 28.08.2025 08:35, Jan Beulich wrote:
> >>> On 28.08.2025 06:06, Penny, Zheng wrote:
> >>>>> -----Original Message-----
> >>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>> Sent: Tuesday, August 26, 2025 12:03 AM
> >>>>>
> >>>>> On 22.08.2025 12:52, Penny Zheng wrote:
> >>>>>> --- a/xen/include/public/sysctl.h
> >>>>>> +++ b/xen/include/public/sysctl.h
> >>>>>> @@ -336,8 +336,14 @@ struct xen_ondemand {
> >>>>>> uint32_t up_threshold;
> >>>>>> };
> >>>>>>
> >>>>>> +#define CPUFREQ_POLICY_UNKNOWN 0
> >>>>>> +#define CPUFREQ_POLICY_POWERSAVE 1
> >>>>>> +#define CPUFREQ_POLICY_PERFORMANCE 2
> >>>>>> +#define CPUFREQ_POLICY_ONDEMAND 3
> >>>>>
> >>>>> Without XEN_ prefixes they shouldn't appear in a public header.
> >>>>> But do we need ...
> >>>>>
> >>>>>> struct xen_get_cppc_para {
> >>>>>> /* OUT */
> >>>>>> + uint32_t policy; /* CPUFREQ_POLICY_xxx */
> >>>>>
> >>>>> ... the new field at all? Can't you synthesize the
> >>>>> kind-of-governor into struct xen_get_cpufreq_para's respective
> >>>>> field? You invoke both sub-ops from xenpm now anyway ...
> >>>>>
> >>>>
> >>>> Maybe I could borrow governor field to indicate policy info, like
> >>>> the following in print_cpufreq_para(), then we don't need to add the new filed "policy"
> >>>> ```
> >>>> +    /* Translate governor info to policy info in CPPC active mode */
> >>>> +    if ( is_cppc_active )
> >>>> +    {
> >>>> +        if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >>>> +                      "ondemand", CPUFREQ_NAME_LEN) )
> >>>> +            printf("cppc policy : ondemand\n");
> >>>> +        else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >>>> +                           "performance", CPUFREQ_NAME_LEN) )
> >>>> +            printf("cppc policy : performance\n");
> >>>> +        else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >>>> +                           "powersave", CPUFREQ_NAME_LEN) )
> >>>> +            printf("cppc policy : powersave\n");
> >>>> +        else
> >>>> +            printf("cppc policy : unknown\n");
> >>>> +    }
> >>>> +
> >>>> ```
> >>>
> >>> Something like this is what I was thinking of, yes.
> >>
> >> Albeit - why the complicated if/else sequence? Why not simply print
> >> the field the hypercall returned?
> >
> > The userspace governor doesn't have a corresponding policy. I could simplify it
> > to
> > ```
> > if ( !strncmp(p_cpufreq->u.s.scaling_governor,
> >               "userspace", CPUFREQ_NAME_LEN) )
> >     printf("policy : unknown\n");
> > else
> >     printf("policy : %s\n",
> >            p_cpufreq->u.s.scaling_governor);
> > ```
>
> But the hypervisor shouldn't report back "userspace" when the CPPC driver is in
> use. And I think the tool is okay to trust the hypervisor.
True, we shall make sure the governor is set properly on the hypervisor side, even in CPPC mode.
>
> Jan
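The thread settles on having xenpm print the governor name returned by the hypervisor directly, trusting it not to report "userspace" while a CPPC driver is active. Below is a minimal sketch of that printing step. `format_cppc_policy()` is a hypothetical helper, not an existing xenpm function, and `CPUFREQ_NAME_LEN` is assumed to be 16, matching Xen's public sysctl.h. Because `scaling_governor` is a fixed-size array, the `%.*s` precision bounds the read even if a name fills the array without a terminating NUL.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed value; the real definition lives in xen/include/public/sysctl.h. */
#ifndef CPUFREQ_NAME_LEN
#define CPUFREQ_NAME_LEN 16
#endif

/*
 * Hypothetical helper (not part of xenpm): format the CPPC policy line by
 * trusting the governor name the hypervisor returned.  "%.*s" reads at
 * most CPUFREQ_NAME_LEN bytes, so a name that fills the array without a
 * NUL terminator is still handled safely.
 * Returns snprintf()'s result, i.e. the length of the full line.
 */
static int format_cppc_policy(char *buf, size_t size,
                              const char governor[CPUFREQ_NAME_LEN])
{
    return snprintf(buf, size, "policy : %.*s", CPUFREQ_NAME_LEN, governor);
}
```

When called from something like print_cpufreq_para(), the caller would pass `p_cpufreq->u.s.scaling_governor` and print the resulting buffer. No per-governor if/else chain is needed.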
Thread overview: 43+ messages
2025-08-22 10:52 [PATCH v7 00/13] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-08-22 10:52 ` [PATCH v7 01/13] tools: drop "has_num" condition check for cppc mode Penny Zheng
2025-08-22 10:52 ` [PATCH v7 02/13] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
2025-08-22 10:52 ` [PATCH v7 03/13] tools: fix help info for "xenpm set-cpufreq-cppc" Penny Zheng
2025-08-25 14:30 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 04/13] xen/cpufreq: add missing default: case for x86 vendor Penny Zheng
2025-08-25 14:43 ` Jan Beulich
2025-08-26 4:23 ` Penny, Zheng
2025-08-22 10:52 ` [PATCH v7 05/13] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
2025-08-25 14:45 ` Jan Beulich
2025-08-26 7:38 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 06/13] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
2025-08-25 15:01 ` Jan Beulich
2025-08-26 5:53 ` Penny, Zheng
2025-08-26 5:58 ` Jan Beulich
2025-08-26 6:38 ` Penny, Zheng
2025-08-22 10:52 ` [PATCH v7 07/13] xen/cpufreq: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
2025-08-25 15:07 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 08/13] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
2025-08-25 15:14 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 09/13] xen/cpufreq: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
2025-08-25 15:19 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 10/13] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
2025-08-25 15:23 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 11/13] tools/cpufreq: extract CPPC para from cpufreq para Penny Zheng
2025-08-25 15:36 ` Jan Beulich
2025-08-26 8:21 ` Penny, Zheng
2025-08-26 8:32 ` Jan Beulich
2025-08-26 9:12 ` Penny, Zheng
2025-08-27 15:22 ` Anthony PERARD
2025-08-28 4:14 ` Penny, Zheng
2025-08-22 10:52 ` [PATCH v7 12/13] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
2025-08-25 15:44 ` Jan Beulich
2025-08-22 10:52 ` [PATCH v7 13/13] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
2025-08-25 16:02 ` Jan Beulich
2025-08-28 4:06 ` Penny, Zheng
2025-08-28 6:35 ` Jan Beulich
2025-08-28 6:37 ` Jan Beulich
2025-08-28 6:54 ` Penny, Zheng
2025-08-28 7:09 ` Jan Beulich
2025-08-28 7:52 ` Penny, Zheng
2025-08-27 15:58 ` Anthony PERARD
2025-08-27 16:08 ` Jan Beulich