* [PATCH v6 01/19] xen/amd: introduce amd_process_freq() to get processor frequency
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-11 3:50 ` [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode Penny Zheng
` (17 subsequent siblings)
18 siblings, 0 replies; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné
When the _CPC table cannot provide processor frequency range
values for the Xen governor, we need to read the processor's maximum
frequency as an anchor point.
So we extract the AMD CPU core frequency calculation logic from
amd_log_freq() and wrap it in a new helper, amd_process_freq().
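The following is a minimal sketch of the split, for illustration only. sketch_process_freq() and struct sketch_pstates are hypothetical stand-ins that take already-decoded MHz values instead of reading the P-state MSRs, so the code runs outside Xen. It shows the shape of the refactor: the helper fills only the outputs it can determine and tolerates NULL pointers, while the logging caller pre-initializes everything to 0 and picks the output format based on which values come back non-zero.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, already-decoded P-state data (the real code reads MSRs). */
struct sketch_pstates {
    unsigned int lo, nom, hi;   /* frequencies in MHz */
    int have_nom, have_hi;      /* whether nominal / highest values are valid */
};

/* Fill only the outputs that can be determined; every pointer may be NULL. */
static void sketch_process_freq(const struct sketch_pstates *ps,
                                unsigned int *low_mhz, unsigned int *nom_mhz,
                                unsigned int *hi_mhz)
{
    if (ps->have_nom && ps->have_hi) {
        if (nom_mhz)
            *nom_mhz = ps->nom;
        if (low_mhz)
            *low_mhz = ps->lo;
        if (hi_mhz)
            *hi_mhz = ps->hi;
    } else if (ps->have_hi) {
        if (low_mhz)
            *low_mhz = ps->lo;
        if (hi_mhz)
            *hi_mhz = ps->hi;
    } else if (low_mhz)
        *low_mhz = ps->lo;
}

/* Logging side: choose the format based on which values came back non-zero. */
static void sketch_log_freq(char *buf, size_t len, unsigned int cpu,
                            const struct sketch_pstates *ps)
{
    unsigned int low_mhz = 0, nom_mhz = 0, hi_mhz = 0;

    sketch_process_freq(ps, &low_mhz, &nom_mhz, &hi_mhz);

    if (low_mhz && nom_mhz && hi_mhz)
        snprintf(buf, len, "CPU%u: %u (%u ... %u) MHz", cpu,
                 nom_mhz, low_mhz, hi_mhz);
    else if (low_mhz && hi_mhz)
        snprintf(buf, len, "CPU%u: %u ... %u MHz", cpu, low_mhz, hi_mhz);
    else if (low_mhz)
        snprintf(buf, len, "CPU%u: %u MHz", cpu, low_mhz);
    else if (len)
        buf[0] = '\0';
}
```

A caller that only needs the maximum frequency (the anchor point mentioned above) can pass NULL for the other two outputs.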
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v1 -> v2:
- new commit
---
v3 -> v4
- introduce amd_process_freq()
---
v4 -> v5:
- make amd_process_freq() static to satisfy MISRA requirements
- change "low_mhz", "nom_mhz" and "hi_mhz" parameter to unsigned int
- fix order of logged frequencies
---
v5 -> v6:
- fix bogus non-zero check
---
xen/arch/x86/cpu/amd.c | 58 +++++++++++++++++++++++++++++-------------
1 file changed, 40 insertions(+), 18 deletions(-)
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index f10e762d76..eb428f284e 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -613,14 +613,15 @@ static unsigned int attr_const amd_parse_freq(unsigned int family,
return freq;
}
-void amd_log_freq(const struct cpuinfo_x86 *c)
+static void amd_process_freq(const struct cpuinfo_x86 *c,
+ unsigned int *low_mhz,
+ unsigned int *nom_mhz,
+ unsigned int *hi_mhz)
{
unsigned int idx = 0, h;
uint64_t hi, lo, val;
- if (c->x86 < 0x10 || c->x86 > 0x1A ||
- (c != &boot_cpu_data &&
- (!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1)))))
+ if (c->x86 < 0x10 || c->x86 > 0x1A)
return;
if (c->x86 < 0x17) {
@@ -701,20 +702,20 @@ void amd_log_freq(const struct cpuinfo_x86 *c)
if (idx && idx < h &&
!rdmsr_safe(0xC0010064 + idx, val) && (val >> 63) &&
- !rdmsr_safe(0xC0010064, hi) && (hi >> 63))
- printk("CPU%u: %u (%u ... %u) MHz\n",
- smp_processor_id(),
- amd_parse_freq(c->x86, val),
- amd_parse_freq(c->x86, lo),
- amd_parse_freq(c->x86, hi));
- else if (h && !rdmsr_safe(0xC0010064, hi) && (hi >> 63))
- printk("CPU%u: %u ... %u MHz\n",
- smp_processor_id(),
- amd_parse_freq(c->x86, lo),
- amd_parse_freq(c->x86, hi));
- else
- printk("CPU%u: %u MHz\n", smp_processor_id(),
- amd_parse_freq(c->x86, lo));
+ !rdmsr_safe(0xC0010064, hi) && (hi >> 63)) {
+ if (nom_mhz)
+ *nom_mhz = amd_parse_freq(c->x86, val);
+ if (low_mhz)
+ *low_mhz = amd_parse_freq(c->x86, lo);
+ if (hi_mhz)
+ *hi_mhz = amd_parse_freq(c->x86, hi);
+ } else if (h && !rdmsr_safe(0xC0010064, hi) && (hi >> 63)) {
+ if (low_mhz)
+ *low_mhz = amd_parse_freq(c->x86, lo);
+ if (hi_mhz)
+ *hi_mhz = amd_parse_freq(c->x86, hi);
+ } else if (low_mhz)
+ *low_mhz = amd_parse_freq(c->x86, lo);
}
void cf_check early_init_amd(struct cpuinfo_x86 *c)
@@ -725,6 +726,27 @@ void cf_check early_init_amd(struct cpuinfo_x86 *c)
ctxt_switch_levelling(NULL);
}
+void amd_log_freq(const struct cpuinfo_x86 *c)
+{
+ unsigned int low_mhz = 0, nom_mhz = 0, hi_mhz = 0;
+
+ if (c != &boot_cpu_data &&
+ (!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1))))
+ return;
+
+ amd_process_freq(c, &low_mhz, &nom_mhz, &hi_mhz);
+
+ if (low_mhz && nom_mhz && hi_mhz)
+ printk("CPU%u: %u (%u ... %u) MHz\n",
+ smp_processor_id(),
+ nom_mhz, low_mhz, hi_mhz);
+ else if (low_mhz && hi_mhz)
+ printk("CPU%u: %u ... %u MHz\n",
+ smp_processor_id(), low_mhz, hi_mhz);
+ else if (low_mhz)
+ printk("CPU%u: %u MHz\n", smp_processor_id(), low_mhz);
+}
+
void amd_init_lfence(struct cpuinfo_x86 *c)
{
uint64_t value;
--
2.34.1
* [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 01/19] xen/amd: introduce amd_process_freq() to get processor frequency Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-11 6:42 ` Jan Beulich
2025-07-28 12:53 ` Anthony PERARD
2025-07-11 3:50 ` [PATCH v6 03/19] tools: optimize cpufreq average freq print Penny Zheng
` (16 subsequent siblings)
18 siblings, 2 replies; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross
In `xenpm get-cpufreq-para <cpuid>`, the ->freq_num and ->cpu_num checks are
tied together via the variable "has_num", while ->freq_num only has a non-zero
value when the cpufreq driver is in legacy P-states mode.
So we drop the "has_num" condition check, and mirror the ->gov_num check for
both ->freq_num and ->cpu_num in xc_get_cpufreq_para().
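The new validation rule can be illustrated with a small, self-contained check. struct sketch_para and sketch_check_para() are hypothetical. The struct keeps only the fields involved, while the real xc_get_cpufreq_para() works on the full user_para structure and also bounces the buffers for the hypercall. Each count is paired with its own buffer, so a request with freq_num == 0 (no legacy P-states) no longer causes the affected_cpus handling to be skipped.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the fields being validated. */
struct sketch_para {
    uint32_t cpu_num;
    uint32_t freq_num;
    uint32_t gov_num;
    uint32_t *affected_cpus;
    uint32_t *scaling_available_frequencies;
    char *scaling_available_governors;
};

/*
 * Each count is checked on its own: a non-zero count needs a buffer,
 * while a zero count needs none.
 */
static int sketch_check_para(const struct sketch_para *p)
{
    if ( (p->cpu_num && !p->affected_cpus) ||
         (p->freq_num && !p->scaling_available_frequencies) ||
         (p->gov_num && !p->scaling_available_governors) )
    {
        errno = EINVAL;
        return -1;
    }

    return 0;
}
```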
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v3 -> v4:
- drop the "has_num" condition check
---
v4 -> v5:
- refactor title and commit
- make all three pieces (xc_hypercall_bounce_pre()) as similar as possible
---
v5 -> v6:
- move each set_xen_guest_handle() up to the end of its corresponding conditional
---
tools/libs/ctrl/xc_pm.c | 41 +++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 1f2430cac2..6fda973f1f 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -210,33 +210,36 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
DECLARE_NAMED_HYPERCALL_BOUNCE(scaling_available_governors,
user_para->scaling_available_governors,
user_para->gov_num * CPUFREQ_NAME_LEN * sizeof(char), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
- bool has_num = user_para->cpu_num && user_para->freq_num;
- if ( has_num )
+ if ( (user_para->cpu_num && !user_para->affected_cpus) ||
+ (user_para->freq_num && !user_para->scaling_available_frequencies) ||
+ (user_para->gov_num && !user_para->scaling_available_governors) )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+ if ( user_para->cpu_num )
{
- if ( (!user_para->affected_cpus) ||
- (!user_para->scaling_available_frequencies) ||
- (user_para->gov_num && !user_para->scaling_available_governors) )
- {
- errno = EINVAL;
- return -1;
- }
ret = xc_hypercall_bounce_pre(xch, affected_cpus);
if ( ret )
return ret;
+ set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
+ }
+ if ( user_para->freq_num )
+ {
ret = xc_hypercall_bounce_pre(xch, scaling_available_frequencies);
if ( ret )
goto unlock_2;
- if ( user_para->gov_num )
- ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
+ set_xen_guest_handle(sys_para->scaling_available_frequencies,
+ scaling_available_frequencies);
+ }
+ if ( user_para->gov_num )
+ {
+ ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
if ( ret )
goto unlock_3;
-
- set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
- set_xen_guest_handle(sys_para->scaling_available_frequencies, scaling_available_frequencies);
- if ( user_para->gov_num )
- set_xen_guest_handle(sys_para->scaling_available_governors,
- scaling_available_governors);
+ set_xen_guest_handle(sys_para->scaling_available_governors,
+ scaling_available_governors);
}
sysctl.cmd = XEN_SYSCTL_pm_op;
@@ -256,9 +259,7 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
user_para->gov_num = sys_para->gov_num;
}
- if ( has_num )
- goto unlock_4;
- return ret;
+ goto unlock_4;
}
else
{
--
2.34.1
* Re: [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode
2025-07-11 3:50 ` [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode Penny Zheng
@ 2025-07-11 6:42 ` Jan Beulich
2025-07-28 12:53 ` Anthony PERARD
1 sibling, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-11 6:42 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, Juergen Gross, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> In `xenpm get-cpufreq-para <cpuid>`, the ->freq_num and ->cpu_num checks are
> tied together via the variable "has_num", while ->freq_num only has a non-zero
> value when the cpufreq driver is in legacy P-states mode.
>
> So we drop the "has_num" condition check, and mirror the ->gov_num check for
> both ->freq_num and ->cpu_num in xc_get_cpufreq_para().
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode
2025-07-11 3:50 ` [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode Penny Zheng
2025-07-11 6:42 ` Jan Beulich
@ 2025-07-28 12:53 ` Anthony PERARD
1 sibling, 0 replies; 66+ messages in thread
From: Anthony PERARD @ 2025-07-28 12:53 UTC (permalink / raw)
To: Penny Zheng; +Cc: xen-devel, ray.huang, Anthony PERARD, Juergen Gross
On Fri, Jul 11, 2025 at 11:50:49AM +0800, Penny Zheng wrote:
> In `xenpm get-cpufreq-para <cpuid>`, the ->freq_num and ->cpu_num checks are
> tied together via the variable "has_num", while ->freq_num only has a non-zero
> value when the cpufreq driver is in legacy P-states mode.
>
> So we drop the "has_num" condition check, and mirror the ->gov_num check for
> both ->freq_num and ->cpu_num in xc_get_cpufreq_para().
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
Thanks,
--
Anthony PERARD
* [PATCH v6 03/19] tools: optimize cpufreq average freq print
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 01/19] xen/amd: introduce amd_process_freq() to get processor frequency Penny Zheng
2025-07-11 3:50 ` [PATCH v6 02/19] tools: drop "has_num" condition check for cppc mode Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 14:43 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 04/19] x86/cpufreq: continue looping other than -EBUSY or successful return Penny Zheng
` (15 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD
Unlike Cx/Px states, for which we need an extra loop to summarize residency
(sum_cx[]/sum_px[]), we can call get_avgfreq_by_cpuid() right before printing.
Also, with the later introduction of CPPC mode, the average frequency print
shall not depend on the existence of legacy P-states, so we remove the "px_cap"
dependency check.
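A sketch of the reordered loop follows. It assumes a hypothetical sketch_get_avgfreq() that reads from a fixed table instead of calling get_avgfreq_by_cpuid() through libxc, so it only covers up to three CPUs. The point is the control flow: the average frequency is fetched per CPU right before it is printed, and it is printed whenever it is non-zero, with no px_cap condition.

```c
#include <stdio.h>

/* Hypothetical per-CPU averages in kHz; 0 means "no data". */
static const int sketch_avg_khz[] = { 2400000, 0, 3100000 };

/* Stand-in for get_avgfreq_by_cpuid(): fills *avgfreq and returns 0. */
static int sketch_get_avgfreq(unsigned int cpu, int *avgfreq)
{
    *avgfreq = sketch_avg_khz[cpu];
    return 0;
}

/* Per-CPU report; returns how many "Avg freq" lines were printed. */
static unsigned int sketch_report(unsigned int max_cpu_nr, int *avgfreq)
{
    unsigned int i, printed = 0;

    for ( i = 0; i < max_cpu_nr; i++ )
    {
        printf("CPU%u:\n", i);
        /* ... Cx/Px residency lines (using sum_cx[]/sum_px[]) go here ... */

        sketch_get_avgfreq(i, &avgfreq[i]);
        if ( avgfreq[i] )   /* no px_cap check: works without legacy P-states */
        {
            printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
            printed++;
        }
    }

    return printed;
}
```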
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v3 -> v4:
- new commit
---
v4 -> v5:
- refactor title and commit message
- call get_avgfreq_by_cpuid() right before printing
---
v5 -> v6:
- remove "Fixes:xxx"
---
tools/misc/xenpm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index d5387f5f06..bbe45fa548 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -510,9 +510,6 @@ static void signal_int_handler(int signo)
pxstat_start[i].pt[j].residency;
}
- for ( i = 0; i < max_cpu_nr; i++ )
- get_avgfreq_by_cpuid(xc_handle, i, &avgfreq[i]);
-
printf("Elapsed time (ms): %"PRIu64"\n", (usec_end - usec_start) / 1000UL);
for ( i = 0; i < max_cpu_nr; i++ )
{
@@ -553,7 +550,8 @@ static void signal_int_handler(int signo)
res / 1000000UL, 100UL * res / (double)sum_px[i]);
}
}
- if ( px_cap && avgfreq[i] )
+ get_avgfreq_by_cpuid(xc_handle, i, &avgfreq[i]);
+ if ( avgfreq[i] )
printf(" Avg freq\t%d\tKHz\n", avgfreq[i]);
}
--
2.34.1
* Re: [PATCH v6 03/19] tools: optimize cpufreq average freq print
2025-07-11 3:50 ` [PATCH v6 03/19] tools: optimize cpufreq average freq print Penny Zheng
@ 2025-07-16 14:43 ` Jan Beulich
2025-07-28 13:03 ` Anthony PERARD
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-16 14:43 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> Unlike Cx/Px states, for which we need an extra loop to summarize residency
> (sum_cx[]/sum_px[]), we can call get_avgfreq_by_cpuid() right before printing.
> Also, with the later introduction of CPPC mode, the average frequency print
> shall not depend on the existence of legacy P-states, so we remove the "px_cap"
> dependency check.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v6 03/19] tools: optimize cpufreq average freq print
2025-07-16 14:43 ` Jan Beulich
@ 2025-07-28 13:03 ` Anthony PERARD
0 siblings, 0 replies; 66+ messages in thread
From: Anthony PERARD @ 2025-07-28 13:03 UTC (permalink / raw)
To: Jan Beulich; +Cc: Penny Zheng, ray.huang, Anthony PERARD, xen-devel
On Wed, Jul 16, 2025 at 04:43:32PM +0200, Jan Beulich wrote:
> On 11.07.2025 05:50, Penny Zheng wrote:
> > Unlike Cx/Px states, for which we need an extra loop to summarize residency
> > (sum_cx[]/sum_px[]), we can call get_avgfreq_by_cpuid() right before printing.
> > Also, with the later introduction of CPPC mode, the average frequency print
> > shall not depend on the existence of legacy P-states, so we remove the "px_cap"
> > dependency check.
> >
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Thanks,
--
Anthony PERARD
* [PATCH v6 04/19] x86/cpufreq: continue looping other than -EBUSY or successful return
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (2 preceding siblings ...)
2025-07-11 3:50 ` [PATCH v6 03/19] tools: optimize cpufreq average freq print Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 14:47 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
` (14 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné
Right now, we only get the chance to try the fallback option when cpufreq
driver registration fails with -ENODEV.
There are two code paths in cpufreq driver registration that error out with
something other than -ENODEV: one is when the driver itself is broken (e.g.
missing mandatory hooks), in which case cpufreq_register_driver() fails with
-EINVAL and we should be able to try the fallback option; the other is -EBUSY
due to repeated registration, in which case we should just exit the loop.
So, in conclusion, when the return value is 0 or -EBUSY, both indicating that
a proper driver is already registered, we bail out of the loop; otherwise, we
continue to try the fallback option.
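The resulting loop behaviour can be sketched as follows. The candidate list, the function-pointer type and sketch_driver_init() are hypothetical. Returning -ENODEV when there is nothing to try is also an assumption. Only the break condition mirrors the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical registration attempt: returns 0 or a negative errno value. */
typedef int (*sketch_register_fn)(void);

/*
 * Try each candidate in order.  0 (registered) and -EBUSY (a driver is
 * already registered) end the search; any other error, such as -ENODEV, or
 * -EINVAL from a driver missing mandatory hooks, moves on to the fallback.
 * *tried reports how many candidates were attempted.
 */
static int sketch_driver_init(const sketch_register_fn *opts, size_t nr,
                              size_t *tried)
{
    int ret = -ENODEV;  /* assumption: result when there is nothing to try */
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        ret = opts[i]();
        if ( !ret || ret == -EBUSY )
            break;
    }

    *tried = (i < nr) ? i + 1 : nr;
    return ret;
}
```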
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6:
- new commit
---
xen/arch/x86/acpi/cpufreq/cpufreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 61e98b67bd..45f301f354 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -150,7 +150,7 @@ static int __init cf_check cpufreq_driver_init(void)
break;
}
- if ( ret != -ENODEV )
+ if ( !ret || ret == -EBUSY )
break;
}
break;
--
2.34.1
* Re: [PATCH v6 04/19] x86/cpufreq: continue looping other than -EBUSY or successful return
2025-07-11 3:50 ` [PATCH v6 04/19] x86/cpufreq: continue looping other than -EBUSY or successful return Penny Zheng
@ 2025-07-16 14:47 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-16 14:47 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Andrew Cooper, Roger Pau Monné, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> Right now, we only get the chance to try the fallback option when cpufreq
> driver registration fails with -ENODEV.
> There are two code paths in cpufreq driver registration that error out with
> something other than -ENODEV: one is when the driver itself is broken (e.g.
> missing mandatory hooks), in which case cpufreq_register_driver() fails with
> -EINVAL and we should be able to try the fallback option; the other is -EBUSY
> due to repeated registration, in which case we should just exit the loop.
>
> So, in conclusion, when the return value is 0 or -EBUSY, both indicating that
> a proper driver is already registered, we bail out of the loop; otherwise, we
> continue to try the fallback option.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(and perhaps also a Reported-by: or Suggested-by: ...)
* [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (3 preceding siblings ...)
2025-07-11 3:50 ` [PATCH v6 04/19] x86/cpufreq: continue looping other than -EBUSY or successful return Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:00 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 06/19] xen/cpufreq: make _PSD info common Penny Zheng
` (13 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
A helper function, handle_cpufreq_cmdline(), is introduced to tidy up the
different handling paths.
We also add a new helper, cpufreq_opts_contain(), to ignore redundant settings,
like "cpufreq=hwp;hwp;xen".
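Here is a minimal sketch of the de-duplication. It assumes a two-slot array as in the patch, plus a hypothetical third option (SKETCH_other) used only to exercise the full-array case. Unlike the ASSERT() in the patch, the bound is checked at run time with a locally defined ARRAY_SIZE(), so release builds cannot overrun the array either. The sketch starts from an empty list; how the real parser treats the built-in default entry is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mirror of enum cpufreq_xen_opt; SKETCH_other is invented. */
enum sketch_opt { SKETCH_none, SKETCH_xen, SKETCH_hwp, SKETCH_other };

static enum sketch_opt sketch_opts[2];   /* zero-filled, i.e. SKETCH_none */
static unsigned int sketch_cnt;

/* Only the first sketch_cnt slots are in use, so unused slots never match. */
static bool sketch_opts_contain(enum sketch_opt opt)
{
    unsigned int count = sketch_cnt;

    while ( count-- )
        if ( sketch_opts[count] == opt )
            return true;

    return false;
}

static int sketch_handle_opt(enum sketch_opt opt)
{
    if ( sketch_opts_contain(opt) )
        return 0;                 /* redundant setting, e.g. "hwp;hwp" */

    /* A real check rather than an assertion, so release builds are safe too. */
    if ( sketch_cnt >= ARRAY_SIZE(sketch_opts) )
        return -EINVAL;

    sketch_opts[sketch_cnt++] = opt;
    return 0;
}
```

Feeding hwp, hwp, xen in that order leaves the list as { hwp, xen }; a further distinct option is then rejected instead of being written past the end.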
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v2 -> v3:
- new commit
---
v3 -> v4:
- add one single helper to do the tidy work
- ignore and warn user redundant setting
---
v4 -> v5:
- make "cpufreq_opts_str" static and the string literals end up in
.init.rodata.
- use "CPUFREQ_xxx" as array slot index
- blank line between non-fall-through case blocks
---
v5 -> v6:
- change to "while ( count-- )"
- remove unnecessary warning
- add an assertion to ensure we don't overrun the array
- add ASSERT_UNREACHABLE()
- check ret of handle_cpufreq_cmdline() and error out
---
xen/drivers/cpufreq/cpufreq.c | 59 ++++++++++++++++++++++++------
xen/include/acpi/cpufreq/cpufreq.h | 3 +-
2 files changed, 49 insertions(+), 13 deletions(-)
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 564f926341..887bc5953d 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
/* set xen as default cpufreq */
enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
-enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
- CPUFREQ_none };
+enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
+ CPUFREQ_xen,
+ CPUFREQ_none
+};
unsigned int __initdata cpufreq_xen_cnt = 1;
static int __init cpufreq_cmdline_parse(const char *s, const char *e);
+static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
+{
+ unsigned int count = cpufreq_xen_cnt;
+
+ while ( count-- )
+ {
+ if ( cpufreq_xen_opts[count] == option )
+ return true;
+ }
+
+ return false;
+}
+
+static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
+{
+ int ret = 0;
+
+ if ( cpufreq_opts_contain(option) )
+ return 0;
+
+ cpufreq_controller = FREQCTL_xen;
+ ASSERT(cpufreq_xen_cnt < NR_CPUFREQ_OPTS);
+ cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
+ switch ( option )
+ {
+ case CPUFREQ_hwp:
+ case CPUFREQ_xen:
+ xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
+ break;
+
+ default:
+ ASSERT_UNREACHABLE();
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
static int __init cf_check setup_cpufreq_option(const char *str)
{
const char *arg = strpbrk(str, ",:;");
@@ -113,21 +154,15 @@ static int __init cf_check setup_cpufreq_option(const char *str)
if ( choice > 0 || !cmdline_strcmp(str, "xen") )
{
- xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
- cpufreq_controller = FREQCTL_xen;
- cpufreq_xen_opts[cpufreq_xen_cnt++] = CPUFREQ_xen;
- ret = 0;
- if ( arg[0] && arg[1] )
+ ret = handle_cpufreq_cmdline(CPUFREQ_xen);
+ if ( !ret && arg[0] && arg[1] )
ret = cpufreq_cmdline_parse(arg + 1, end);
}
else if ( IS_ENABLED(CONFIG_INTEL) && choice < 0 &&
!cmdline_strcmp(str, "hwp") )
{
- xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
- cpufreq_controller = FREQCTL_xen;
- cpufreq_xen_opts[cpufreq_xen_cnt++] = CPUFREQ_hwp;
- ret = 0;
- if ( arg[0] && arg[1] )
+ ret = handle_cpufreq_cmdline(CPUFREQ_hwp);
+ if ( !ret && arg[0] && arg[1] )
ret = hwp_cmdline_parse(arg + 1, end);
}
else
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 0742aa9f44..948530218a 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -27,7 +27,8 @@ enum cpufreq_xen_opt {
CPUFREQ_xen,
CPUFREQ_hwp,
};
-extern enum cpufreq_xen_opt cpufreq_xen_opts[2];
+#define NR_CPUFREQ_OPTS 2
+extern enum cpufreq_xen_opt cpufreq_xen_opts[NR_CPUFREQ_OPTS];
extern unsigned int cpufreq_xen_cnt;
struct cpufreq_governor;
--
2.34.1
* Re: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-07-11 3:50 ` [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
@ 2025-07-16 15:00 ` Jan Beulich
2025-08-04 5:47 ` Penny, Zheng
2025-08-04 6:04 ` Penny, Zheng
0 siblings, 2 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-16 15:00 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> /* set xen as default cpufreq */
> enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
>
> -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
> - CPUFREQ_none };
> +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
> + CPUFREQ_xen,
> + CPUFREQ_none
> +};
> unsigned int __initdata cpufreq_xen_cnt = 1;
Given this, isn't the array index 1 initializer quite pointless above? Or
else, if you really mean to explicitly fill all slots with CPUFREQ_none
(despite that deliberately having numeric value 0), why not
"[1 ... NR_CPUFREQ_OPTS - 1] = CPUFREQ_none" (or using ARRAY_SIZE(), as
per below)?
> static int __init cpufreq_cmdline_parse(const char *s, const char *e);
>
> +static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
> +{
> + unsigned int count = cpufreq_xen_cnt;
> +
> + while ( count-- )
> + {
> + if ( cpufreq_xen_opts[count] == option )
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
> +{
> + int ret = 0;
> +
> + if ( cpufreq_opts_contain(option) )
> + return 0;
> +
> + cpufreq_controller = FREQCTL_xen;
> + ASSERT(cpufreq_xen_cnt < NR_CPUFREQ_OPTS);
This would better use ARRAY_SIZE(), at which point NR_CPUFREQ_OPTS can go
away again. What's worse, though, is that on release builds ...
> + cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
... you then still overrun this array if something's wrong in this regard.
Jan
* RE: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-07-16 15:00 ` Jan Beulich
@ 2025-08-04 5:47 ` Penny, Zheng
2025-08-04 7:19 ` Jan Beulich
2025-08-04 6:04 ` Penny, Zheng
1 sibling, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-04 5:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang, Ray, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 16, 2025 11:01 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
>
> On 11.07.2025 05:50, Penny Zheng wrote:
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> > /* set xen as default cpufreq */
> > enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
> >
> > -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
> > - CPUFREQ_none };
> > +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
> > + CPUFREQ_xen,
> > + CPUFREQ_none
> > +};
> > unsigned int __initdata cpufreq_xen_cnt = 1;
>
> Given this, isn't the array index 1 initializer quite pointless above? Or else, if you
> really mean to explicitly fill all slots with CPUFREQ_none (despite that deliberately
> having numeric value 0), why not
> "[1 ... NR_CPUFREQ_OPTS - 1] = CPUFREQ_none" (or using ARRAY_SIZE(), as
> per below)?
>
cpufreq_xen_cnt is initialized to 1 so that CPUFREQ_xen is the default value when there is no "cpufreq=xxx" command-line option.
I suppose you are pointing out that the macro NR_CPUFREQ_OPTS is pointless, as we could use ARRAY_SIZE().
> > static int __init cpufreq_cmdline_parse(const char *s, const char
> > *e);
> >
> > +static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
> > +{
> > + unsigned int count = cpufreq_xen_cnt;
> > +
> > + while ( count-- )
> > + {
> > + if ( cpufreq_xen_opts[count] == option )
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
> > +{
> > + int ret = 0;
> > +
> > + if ( cpufreq_opts_contain(option) )
> > + return 0;
> > +
> > + cpufreq_controller = FREQCTL_xen;
> > + ASSERT(cpufreq_xen_cnt < NR_CPUFREQ_OPTS);
>
> This would better use ARRAY_SIZE(), at which point NR_CPUFREQ_OPTS can go
> away again. What's worse, though, is that on release builds ...
>
Understood, will use ARRAY_SIZE(), and will use if() to error out
> > + cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
>
> ... you then still overrun this array if something's wrong in this regard.
>
> Jan
* Re: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-08-04 5:47 ` Penny, Zheng
@ 2025-08-04 7:19 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-08-04 7:19 UTC (permalink / raw)
To: Penny, Zheng; +Cc: Huang, Ray, xen-devel@lists.xenproject.org
On 04.08.2025 07:47, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, July 16, 2025 11:01 PM
>>
>> On 11.07.2025 05:50, Penny Zheng wrote:
>>> --- a/xen/drivers/cpufreq/cpufreq.c
>>> +++ b/xen/drivers/cpufreq/cpufreq.c
>>> @@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
>>> /* set xen as default cpufreq */
>>> enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
>>>
>>> -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
>>> - CPUFREQ_none };
>>> +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
>>> + CPUFREQ_xen,
>>> + CPUFREQ_none
>>> +};
>>> unsigned int __initdata cpufreq_xen_cnt = 1;
>>
>> Given this, isn't the array index 1 initializer quite pointless above? Or else, if you
>> really mean to explicitly fill all slots with CPUFREQ_none (despite that deliberately
>> having numeric value 0), why not
>> "[1 ... NR_CPUFREQ_OPTS - 1] = CPUFREQ_none" (or using ARRAY_SIZE(), as
>> per below)?
>>
>
> cpufreq_xen_cnt is initialized to 1 so that CPUFREQ_xen is the default value when there is no "cpufreq=xxx" command-line option.
> I suppose you are pointing out that the macro NR_CPUFREQ_OPTS is pointless, as we could use ARRAY_SIZE().
That I'm suggesting further down, yes. But here I'm questioning the array
initializer: As said, I think only slot 0 needs explicit initializing.
Otherwise the initializer would need touching again whenever the array size
is grown, which would be nice to avoid, provided doing so is correct.
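To illustrate the point about the initializer: in C, elements not named in a brace initializer are zero-initialized. Because CPUFREQ_none is 0, spelling out slot 0 alone already leaves every other slot as CPUFREQ_none, whatever the array size. The enum and array below are hypothetical mirrors of cpufreq_xen_opts[], with the size bumped to 4 to show that the initializer does not need to change.

```c
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mirror of enum cpufreq_xen_opt; the "none" value is 0. */
enum sketch_opt { SKETCH_none, SKETCH_xen, SKETCH_hwp };

/* Only slot 0 is spelled out; all remaining slots become SKETCH_none. */
static enum sketch_opt sketch_opts[4] = { SKETCH_xen };

/* Count the slots still holding SKETCH_none. */
static unsigned int sketch_count_none(void)
{
    unsigned int i, n = 0;

    for ( i = 0; i < ARRAY_SIZE(sketch_opts); i++ )
        if ( sketch_opts[i] == SKETCH_none )
            n++;

    return n;
}
```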
Jan
* RE: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-07-16 15:00 ` Jan Beulich
2025-08-04 5:47 ` Penny, Zheng
@ 2025-08-04 6:04 ` Penny, Zheng
2025-08-04 7:17 ` Jan Beulich
1 sibling, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-04 6:04 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang, Ray, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 16, 2025 11:01 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
>
> On 11.07.2025 05:50, Penny Zheng wrote:
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> > /* set xen as default cpufreq */
> > enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
> >
> > -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
> > - CPUFREQ_none };
> > +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
> > + CPUFREQ_xen,
> > + CPUFREQ_none
> > +};
> > unsigned int __initdata cpufreq_xen_cnt = 1;
>
> Given this, isn't the array index 1 initializer quite pointless above? Or else, if you
> really mean to explicitly fill all slots with CPUFREQ_none (despite that deliberately
> having numeric value 0), why not
> "[1 ... NR_CPUFREQ_OPTS - 1] = CPUFREQ_none" (or using ARRAY_SIZE(), as
> per below)?
>
> > static int __init cpufreq_cmdline_parse(const char *s, const char
> > *e);
> >
> > +static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
> > +{
> > + unsigned int count = cpufreq_xen_cnt;
> > +
> > + while ( count-- )
> > + {
> > + if ( cpufreq_xen_opts[count] == option )
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
> > +{
> > + int ret = 0;
> > +
> > + if ( cpufreq_opts_contain(option) )
> > + return 0;
> > +
> > + cpufreq_controller = FREQCTL_xen;
> > + ASSERT(cpufreq_xen_cnt < NR_CPUFREQ_OPTS);
>
> This would better use ARRAY_SIZE(), at which point NR_CPUFREQ_OPTS can go
> away again. What's worse, though, is that on release builds ...
>
I found that we already have an array index check in setup_cpufreq_option() before calling handle_cpufreq_cmdline(),
so maybe there is no need to do it again here.
> > + cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
>
> ... you then still overrun this array if something's wrong in this regard.
>
> Jan
* Re: [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx"
2025-08-04 6:04 ` Penny, Zheng
@ 2025-08-04 7:17 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-08-04 7:17 UTC (permalink / raw)
To: Penny, Zheng; +Cc: Huang, Ray, xen-devel@lists.xenproject.org
On 04.08.2025 08:04, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, July 16, 2025 11:01 PM
>>
>> On 11.07.2025 05:50, Penny Zheng wrote:
>>> --- a/xen/drivers/cpufreq/cpufreq.c
>>> +++ b/xen/drivers/cpufreq/cpufreq.c
>>> @@ -64,12 +64,53 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
>>> /* set xen as default cpufreq */
>>> enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
>>>
>>> -enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
>>> - CPUFREQ_none };
>>> +enum cpufreq_xen_opt __initdata cpufreq_xen_opts[NR_CPUFREQ_OPTS] = {
>>> + CPUFREQ_xen,
>>> + CPUFREQ_none
>>> +};
>>> unsigned int __initdata cpufreq_xen_cnt = 1;
>>
>> Given this, isn't the array index 1 initializer quite pointless above? Or else, if you
>> really mean to explicitly fill all slots with CPUFREQ_none (despite that deliberately
>> having numeric value 0), why not
>> "[1 ... NR_CPUFREQ_OPTS - 1] = CPUFREQ_none" (or using ARRAY_SIZE(), as
>> per below)?
>>
>>> static int __init cpufreq_cmdline_parse(const char *s, const char
>>> *e);
>>>
>>> +static bool __init cpufreq_opts_contain(enum cpufreq_xen_opt option)
>>> +{
>>> + unsigned int count = cpufreq_xen_cnt;
>>> +
>>> + while ( count-- )
>>> + {
>>> + if ( cpufreq_xen_opts[count] == option )
>>> + return true;
>>> + }
>>> +
>>> + return false;
>>> +}
>>> +
>>> +static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
>>> +{
>>> + int ret = 0;
>>> +
>>> + if ( cpufreq_opts_contain(option) )
>>> + return 0;
>>> +
>>> + cpufreq_controller = FREQCTL_xen;
>>> + ASSERT(cpufreq_xen_cnt < NR_CPUFREQ_OPTS);
>>
>> This would better use ARRAY_SIZE(), at which point NR_CPUFREQ_OPTS can go
>> away again. What's worse, though, is that on release builds ...
>
> I found that we already have an array index check in setup_cpufreq_option() before calling handle_cpufreq_cmdline(),
> so maybe there is no need to do it again here.
Well, you will still need to deal with the release build aspect, as per your
earlier reply. At which point you can easily place an ASSERT_UNREACHABLE()
there as well.
Jan
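A minimal sketch of the pattern described above, assuming stand-in definitions: ARRAY_SIZE() derives the bound from the array itself (so NR_CPUFREQ_OPTS is no longer needed), and the bound check stays effective in release builds, where ASSERT() and ASSERT_UNREACHABLE() compile away. The macros model such a release build; the CPUFREQ_amd_cppc enumerator and the -E2BIG error code are illustrative assumptions, not the series' actual choices, and the cpufreq_controller update is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Stand-ins for Xen's macros, modelling a release (NDEBUG) build in
 * which ASSERT_UNREACHABLE() expands to nothing.
 */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define ASSERT_UNREACHABLE() ((void)0)

/* Enumerators approximate the series; CPUFREQ_amd_cppc is assumed. */
enum cpufreq_xen_opt { CPUFREQ_none, CPUFREQ_xen, CPUFREQ_hwp, CPUFREQ_amd_cppc };

/* Slot 1 is implicitly CPUFREQ_none (value 0); no explicit initializer needed. */
static enum cpufreq_xen_opt cpufreq_xen_opts[2] = { CPUFREQ_xen };
static unsigned int cpufreq_xen_cnt = 1;

static bool cpufreq_opts_contain(enum cpufreq_xen_opt option)
{
    unsigned int count = cpufreq_xen_cnt;

    while ( count-- )
        if ( cpufreq_xen_opts[count] == option )
            return true;

    return false;
}

static int handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
{
    if ( cpufreq_opts_contain(option) )
        return 0;

    /*
     * Debug builds flag the bug; release builds still refuse to write
     * past the end of the array instead of overrunning it.
     */
    if ( cpufreq_xen_cnt >= ARRAY_SIZE(cpufreq_xen_opts) )
    {
        ASSERT_UNREACHABLE();
        return -E2BIG;
    }

    cpufreq_xen_opts[cpufreq_xen_cnt++] = option;

    return 0;
}
```

Re-adding an option that is already present is a no-op, and a third distinct option is rejected rather than written out of bounds.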
* [PATCH v6 06/19] xen/cpufreq: make _PSD info common
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 05/19] xen/cpufreq: refactor cmdline "cpufreq=xxx" Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:07 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
_PSD info, consisting of "shared_type" and "struct xen_psd_package", will be
provided not only by the Px-specific "struct xen_processor_performance", but
also by CPPC data.
Two new helper functions are introduced to deal with _PSD. They will later be
re-used for handling the same data for CPPC.
The following style corrections are applied at the same time:
- add a space after the opening and before the closing parenthesis of if()
- remove redundant parentheses
- don't put the closing parenthesis of printk() on a separate line
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v3 -> v4:
- new commit
---
v4 -> v5:
- let check_psd_pminfo() pass in "uint32_t shared_type"
- replace unnecessary parameter "uint32_t init" with processor_pminfo[cpu]->init
- replace structure copy with const pointer delivery through
"const struct xen_psd_package **"
- blank line between non-fall-through switch-case blocks
- remove unnecessary "define XEN_CPUPERF_SHARED_TYPE_xxx" movement
---
v5 -> v6:
- remove redundant local variable "domain_info_ptr"
- change check_psd_pminfo() to bool return
- Comment wants to start with a capital letter
- reword title and commit message
---
xen/drivers/cpufreq/cpufreq.c | 100 ++++++++++++++++++++++++----------
1 file changed, 71 insertions(+), 29 deletions(-)
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 887bc5953d..e387b8a0d9 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -226,9 +226,29 @@ int cpufreq_limit_change(unsigned int cpu)
return __cpufreq_set_policy(data, &policy);
}
-int cpufreq_add_cpu(unsigned int cpu)
+static int get_psd_info(unsigned int cpu, uint32_t *shared_type,
+ const struct xen_psd_package **domain_info)
{
int ret = 0;
+
+ switch ( processor_pminfo[cpu]->init )
+ {
+ case XEN_PX_INIT:
+ *shared_type = processor_pminfo[cpu]->perf.shared_type;
+ *domain_info = &processor_pminfo[cpu]->perf.domain_info;
+ break;
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+int cpufreq_add_cpu(unsigned int cpu)
+{
+ int ret;
unsigned int firstcpu;
unsigned int dom, domexist = 0;
unsigned int hw_all = 0;
@@ -236,14 +256,13 @@ int cpufreq_add_cpu(unsigned int cpu)
struct cpufreq_dom *cpufreq_dom = NULL;
struct cpufreq_policy new_policy;
struct cpufreq_policy *policy;
- struct processor_performance *perf;
+ const struct xen_psd_package *domain_info;
+ uint32_t shared_type;
/* to protect the case when Px was not controlled by xen */
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- perf = &processor_pminfo[cpu]->perf;
-
if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
return -EINVAL;
@@ -253,10 +272,14 @@ int cpufreq_add_cpu(unsigned int cpu)
if (per_cpu(cpufreq_cpu_policy, cpu))
return 0;
- if (perf->shared_type == CPUFREQ_SHARED_TYPE_HW)
+ ret = get_psd_info(cpu, &shared_type, &domain_info);
+ if ( ret )
+ return ret;
+
+ if ( shared_type == CPUFREQ_SHARED_TYPE_HW )
hw_all = 1;
- dom = perf->domain_info.domain;
+ dom = domain_info->domain;
list_for_each(pos, &cpufreq_dom_list_head) {
cpufreq_dom = list_entry(pos, struct cpufreq_dom, node);
@@ -279,21 +302,27 @@ int cpufreq_add_cpu(unsigned int cpu)
cpufreq_dom->dom = dom;
list_add(&cpufreq_dom->node, &cpufreq_dom_list_head);
} else {
+ uint32_t firstcpu_shared_type;
+ const struct xen_psd_package *firstcpu_domain_info;
+
/* domain sanity check under whatever coordination type */
firstcpu = cpumask_first(cpufreq_dom->map);
- if ((perf->domain_info.coord_type !=
- processor_pminfo[firstcpu]->perf.domain_info.coord_type) ||
- (perf->domain_info.num_processors !=
- processor_pminfo[firstcpu]->perf.domain_info.num_processors)) {
-
+ ret = get_psd_info(firstcpu, &firstcpu_shared_type,
+ &firstcpu_domain_info);
+ if ( ret )
+ return ret;
+
+ if ( domain_info->coord_type != firstcpu_domain_info->coord_type ||
+ domain_info->num_processors !=
+ firstcpu_domain_info->num_processors )
+ {
printk(KERN_WARNING "cpufreq fail to add CPU%d:"
"incorrect _PSD(%"PRIu64":%"PRIu64"), "
"expect(%"PRIu64"/%"PRIu64")\n",
- cpu, perf->domain_info.coord_type,
- perf->domain_info.num_processors,
- processor_pminfo[firstcpu]->perf.domain_info.coord_type,
- processor_pminfo[firstcpu]->perf.domain_info.num_processors
- );
+ cpu, domain_info->coord_type,
+ domain_info->num_processors,
+ firstcpu_domain_info->coord_type,
+ firstcpu_domain_info->num_processors);
return -EINVAL;
}
}
@@ -339,8 +368,9 @@ int cpufreq_add_cpu(unsigned int cpu)
if (ret)
goto err1;
- if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
- perf->domain_info.num_processors)) {
+ if ( hw_all || cpumask_weight(cpufreq_dom->map) ==
+ domain_info->num_processors )
+ {
memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
/*
@@ -395,29 +425,33 @@ err0:
int cpufreq_del_cpu(unsigned int cpu)
{
+ int ret;
unsigned int dom, domexist = 0;
unsigned int hw_all = 0;
struct list_head *pos;
struct cpufreq_dom *cpufreq_dom = NULL;
struct cpufreq_policy *policy;
- struct processor_performance *perf;
+ uint32_t shared_type;
+ const struct xen_psd_package *domain_info;
/* to protect the case when Px was not controlled by xen */
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- perf = &processor_pminfo[cpu]->perf;
-
if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
return -EINVAL;
if (!per_cpu(cpufreq_cpu_policy, cpu))
return 0;
- if (perf->shared_type == CPUFREQ_SHARED_TYPE_HW)
+ ret = get_psd_info(cpu, &shared_type, &domain_info);
+ if ( ret )
+ return ret;
+
+ if ( shared_type == CPUFREQ_SHARED_TYPE_HW )
hw_all = 1;
- dom = perf->domain_info.domain;
+ dom = domain_info->domain;
policy = per_cpu(cpufreq_cpu_policy, cpu);
list_for_each(pos, &cpufreq_dom_list_head) {
@@ -433,8 +467,8 @@ int cpufreq_del_cpu(unsigned int cpu)
/* for HW_ALL, stop gov for each core of the _PSD domain */
/* for SW_ALL & SW_ANY, stop gov for the 1st core of the _PSD domain */
- if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
- perf->domain_info.num_processors))
+ if ( hw_all || cpumask_weight(cpufreq_dom->map) ==
+ domain_info->num_processors )
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
cpufreq_statistic_exit(cpu);
@@ -499,6 +533,17 @@ static void print_PPC(unsigned int platform_limit)
printk("\t_PPC: %d\n", platform_limit);
}
+static bool check_psd_pminfo(uint32_t shared_type)
+{
+ /* Check domain coordination */
+ if ( shared_type != CPUFREQ_SHARED_TYPE_ALL &&
+ shared_type != CPUFREQ_SHARED_TYPE_ANY &&
+ shared_type != CPUFREQ_SHARED_TYPE_HW )
+ return false;
+
+ return true;
+}
+
int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf)
{
int ret = 0, cpu;
@@ -581,10 +626,7 @@ int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf)
if ( perf->flags & XEN_PX_PSD )
{
- /* check domain coordination */
- if ( perf->shared_type != CPUFREQ_SHARED_TYPE_ALL &&
- perf->shared_type != CPUFREQ_SHARED_TYPE_ANY &&
- perf->shared_type != CPUFREQ_SHARED_TYPE_HW )
+ if ( !check_psd_pminfo(perf->shared_type) )
{
ret = -EINVAL;
goto out;
--
2.34.1
* Re: [PATCH v6 06/19] xen/cpufreq: make _PSD info common
2025-07-11 3:50 ` [PATCH v6 06/19] xen/cpufreq: make _PSD info common Penny Zheng
@ 2025-07-16 15:07 ` Jan Beulich
From: Jan Beulich @ 2025-07-16 15:07 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> _PSD info, consisting of "shared_type" and "struct xen_psd_package", will be
> provided not only by the Px-specific "struct xen_processor_performance", but
> also by CPPC data.
>
> Two new helper functions are introduced to deal with _PSD. They will later be
> re-used for handling the same data for CPPC.
> The following style corrections are applied at the same time:
> - add a space after the opening and before the closing parenthesis of if()
> - remove redundant parentheses
> - don't put the closing parenthesis of printk() on a separate line
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one further remark:
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -226,9 +226,29 @@ int cpufreq_limit_change(unsigned int cpu)
> return __cpufreq_set_policy(data, &policy);
> }
>
> -int cpufreq_add_cpu(unsigned int cpu)
> +static int get_psd_info(unsigned int cpu, uint32_t *shared_type,
Here and below I question the need to use a fixed-width type. "unsigned int"
will do fine here, I expect, and that's what ./CODING_STYLE also mandates in
such cases. I may take the liberty of changing that while committing.
Jan
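To illustrate the helper's role, here is a self-contained sketch of get_psd_info() using the suggested "unsigned int" for the shared_type out-parameter, while the stored fields stay fixed-width as in the public interface. The XEN_CPPC_INIT branch is the one that patch 10/19 later adds; the struct layouts, flag values and NR_CPUS below are simplified assumptions, not Xen's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values; Xen defines its own. */
#define XEN_PX_INIT   0x1u
#define XEN_CPPC_INIT 0x2u
#define NR_CPUS       4

struct xen_psd_package {
    uint64_t num_entries, revision, domain, coord_type, num_processors;
};

/* Simplified: only the _PSD-related members of Xen's structures. */
struct processor_pminfo {
    unsigned int init;
    struct {
        uint32_t shared_type;
        struct xen_psd_package domain_info;
    } perf, cppc_data;
};

static struct processor_pminfo *processor_pminfo[NR_CPUS];

/* Return the _PSD data for @cpu from whichever source initialised it. */
static int get_psd_info(unsigned int cpu, unsigned int *shared_type,
                        const struct xen_psd_package **domain_info)
{
    const struct processor_pminfo *pm = processor_pminfo[cpu];

    switch ( pm->init )
    {
    case XEN_PX_INIT:
        *shared_type = pm->perf.shared_type;
        *domain_info = &pm->perf.domain_info;
        return 0;

    case XEN_CPPC_INIT:
        *shared_type = pm->cppc_data.shared_type;
        *domain_info = &pm->cppc_data.domain_info;
        return 0;

    default:
        return -EINVAL;
    }
}
```

Callers such as cpufreq_add_cpu() then only see a shared type and a const pointer to the _PSD package, regardless of whether the CPU was set up via Px or CPPC data.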
* [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para"
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 06/19] xen/cpufreq: make _PSD info common Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:10 ` Jan Beulich
2025-07-28 13:09 ` Anthony PERARD
2025-07-11 3:50 ` [PATCH v6 08/19] xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND" Penny Zheng
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
As we are later going to add "struct xen_cppc_para" to "struct xen_sysctl_pm_op" as
a new xenpm sub-op dealing specifically with CPPC info, we need to follow the
naming pattern and rename the existing struct to "xen_get_cppc_para", which is
more suitable than "xen_cppc_para".
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6:
- new commit
---
tools/include/xenctrl.h | 2 +-
xen/arch/x86/acpi/cpufreq/hwp.c | 2 +-
xen/include/acpi/cpufreq/cpufreq.h | 2 +-
xen/include/public/sysctl.h | 6 +++---
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4955981231..965d3b585a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1906,7 +1906,7 @@ int xc_smt_disable(xc_interface *xch);
*/
typedef struct xen_userspace xc_userspace_t;
typedef struct xen_ondemand xc_ondemand_t;
-typedef struct xen_cppc_para xc_cppc_para_t;
+typedef struct xen_get_cppc_para xc_cppc_para_t;
struct xc_get_cpufreq_para {
/* IN/OUT variable */
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index e4c09244ab..7bf475ecb5 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -525,7 +525,7 @@ hwp_cpufreq_driver = {
#ifdef CONFIG_PM_OP
int get_hwp_para(unsigned int cpu,
- struct xen_cppc_para *cppc_para)
+ struct xen_get_cppc_para *cppc_para)
{
const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 948530218a..7f26b5653a 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -267,7 +267,7 @@ static inline bool hwp_active(void) { return false; }
#endif
int get_hwp_para(unsigned int cpu,
- struct xen_cppc_para *cppc_para);
+ struct xen_get_cppc_para *cppc_para);
int set_hwp_para(struct cpufreq_policy *policy,
struct xen_set_cppc_para *set_cppc);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index c9d96a06ff..86b6df30e7 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -336,7 +336,7 @@ struct xen_ondemand {
uint32_t up_threshold;
};
-struct xen_cppc_para {
+struct xen_get_cppc_para {
/* OUT */
/* activity_window supported if set */
#define XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW (1 << 0)
@@ -442,7 +442,7 @@ struct xen_set_cppc_para {
XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
/* IN/OUT */
uint32_t set_params; /* bitflags for valid values */
- /* See comments in struct xen_cppc_para. */
+ /* See comments in struct xen_get_cppc_para. */
/* IN */
uint32_t minimum;
uint32_t maximum;
@@ -490,7 +490,7 @@ struct xen_get_cpufreq_para {
struct xen_ondemand ondemand;
} u;
} s;
- struct xen_cppc_para cppc_para;
+ struct xen_get_cppc_para cppc_para;
} u;
int32_t turbo_enabled;
--
2.34.1
* Re: [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para"
2025-07-11 3:50 ` [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
@ 2025-07-16 15:10 ` Jan Beulich
2025-07-28 13:09 ` Anthony PERARD
From: Jan Beulich @ 2025-07-16 15:10 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Juergen Gross, Andrew Cooper,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> As we are later going to add "struct xen_cppc_para" to "struct xen_sysctl_pm_op" as
> a new xenpm sub-op dealing specifically with CPPC info, we need to follow the
> naming pattern and rename the existing struct to "xen_get_cppc_para", which is
> more suitable than "xen_cppc_para".
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para"
2025-07-11 3:50 ` [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
2025-07-16 15:10 ` Jan Beulich
@ 2025-07-28 13:09 ` Anthony PERARD
From: Anthony PERARD @ 2025-07-28 13:09 UTC (permalink / raw)
To: Penny Zheng
Cc: xen-devel, ray.huang, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
On Fri, Jul 11, 2025 at 11:50:54AM +0800, Penny Zheng wrote:
> As we are later going to add "struct xen_cppc_para" to "struct xen_sysctl_pm_op" as
> a new xenpm sub-op dealing specifically with CPPC info, we need to follow the
> naming pattern and rename the existing struct to "xen_get_cppc_para", which is
> more suitable than "xen_cppc_para".
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Thanks,
--
Anthony PERARD
* [PATCH v6 08/19] xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND"
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 07/19] cpufreq: rename "xen_cppc_para" to "xen_get_cppc_para" Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:18 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 09/19] xen/cpufreq: neglect unsupported-mode request from DOM0 Penny Zheng
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Andrew Cooper,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
"ondemand" is more suitable to describe a preset in which epp value is set
to medium (CPPC_ENERGY_PERF_BALANCE), showing no preference for either performance
or powersave, while the minimum is set to lowest and the maximum to highest.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6:
- new commit
---
tools/misc/xenpm.c | 4 ++--
xen/arch/x86/acpi/cpufreq/hwp.c | 2 +-
xen/include/public/sysctl.h | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index bbe45fa548..55b0b0c482 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -1445,9 +1445,9 @@ static int parse_cppc_opts(xc_set_cppc_para_t *set_cppc, int *cpuid,
set_cppc->set_params = XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE;
i++;
}
- else if ( strcasecmp(argv[i], "balance") == 0 )
+ else if ( strcasecmp(argv[i], "ondemand") == 0 )
{
- set_cppc->set_params = XEN_SYSCTL_CPPC_SET_PRESET_BALANCE;
+ set_cppc->set_params = XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND;
i++;
}
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 7bf475ecb5..cec2ee262e 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -610,7 +610,7 @@ int set_hwp_para(struct cpufreq_policy *policy,
data->desired = 0;
break;
- case XEN_SYSCTL_CPPC_SET_PRESET_BALANCE:
+ case XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND:
data->minimum = data->hw.lowest;
data->maximum = data->hw.highest;
data->activity_window = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 86b6df30e7..aafa7fcf2b 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -430,7 +430,7 @@ struct xen_set_cppc_para {
#define XEN_SYSCTL_CPPC_SET_ACT_WINDOW (1U << 4)
#define XEN_SYSCTL_CPPC_SET_PRESET_MASK 0xf0000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_NONE 0x00000000U
-#define XEN_SYSCTL_CPPC_SET_PRESET_BALANCE 0x10000000U
+#define XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND 0x10000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE 0x20000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE 0x30000000U
#define XEN_SYSCTL_CPPC_SET_PARAM_MASK \
--
2.34.1
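A small sketch of how the renamed preset is selected and what it requests, based on the hunks above: xenpm maps the "ondemand" keyword to XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND, and the driver then requests the full lowest..highest range with activity_window 0 and a balanced EPP. The struct cppc_request type, the helper names, the "powersave" keyword and the EPP value 0x80 are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Preset encoding as in the public sysctl.h hunk above. */
#define XEN_SYSCTL_CPPC_SET_PRESET_MASK        0xf0000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_NONE        0x00000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND    0x10000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE   0x20000000U
#define XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE 0x30000000U

/* Assumption: "balance" is the midpoint of the 0..0xff EPP range. */
#define CPPC_ENERGY_PERF_BALANCE 0x80

/* Hypothetical, simplified view of a per-CPU CPPC request. */
struct cppc_request {
    uint8_t lowest, highest;          /* hardware capabilities */
    uint8_t minimum, maximum, epp;    /* requested values */
    uint32_t activity_window;
};

/* Map a xenpm command-line word to a preset, as parse_cppc_opts() does. */
static uint32_t preset_from_name(const char *name)
{
    if ( strcasecmp(name, "powersave") == 0 )
        return XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE;
    if ( strcasecmp(name, "performance") == 0 )
        return XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE;
    if ( strcasecmp(name, "ondemand") == 0 )
        return XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND;

    return XEN_SYSCTL_CPPC_SET_PRESET_NONE;   /* e.g. the old "balance" */
}

/* "ondemand": full [lowest, highest] range, no activity window, balanced EPP. */
static void apply_ondemand(struct cppc_request *req)
{
    req->minimum = req->lowest;
    req->maximum = req->highest;
    req->activity_window = 0;
    req->epp = CPPC_ENERGY_PERF_BALANCE;
}
```

Because the preset lives in the top nibble selected by XEN_SYSCTL_CPPC_SET_PRESET_MASK, only the name changes; the value 0x10000000U stays the same.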
* Re: [PATCH v6 08/19] xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND"
2025-07-11 3:50 ` [PATCH v6 08/19] xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND" Penny Zheng
@ 2025-07-16 15:18 ` Jan Beulich
From: Jan Beulich @ 2025-07-16 15:18 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Andrew Cooper, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> "ondemand" is more suitable to describe a preset in which epp value is set
> to medium (CPPC_ENERGY_PERF_BALANCE), showing no preference for either performance
> or powersave, while the minimum is set to lowest and the maximum to highest.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
* [PATCH v6 09/19] xen/cpufreq: neglect unsupported-mode request from DOM0
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 08/19] xen/cpufreq: rename cppc preset name to "XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND" Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:19 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné
DOM0 may deliver whatever performance data (Px, _CPC) it parses; it is Xen's
responsibility to decide which of it to accept.
Xen relies on the XEN_PROCESSOR_PM_xxx flags to tell which mode (Px or CPPC)
the current cpufreq driver supports, and accepts only the corresponding info.
It ignores requests for an unsupported mode and returns success.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6:
- new commit
---
xen/arch/x86/platform_hypercall.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 90abd3197f..3eba791889 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -539,9 +539,14 @@ ret_t do_platform_op(
case XEN_PM_PX:
if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
{
- ret = -ENOSYS;
+ /*
+ * Neglect Px-info when registered cpufreq driver
+ * isn't in Px mode
+ */
+ ret = 0;
break;
}
+
ret = set_px_pminfo(op->u.set_pminfo.id, &op->u.set_pminfo.u.perf);
break;
--
2.34.1
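A minimal model of the behaviour change in the XEN_PM_PX case of do_platform_op(): when the registered driver is not in Px mode, the Px data is dropped and 0 is returned instead of -ENOSYS. The flag values, the XEN_PROCESSOR_PM_CPPC name and the stub function are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in Xen's public headers. */
#define XEN_PROCESSOR_PM_PX   0x2u
#define XEN_PROCESSOR_PM_CPPC 0x8u   /* hypothetical name and value */

static uint32_t xen_processor_pmbits;
static unsigned int px_calls;   /* how often Px data was actually consumed */

/* Stub standing in for set_px_pminfo(): records the call and succeeds. */
static int set_px_pminfo_stub(void)
{
    px_calls++;
    return 0;
}

/* XEN_PM_PX handling after this patch: ignore, don't fail, when not in Px mode. */
static int handle_xen_pm_px(void)
{
    if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
        return 0;

    return set_px_pminfo_stub();
}
```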
* Re: [PATCH v6 09/19] xen/cpufreq: neglect unsupported-mode request from DOM0
2025-07-11 3:50 ` [PATCH v6 09/19] xen/cpufreq: neglect unsupported-mode request from DOM0 Penny Zheng
@ 2025-07-16 15:19 ` Jan Beulich
From: Jan Beulich @ 2025-07-16 15:19 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Andrew Cooper, Roger Pau Monné, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> DOM0 may deliver whatever performance data (Px, _CPC) it parses; it is Xen's
> responsibility to decide which of it to accept.
> Xen relies on the XEN_PROCESSOR_PM_xxx flags to tell which mode (Px or CPPC)
> the current cpufreq driver supports, and accepts only the corresponding info.
> It ignores requests for an unsupported mode and returns success.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
* [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
2025-07-11 3:50 ` [PATCH v6 09/19] xen/cpufreq: neglect unsupported-mode request from DOM0 Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:39 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini
To provide backward compatibility with existing governors that represent
performance as frequency, such as ondemand, the _CPC table can optionally
provide processor frequency range values (Lowest Frequency and Nominal
Frequency). These let the OS use Lowest Frequency/Performance and Nominal
Frequency/Performance as anchor points to create a linear mapping from CPPC
performance to CPU frequency.
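As a worked illustration of that linear mapping (not code from this patch): with the anchor points (lowest_perf, lowest_mhz) and (nominal_perf, nominal_mhz), a performance value p maps to lowest_mhz + (p - lowest_perf) * (nominal_mhz - lowest_mhz) / (nominal_perf - lowest_perf). The sketch uses integer arithmetic and returns 0 when the optional frequency values are missing; the function name cppc_perf_to_mhz() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Linear CPPC-perf -> MHz mapping through the two _CPC anchor points
 * (lowest_perf, lowest_mhz) and (nominal_perf, nominal_mhz).
 * Returns 0 when the optional frequency values are absent or unusable.
 */
static unsigned int cppc_perf_to_mhz(unsigned int perf,
                                     unsigned int lowest_perf,
                                     unsigned int lowest_mhz,
                                     unsigned int nominal_perf,
                                     unsigned int nominal_mhz)
{
    int64_t dperf = (int64_t)nominal_perf - lowest_perf;
    int64_t dmhz = (int64_t)nominal_mhz - lowest_mhz;
    int64_t mhz;

    if ( !lowest_mhz || !nominal_mhz || dperf <= 0 )
        return 0;

    /* Values above nominal_perf (boost range) extrapolate along the same line. */
    mhz = lowest_mhz + ((int64_t)perf - lowest_perf) * dmhz / dperf;

    return mhz > 0 ? (unsigned int)mhz : 0;
}
```

For example, with lowest_perf 20 at 400 MHz and nominal_perf 120 at 2400 MHz, a performance value of 70 maps to 400 + 50 * 2000 / 100 = 1400 MHz.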
As Xen is incapable of parsing the dynamic ACPI tables, we'd like to
introduce a new sub-hypercall "XEN_PM_CPPC" to propagate the required CPPC
data from the dom0 kernel to Xen.
In the corresponding handler, set_cppc_pminfo(), we sanity-check the _CPC
and _PSD data, as both are necessary for correctly initializing cpufreq cores
in CPPC mode.
Users should be aware that if we fail at this point, there is no fallback
scheme, such as legacy P-states, that could be switched to.
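The _CPC perf-value rules that set_cppc_pminfo() applies in the diff below can be summarised as a standalone predicate: each perf value must lie in 1..255, and lowest <= lowest_nonlinear <= nominal <= highest. Per the patch, a violation only triggers a warning (Xen reads these values from the CPPC capability MSR), whereas an inverted lowest_mhz/nominal_mhz pair is reset to zero. The struct and helper names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the _CPC fields carried by XEN_PM_CPPC. */
struct cpc_fields {
    uint32_t highest_perf, nominal_perf, lowest_nonlinear_perf, lowest_perf;
    uint32_t nominal_mhz, lowest_mhz;
};

/* A perf value must be non-zero and fit in 8 bits. */
static bool perf_in_range(uint32_t v)
{
    return v != 0 && v <= UINT8_MAX;
}

/* True when the four perf values pass the checks made in set_cppc_pminfo(). */
static bool cpc_perf_sane(const struct cpc_fields *c)
{
    return perf_in_range(c->highest_perf) &&
           perf_in_range(c->nominal_perf) &&
           perf_in_range(c->lowest_nonlinear_perf) &&
           perf_in_range(c->lowest_perf) &&
           c->lowest_perf <= c->lowest_nonlinear_perf &&
           c->lowest_nonlinear_perf <= c->nominal_perf &&
           c->nominal_perf <= c->highest_perf;
}

/* The optional MHz anchors are zeroed, not kept, when they are inverted. */
static void sanitize_cpc_freq(struct cpc_fields *c)
{
    if ( c->lowest_mhz > c->nominal_mhz )
    {
        c->lowest_mhz = 0;
        c->nominal_mhz = 0;
    }
}
```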
A new flag "XEN_CPPC_INIT" is also introduced for cpufreq cores initialised
in CPPC mode, and all .init flag checks are updated to consider
"XEN_CPPC_INIT" too.
We bypass construction of the Px statistic info in cpufreq_statistic_init()
for CPPC mode, while deliberately not bypassing the cpufreq_statistic_lock
initialization. The same check is unnecessary in cpufreq_statistic_exit(),
since it is already covered by the check on the Px statistic variable
"cpufreq_statistic_data".
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Remove unnecessary figure braces
- Pointer-to-const for print_CPPC and set_cppc_pminfo
- Structure allocation shall use xvzalloc()
- Unnecessary memcpy(), and change it to a (type safe) structure assignment
- Add comment for struct xen_processor_cppc, and keep the chosen fields
in the order _CPC has them
- Obey alphabetic sorting, and prefix compat structures with ? instead
of !
---
v2 -> v3:
- Trim too long line
- Re-place set_cppc_pminfo() past set_px_pminfo()
- Fix Misra violations: Declaration and definition ought to agree
in parameter names
- Introduce a new flag XEN_PM_CPPC to reflect processor initialised in CPPC
mode
---
v3 -> v4:
- Refactor commit message
- make "acpi_id" unsigned int
- Add warning message when cpufreq_cpu_init() failed only under debug mode
- Expand "struct xen_processor_cppc" to include _PSD and shared type
- add sanity check for ACPI CPPC data
---
v4 -> v5:
- remove the ordering check between lowest_nonlinear_perf and lowest_perf
- use printk_once() for cppc perf value warning
- complement comment for cppc perf value check
- remove redundant check and pointless parenthesizing
- use dprintk() for warning under #ifndef NDEBUG
- refactor warning message when having non-zero ret of cpufreq_cpu_init()
- With introduction of "struct xen_psd_package" in "struct xen_processor_cppc",
use ! and the respective XLAT_* macro(s) to wrap.
---
v5 -> v6:
- remove unnecessary input parameter check
- use printk_once() instead of dprintk() and reword the log message
- adhere to designated comment style
- relative ordering shall be consistent between different declaration groups
- add alphabetically in xlat.lst
- in get_cpufreq_para(), add must-zero check for ->perf.state_count in CPPC mode
---
xen/arch/x86/platform_hypercall.c | 5 +
xen/arch/x86/x86_64/cpufreq.c | 19 ++++
xen/arch/x86/x86_64/platform_hypercall.c | 3 +
xen/drivers/acpi/pm-op.c | 5 +-
xen/drivers/cpufreq/cpufreq.c | 126 +++++++++++++++++++++-
xen/include/acpi/cpufreq/processor_perf.h | 4 +-
xen/include/public/platform.h | 26 +++++
xen/include/xen/pmstat.h | 5 +
xen/include/xlat.lst | 1 +
9 files changed, 189 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 3eba791889..42b3b8b95a 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -577,6 +577,11 @@ ret_t do_platform_op(
break;
}
+ case XEN_PM_CPPC:
+ ret = set_cppc_pminfo(op->u.set_pminfo.id,
+ &op->u.set_pminfo.u.cppc_data);
+ break;
+
default:
ret = -EINVAL;
break;
diff --git a/xen/arch/x86/x86_64/cpufreq.c b/xen/arch/x86/x86_64/cpufreq.c
index e4f3d5b436..8d57f67c2e 100644
--- a/xen/arch/x86/x86_64/cpufreq.c
+++ b/xen/arch/x86/x86_64/cpufreq.c
@@ -54,3 +54,22 @@ int compat_set_px_pminfo(uint32_t acpi_id,
return set_px_pminfo(acpi_id, xen_perf);
}
+
+int compat_set_cppc_pminfo(unsigned int acpi_id,
+ const struct compat_processor_cppc *cppc_data)
+
+{
+ struct xen_processor_cppc *xen_cppc;
+ unsigned long xlat_page_current;
+
+ xlat_malloc_init(xlat_page_current);
+
+ xen_cppc = xlat_malloc_array(xlat_page_current,
+ struct xen_processor_cppc, 1);
+ if ( unlikely(xen_cppc == NULL) )
+ return -EFAULT;
+
+ XLAT_processor_cppc(xen_cppc, cppc_data);
+
+ return set_cppc_pminfo(acpi_id, xen_cppc);
+}
diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c
index 9ab631c17f..0288f68df9 100644
--- a/xen/arch/x86/x86_64/platform_hypercall.c
+++ b/xen/arch/x86/x86_64/platform_hypercall.c
@@ -14,6 +14,9 @@ EMIT_FILE;
#define efi_get_info efi_compat_get_info
#define efi_runtime_call(x) efi_compat_runtime_call(x)
+#define xen_processor_cppc compat_processor_cppc
+#define set_cppc_pminfo compat_set_cppc_pminfo
+
#define xen_processor_performance compat_processor_performance
#define set_px_pminfo compat_set_px_pminfo
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 9a1970df34..49b4067dec 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -91,7 +91,8 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
pmpt = processor_pminfo[op->cpuid];
policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
- if ( !pmpt || !pmpt->perf.states ||
+ if ( !pmpt || ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
+ ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
!policy || !policy->governor )
return -EINVAL;
@@ -351,7 +352,7 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
case CPUFREQ_PARA:
if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
return -ENODEV;
- if ( !pmpt || !(pmpt->init & XEN_PX_INIT) )
+ if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
break;
}
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index e387b8a0d9..065fdf4106 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -40,6 +40,7 @@
#include <xen/domain.h>
#include <xen/cpu.h>
#include <xen/pmstat.h>
+#include <xen/xvmalloc.h>
#include <asm/io.h>
#include <asm/processor.h>
@@ -238,6 +239,11 @@ static int get_psd_info(unsigned int cpu, uint32_t *shared_type,
*domain_info = &processor_pminfo[cpu]->perf.domain_info;
break;
+ case XEN_CPPC_INIT:
+ *shared_type = processor_pminfo[cpu]->cppc_data.shared_type;
+ *domain_info = &processor_pminfo[cpu]->cppc_data.domain_info;
+ break;
+
default:
ret = -EINVAL;
break;
@@ -263,7 +269,7 @@ int cpufreq_add_cpu(unsigned int cpu)
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+ if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
if (!cpufreq_driver.init)
@@ -438,7 +444,7 @@ int cpufreq_del_cpu(unsigned int cpu)
if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
return -EINVAL;
- if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+ if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
if (!per_cpu(cpufreq_cpu_policy, cpu))
@@ -697,6 +703,122 @@ int acpi_set_pdc_bits(unsigned int acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
return ret;
}
+static void print_CPPC(const struct xen_processor_cppc *cppc_data)
+{
+ printk("\t_CPC: highest_perf=%u, lowest_perf=%u, "
+ "nominal_perf=%u, lowest_nonlinear_perf=%u, "
+ "nominal_mhz=%uMHz, lowest_mhz=%uMHz\n",
+ cppc_data->cpc.highest_perf, cppc_data->cpc.lowest_perf,
+ cppc_data->cpc.nominal_perf, cppc_data->cpc.lowest_nonlinear_perf,
+ cppc_data->cpc.nominal_mhz, cppc_data->cpc.lowest_mhz);
+}
+
+int set_cppc_pminfo(unsigned int acpi_id,
+ const struct xen_processor_cppc *cppc_data)
+{
+ int ret = 0, cpuid;
+ struct processor_pminfo *pm_info;
+
+ cpuid = get_cpu_id(acpi_id);
+ if ( cpuid < 0 )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cppc_data->pad[0] || cppc_data->pad[1] || cppc_data->pad[2] )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cpufreq_verbose )
+ printk("Set CPU acpi_id(%u) cpuid(%d) CPPC State info:\n",
+ acpi_id, cpuid);
+
+ pm_info = processor_pminfo[cpuid];
+ if ( !pm_info )
+ {
+ pm_info = xvzalloc(struct processor_pminfo);
+ if ( !pm_info )
+ {
+ ret = -ENOMEM;
+ goto out;
+ }
+ processor_pminfo[cpuid] = pm_info;
+ }
+ pm_info->acpi_id = acpi_id;
+ pm_info->id = cpuid;
+ pm_info->cppc_data = *cppc_data;
+
+ if ( cppc_data->flags & XEN_CPPC_PSD )
+ if ( !check_psd_pminfo(cppc_data->shared_type) )
+ {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if ( cppc_data->flags & XEN_CPPC_CPC )
+ {
+ if ( cppc_data->cpc.highest_perf == 0 ||
+ cppc_data->cpc.highest_perf > UINT8_MAX ||
+ cppc_data->cpc.nominal_perf == 0 ||
+ cppc_data->cpc.nominal_perf > UINT8_MAX ||
+ cppc_data->cpc.lowest_nonlinear_perf == 0 ||
+ cppc_data->cpc.lowest_nonlinear_perf > UINT8_MAX ||
+ cppc_data->cpc.lowest_perf == 0 ||
+ cppc_data->cpc.lowest_perf > UINT8_MAX ||
+ cppc_data->cpc.lowest_perf >
+ cppc_data->cpc.lowest_nonlinear_perf ||
+ cppc_data->cpc.lowest_nonlinear_perf >
+ cppc_data->cpc.nominal_perf ||
+ cppc_data->cpc.nominal_perf > cppc_data->cpc.highest_perf )
+ /*
+ * Right now, Xen doesn't actually use highest_perf/nominal_perf/
+ * lowest_nonlinear_perf/lowest_perf values read from ACPI _CPC
+ * table. Xen reads CPPC capability MSR to get these four values.
+ * So warning is enough.
+ */
+ printk_once(XENLOG_WARNING
+ "Broken CPPC perf values: lowest(%u), nonlinear_lowest(%u), nominal(%u), highest(%u)\n",
+ cppc_data->cpc.lowest_perf,
+ cppc_data->cpc.lowest_nonlinear_perf,
+ cppc_data->cpc.nominal_perf,
+ cppc_data->cpc.highest_perf);
+
+ /* lowest_mhz and nominal_mhz are optional value */
+ if ( cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
+ {
+ printk_once(XENLOG_WARNING
+ "Broken CPPC freq values: lowest(%u), nominal(%u)\n",
+ cppc_data->cpc.lowest_mhz,
+ cppc_data->cpc.nominal_mhz);
+ /* Re-set with zero values, instead of keeping invalid values */
+ pm_info->cppc_data.cpc.nominal_mhz = 0;
+ pm_info->cppc_data.cpc.lowest_mhz = 0;
+ }
+ }
+
+ if ( cppc_data->flags == (XEN_CPPC_PSD | XEN_CPPC_CPC) )
+ {
+ if ( cpufreq_verbose )
+ {
+ print_PSD(&pm_info->cppc_data.domain_info);
+ print_CPPC(&pm_info->cppc_data);
+ }
+
+ pm_info->init = XEN_CPPC_INIT;
+ ret = cpufreq_cpu_init(cpuid);
+ if ( ret )
+ printk_once(XENLOG_WARNING
+ "CPU%u failed amd-cppc mode init; use \"cpufreq=xen\" instead",
+ cpuid);
+ }
+
+ out:
+ return ret;
+}
+
static void cpufreq_cmdline_common_para(struct cpufreq_policy *new_policy)
{
if (usr_max_freq)
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index caa768626c..f80495fc96 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -5,7 +5,8 @@
#include <public/sysctl.h>
#include <xen/acpi.h>
-#define XEN_PX_INIT 0x80000000U
+#define XEN_CPPC_INIT 0x40000000U
+#define XEN_PX_INIT 0x80000000U
unsigned int powernow_register_driver(void);
unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
@@ -43,6 +44,7 @@ struct processor_pminfo {
uint32_t acpi_id;
uint32_t id;
struct processor_performance perf;
+ struct xen_processor_cppc cppc_data;
uint32_t init;
};
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 2725b8d104..9103315af6 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -363,6 +363,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
#define XEN_PM_PX 1
#define XEN_PM_TX 2
#define XEN_PM_PDC 3
+#define XEN_PM_CPPC 4
/* Px sub info type */
#define XEN_PX_PCT 1
@@ -370,6 +371,10 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
#define XEN_PX_PPC 4
#define XEN_PX_PSD 8
+/* CPPC sub info type */
+#define XEN_CPPC_PSD 1
+#define XEN_CPPC_CPC 2
+
struct xen_power_register {
uint32_t space_id;
uint32_t bit_width;
@@ -457,6 +462,26 @@ struct xen_processor_performance {
typedef struct xen_processor_performance xen_processor_performance_t;
DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
+struct xen_processor_cppc {
+ uint8_t flags; /* IN: XEN_CPPC_xxx */
+ uint8_t pad[3];
+ /*
+ * IN: Subset _CPC fields useful for CPPC-compatible cpufreq
+ * driver's initialization
+ */
+ struct {
+ uint32_t highest_perf;
+ uint32_t nominal_perf;
+ uint32_t lowest_nonlinear_perf;
+ uint32_t lowest_perf;
+ uint32_t lowest_mhz;
+ uint32_t nominal_mhz;
+ } cpc;
+ uint32_t shared_type; /* IN: XEN_CPUPERF_SHARED_TYPE_xxx */
+ struct xen_psd_package domain_info; /* IN: _PSD */
+};
+typedef struct xen_processor_cppc xen_processor_cppc_t;
+
struct xenpf_set_processor_pminfo {
/* IN variables */
uint32_t id; /* ACPI CPU ID */
@@ -465,6 +490,7 @@ struct xenpf_set_processor_pminfo {
struct xen_processor_power power;/* Cx: _CST/_CSD */
struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */
XEN_GUEST_HANDLE(uint32) pdc; /* _PDC */
+ xen_processor_cppc_t cppc_data; /* CPPC: _CPC and _PSD */
} u;
};
typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
diff --git a/xen/include/xen/pmstat.h b/xen/include/xen/pmstat.h
index 8350403e95..6096560d3c 100644
--- a/xen/include/xen/pmstat.h
+++ b/xen/include/xen/pmstat.h
@@ -7,12 +7,17 @@
int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf);
long set_cx_pminfo(uint32_t acpi_id, struct xen_processor_power *power);
+int set_cppc_pminfo(unsigned int acpi_id,
+ const struct xen_processor_cppc *cppc_data);
#ifdef CONFIG_COMPAT
struct compat_processor_performance;
int compat_set_px_pminfo(uint32_t acpi_id, struct compat_processor_performance *perf);
struct compat_processor_power;
long compat_set_cx_pminfo(uint32_t acpi_id, struct compat_processor_power *power);
+struct compat_processor_cppc;
+int compat_set_cppc_pminfo(unsigned int acpi_id,
+ const struct compat_processor_cppc *cppc_data);
#endif
uint32_t pmstat_get_cx_nr(unsigned int cpu);
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 3c7b6c6830..ab2e207c77 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -162,6 +162,7 @@
! pct_register platform.h
! power_register platform.h
+! processor_cppc platform.h
? processor_csd platform.h
! processor_cx platform.h
! processor_flags platform.h
--
2.34.1
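As a reading aid for the _CPC validation in set_cppc_pminfo() above, here is a standalone C sketch of the same perf-value check, written as a positive predicate. It is not part of the patch. The struct cpc_vals and the helpers perf_in_range()/cpc_perf_is_sane() are hypothetical names that mirror the cpc sub-struct. The bounds (non-zero, at most UINT8_MAX) and the ordering lowest <= lowest_nonlinear <= nominal <= highest are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of the "cpc" sub-struct of struct xen_processor_cppc. */
struct cpc_vals {
    uint32_t highest_perf;
    uint32_t nominal_perf;
    uint32_t lowest_nonlinear_perf;
    uint32_t lowest_perf;
    uint32_t lowest_mhz;
    uint32_t nominal_mhz;
};

/* The patch treats 0 and anything above UINT8_MAX as broken. */
static bool perf_in_range(uint32_t v)
{
    return v != 0 && v <= UINT8_MAX;
}

/*
 * True when all four perf values are in range and ordered
 * lowest <= lowest_nonlinear <= nominal <= highest, i.e. the negation
 * of the "Broken CPPC perf values" condition.
 */
static bool cpc_perf_is_sane(const struct cpc_vals *c)
{
    if ( !perf_in_range(c->highest_perf) ||
         !perf_in_range(c->nominal_perf) ||
         !perf_in_range(c->lowest_nonlinear_perf) ||
         !perf_in_range(c->lowest_perf) )
        return false;

    return c->lowest_perf <= c->lowest_nonlinear_perf &&
           c->lowest_nonlinear_perf <= c->nominal_perf &&
           c->nominal_perf <= c->highest_perf;
}
```

Splitting the range checks from the ordering checks keeps each continuation line short. In the patch itself, a failing check only triggers a one-time warning, because Xen takes these four values from the CPPC capability MSR rather than from _CPC.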
* Re: [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-07-11 3:50 ` [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
@ 2025-07-16 15:39 ` Jan Beulich
2025-08-04 6:47 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-16 15:39 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -91,7 +91,8 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
> pmpt = processor_pminfo[op->cpuid];
> policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
>
> - if ( !pmpt || !pmpt->perf.states ||
> + if ( !pmpt || ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
> + ((pmpt->init & XEN_CPPC_INIT) && pmpt->perf.state_count) ||
Nit: I think this would be neater if the PX_INIT part was also moved to its own
line.
> @@ -697,6 +703,122 @@ int acpi_set_pdc_bits(unsigned int acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
> return ret;
> }
>
> +static void print_CPPC(const struct xen_processor_cppc *cppc_data)
> +{
> + printk("\t_CPC: highest_perf=%u, lowest_perf=%u, "
> + "nominal_perf=%u, lowest_nonlinear_perf=%u, "
> + "nominal_mhz=%uMHz, lowest_mhz=%uMHz\n",
> + cppc_data->cpc.highest_perf, cppc_data->cpc.lowest_perf,
> + cppc_data->cpc.nominal_perf, cppc_data->cpc.lowest_nonlinear_perf,
> + cppc_data->cpc.nominal_mhz, cppc_data->cpc.lowest_mhz);
> +}
> +
> +int set_cppc_pminfo(unsigned int acpi_id,
> + const struct xen_processor_cppc *cppc_data)
> +{
> + int ret = 0, cpuid;
> + struct processor_pminfo *pm_info;
> +
> + cpuid = get_cpu_id(acpi_id);
> + if ( cpuid < 0 )
> + {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if ( cppc_data->pad[0] || cppc_data->pad[1] || cppc_data->pad[2] )
> + {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if ( cpufreq_verbose )
> + printk("Set CPU acpi_id(%u) cpuid(%d) CPPC State info:\n",
May I suggest "Set CPU%d (ACPI ID %u) CPPC state info:\n"
> + acpi_id, cpuid);
> +
> + pm_info = processor_pminfo[cpuid];
> + if ( !pm_info )
> + {
> + pm_info = xvzalloc(struct processor_pminfo);
> + if ( !pm_info )
> + {
> + ret = -ENOMEM;
> + goto out;
> + }
> + processor_pminfo[cpuid] = pm_info;
> + }
> + pm_info->acpi_id = acpi_id;
> + pm_info->id = cpuid;
> + pm_info->cppc_data = *cppc_data;
> +
> + if ( cppc_data->flags & XEN_CPPC_PSD )
> + if ( !check_psd_pminfo(cppc_data->shared_type) )
Please convert these into a single if().
> + {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if ( cppc_data->flags & XEN_CPPC_CPC )
> + {
> + if ( cppc_data->cpc.highest_perf == 0 ||
> + cppc_data->cpc.highest_perf > UINT8_MAX ||
> + cppc_data->cpc.nominal_perf == 0 ||
> + cppc_data->cpc.nominal_perf > UINT8_MAX ||
> + cppc_data->cpc.lowest_nonlinear_perf == 0 ||
> + cppc_data->cpc.lowest_nonlinear_perf > UINT8_MAX ||
> + cppc_data->cpc.lowest_perf == 0 ||
> + cppc_data->cpc.lowest_perf > UINT8_MAX ||
> + cppc_data->cpc.lowest_perf >
> + cppc_data->cpc.lowest_nonlinear_perf ||
> + cppc_data->cpc.lowest_nonlinear_perf >
> + cppc_data->cpc.nominal_perf ||
Indentation is a little odd here. Best may be to use parentheses:
cppc_data->cpc.lowest_perf > UINT8_MAX ||
(cppc_data->cpc.lowest_perf >
cppc_data->cpc.lowest_nonlinear_perf) ||
(cppc_data->cpc.lowest_nonlinear_perf >
cppc_data->cpc.nominal_perf) ||
Otherwise, strictly speaking, no extra indentation should be used. I can see
though that this would hamper readability, so the next best alternative would
appear to be to make the extra indentation a proper level (i.e. 4 blanks):
cppc_data->cpc.lowest_perf > UINT8_MAX ||
cppc_data->cpc.lowest_perf >
cppc_data->cpc.lowest_nonlinear_perf ||
cppc_data->cpc.lowest_nonlinear_perf >
cppc_data->cpc.nominal_perf ||
> + cppc_data->cpc.nominal_perf > cppc_data->cpc.highest_perf )
> + /*
> + * Right now, Xen doesn't actually use highest_perf/nominal_perf/
> + * lowest_nonlinear_perf/lowest_perf values read from ACPI _CPC
> + * table. Xen reads CPPC capability MSR to get these four values.
> + * So warning is enough.
> + */
> + printk_once(XENLOG_WARNING
> + "Broken CPPC perf values: lowest(%u), nonlinear_lowest(%u), nominal(%u), highest(%u)\n",
> + cppc_data->cpc.lowest_perf,
> + cppc_data->cpc.lowest_nonlinear_perf,
> + cppc_data->cpc.nominal_perf,
> + cppc_data->cpc.highest_perf);
> +
> + /* lowest_mhz and nominal_mhz are optional value */
> + if ( cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
If they're optional, what if lowest_mhz is provided but nominal_mhz isn't?
Wouldn't the warning needlessly trigger in that case?
> + {
> + printk_once(XENLOG_WARNING
> + "Broken CPPC freq values: lowest(%u), nominal(%u)\n",
> + cppc_data->cpc.lowest_mhz,
> + cppc_data->cpc.nominal_mhz);
> + /* Re-set with zero values, instead of keeping invalid values */
> + pm_info->cppc_data.cpc.nominal_mhz = 0;
> + pm_info->cppc_data.cpc.lowest_mhz = 0;
> + }
> + }
> +
> + if ( cppc_data->flags == (XEN_CPPC_PSD | XEN_CPPC_CPC) )
> + {
> + if ( cpufreq_verbose )
> + {
> + print_PSD(&pm_info->cppc_data.domain_info);
> + print_CPPC(&pm_info->cppc_data);
> + }
> +
> + pm_info->init = XEN_CPPC_INIT;
> + ret = cpufreq_cpu_init(cpuid);
> + if ( ret )
> + printk_once(XENLOG_WARNING
> + "CPU%u failed amd-cppc mode init; use \"cpufreq=xen\" instead",
> + cpuid);
cpuid is still int, so wants printing with %d.
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -363,6 +363,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
> #define XEN_PM_PX 1
> #define XEN_PM_TX 2
> #define XEN_PM_PDC 3
> +#define XEN_PM_CPPC 4
>
> /* Px sub info type */
> #define XEN_PX_PCT 1
> @@ -370,6 +371,10 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
> #define XEN_PX_PPC 4
> #define XEN_PX_PSD 8
>
> +/* CPPC sub info type */
> +#define XEN_CPPC_PSD 1
> +#define XEN_CPPC_CPC 2
As per this, ...
> @@ -457,6 +462,26 @@ struct xen_processor_performance {
> typedef struct xen_processor_performance xen_processor_performance_t;
> DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
>
> +struct xen_processor_cppc {
> + uint8_t flags; /* IN: XEN_CPPC_xxx */
... it's a type that's living here, not a collection of flags. Any reason the
field isn't named "type"?
> + uint8_t pad[3];
> + /*
> + * IN: Subset _CPC fields useful for CPPC-compatible cpufreq
> + * driver's initialization
> + */
> + struct {
> + uint32_t highest_perf;
> + uint32_t nominal_perf;
> + uint32_t lowest_nonlinear_perf;
> + uint32_t lowest_perf;
> + uint32_t lowest_mhz;
> + uint32_t nominal_mhz;
> + } cpc;
What, again, was the reason to wrap these into a sub-struct?
Jan
* RE: [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-07-16 15:39 ` Jan Beulich
@ 2025-08-04 6:47 ` Penny, Zheng
2025-08-04 7:25 ` Jan Beulich
0 siblings, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-04 6:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 16, 2025 11:39 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to
> propagate CPPC data
>
> On 11.07.2025 05:50, Penny Zheng wrote:
> > + cppc_data->cpc.nominal_perf > cppc_data->cpc.highest_perf )
> > + /*
> > + * Right now, Xen doesn't actually use highest_perf/nominal_perf/
> > + * lowest_nonlinear_perf/lowest_perf values read from ACPI _CPC
> > + * table. Xen reads CPPC capability MSR to get these four values.
> > + * So warning is enough.
> > + */
> > + printk_once(XENLOG_WARNING
> > + "Broken CPPC perf values: lowest(%u), nonlinear_lowest(%u), nominal(%u), highest(%u)\n",
> > + cppc_data->cpc.lowest_perf,
> > + cppc_data->cpc.lowest_nonlinear_perf,
> > + cppc_data->cpc.nominal_perf,
> > + cppc_data->cpc.highest_perf);
> > +
> > + /* lowest_mhz and nominal_mhz are optional value */
> > + if ( cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
>
> If they're optional, what if lowest_mhz is provided but nominal_mhz isn't?
> Wouldn't the warning needlessly trigger in that case?
>
Yes, this check is only meaningful when both are provided:
+ if ( cppc_data->cpc.nominal_mhz &&
+ cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
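Here is a minimal C sketch of the revised frequency check. It assumes that a zero value means the field was not supplied by _CPC, which is what the patch's reset-to-zero handling implies. The struct cpc_freq and sanitize_cpc_freq() names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the two optional _CPC frequency fields. */
struct cpc_freq {
    uint32_t lowest_mhz;   /* 0 == not provided (assumption) */
    uint32_t nominal_mhz;  /* 0 == not provided (assumption) */
};

/*
 * Returns false (and zeroes both fields) only when both values are
 * present and lowest exceeds nominal. If nominal_mhz is absent, no
 * ordering can be checked, so nothing is reported.
 */
static bool sanitize_cpc_freq(struct cpc_freq *f)
{
    if ( f->nominal_mhz && f->lowest_mhz > f->nominal_mhz )
    {
        f->lowest_mhz = 0;
        f->nominal_mhz = 0;
        return false;
    }

    return true;
}
```

With this form, supplying only lowest_mhz no longer triggers the warning. An inverted pair is still reported and cleared.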
> > --- a/xen/include/public/platform.h
> > +++ b/xen/include/public/platform.h
> > @@ -363,6 +363,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
> > #define XEN_PM_PX 1
> > #define XEN_PM_TX 2
> > #define XEN_PM_PDC 3
> > +#define XEN_PM_CPPC 4
> >
> > /* Px sub info type */
> > #define XEN_PX_PCT 1
> > @@ -370,6 +371,10 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
> > #define XEN_PX_PPC 4
> > #define XEN_PX_PSD 8
> >
> > +/* CPPC sub info type */
> > +#define XEN_CPPC_PSD 1
> > +#define XEN_CPPC_CPC 2
>
> As per this, ...
>
> > @@ -457,6 +462,26 @@ struct xen_processor_performance {
> > typedef struct xen_processor_performance xen_processor_performance_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
> >
> > +struct xen_processor_cppc {
> > + uint8_t flags; /* IN: XEN_CPPC_xxx */
>
> ... it's a type that's living here, not a collection of flags. Any reason the field isn't
> named "type"?
>
It is a collection of flags: only when both XEN_CPPC_PSD and XEN_CPPC_CPC are set can we run cpufreq_cpu_init() to initialize the cpufreq core.
> > + uint8_t pad[3];
> > + /*
> > + * IN: Subset _CPC fields useful for CPPC-compatible cpufreq
> > + * driver's initialization
> > + */
> > + struct {
> > + uint32_t highest_perf;
> > + uint32_t nominal_perf;
> > + uint32_t lowest_nonlinear_perf;
> > + uint32_t lowest_perf;
> > + uint32_t lowest_mhz;
> > + uint32_t nominal_mhz;
> > + } cpc;
>
> What, again, was the reason to wrap these into a sub-struct?
I want to distinguish these fields from the other two (shared_type and domain_info): the cpc sub-struct holds _CPC information, while the other two hold _PSD information.
>
> Jan
* Re: [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data
2025-08-04 6:47 ` Penny, Zheng
@ 2025-08-04 7:25 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-08-04 7:25 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 04.08.2025 08:47, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, July 16, 2025 11:39 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
>> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
>> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
>> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to
>> propagate CPPC data
>>
>> On 11.07.2025 05:50, Penny Zheng wrote:
>>> + cppc_data->cpc.nominal_perf > cppc_data->cpc.highest_perf )
>>> + /*
>>> + * Right now, Xen doesn't actually use highest_perf/nominal_perf/
>>> + * lowest_nonlinear_perf/lowest_perf values read from ACPI _CPC
>>> + * table. Xen reads CPPC capability MSR to get these four values.
>>> + * So warning is enough.
>>> + */
>>> + printk_once(XENLOG_WARNING
>>> + "Broken CPPC perf values: lowest(%u), nonlinear_lowest(%u), nominal(%u), highest(%u)\n",
>>> + cppc_data->cpc.lowest_perf,
>>> + cppc_data->cpc.lowest_nonlinear_perf,
>>> + cppc_data->cpc.nominal_perf,
>>> + cppc_data->cpc.highest_perf);
>>> +
>>> + /* lowest_mhz and nominal_mhz are optional value */
>>> + if ( cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
>>
>> If they're optional, what if lowest_mhz is provided but nominal_mhz isn't?
>> Wouldn't the warning needlessly trigger in that case?
>>
>
> Yes, only both are provided, this check is meaningful
> + if ( cppc_data->cpc.nominal_mhz &&
> + cppc_data->cpc.lowest_mhz > cppc_data->cpc.nominal_mhz )
>
>>> --- a/xen/include/public/platform.h
>>> +++ b/xen/include/public/platform.h
>>> @@ -363,6 +363,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
>>> #define XEN_PM_PX 1
>>> #define XEN_PM_TX 2
>>> #define XEN_PM_PDC 3
>>> +#define XEN_PM_CPPC 4
>>>
>>> /* Px sub info type */
>>> #define XEN_PX_PCT 1
>>> @@ -370,6 +371,10 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletime_t);
>>> #define XEN_PX_PPC 4
>>> #define XEN_PX_PSD 8
>>>
>>> +/* CPPC sub info type */
>>> +#define XEN_CPPC_PSD 1
>>> +#define XEN_CPPC_CPC 2
>>
>> As per this, ...
>>
>>> @@ -457,6 +462,26 @@ struct xen_processor_performance {
>>> typedef struct xen_processor_performance xen_processor_performance_t;
>>> DEFINE_XEN_GUEST_HANDLE(xen_processor_performance_t);
>>>
>>> +struct xen_processor_cppc {
>>> + uint8_t flags; /* IN: XEN_CPPC_xxx */
>>
>> ... it's a type that's living here, not a collection of flags. Any reason the field isn't
>> named "type"?
>
> It is a collection of flags. Only when both XEN_CPPC_PSD and XEN_CPPC_CPC are set, we could run cpufreq_cpu_init() to initialize cpufreq core.
Hmm, right. The next legitimate XEN_CPPC_* value to use would be 4, not 3.
That's not visible from how things are defined, though. May I suggest that
you use
/* CPPC sub info type */
#define XEN_CPPC_PSD (1U << 0)
#define XEN_CPPC_CPC (1U << 1)
instead then?
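To illustrate the point about bit values: with the (1U << n) spelling, each sub-info type owns one bit, so the next free value is visibly 4 rather than 3. This is a sketch only. XEN_CPPC_ALL, XEN_CPPC_NEXT and cppc_data_complete() are hypothetical names. The predicate mirrors the patch's `flags == (XEN_CPPC_PSD | XEN_CPPC_CPC)` test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Suggested spelling: each CPPC sub-info type occupies its own bit. */
#define XEN_CPPC_PSD  (1U << 0)
#define XEN_CPPC_CPC  (1U << 1)
/* Hypothetical next bit, shown only to make the "4, not 3" point. */
#define XEN_CPPC_NEXT (1U << 2)

/* Hypothetical: both pieces of data needed before cpufreq init. */
#define XEN_CPPC_ALL  (XEN_CPPC_PSD | XEN_CPPC_CPC)

/* Same exact-equality test as set_cppc_pminfo() uses. */
static bool cppc_data_complete(uint8_t flags)
{
    return flags == XEN_CPPC_ALL;
}
```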
Jan
* [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (9 preceding siblings ...)
2025-07-11 3:50 ` [PATCH v6 10/19] xen/cpufreq: introduce new sub-hypercall to propagate CPPC data Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-16 15:59 ` Jan Beulich
2025-07-11 3:50 ` [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
` (7 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Users need to set "cpufreq=amd-cppc" on the Xen command line to enable the
amd-cppc driver, which selects ACPI Collaborative Performance and Power Control
(CPPC) on supported AMD hardware to provide a finer-grained frequency control
mechanism. The `verbose` option can also be included to enable verbose output.
When users set "cpufreq=amd-cppc", a new amd-cppc driver shall be registered
and used. All hooks for the amd-cppc driver are still missing at this point, so
registration temporarily fails with -EOPNOTSUPP. This will be fixed along with
the implementation.
A new internal xen-pm flag, XEN_PROCESSOR_PM_CPPC, is introduced to denote a
cpufreq driver operating in CPPC mode. XEN_PROCESSOR_PM_CPPC is defined as 0x100,
as it is the next value available after the 8-bit-wide public xen-pm options,
and a compile-time sanity check is added. All XEN_PROCESSOR_PM_xxx checks shall
be updated to consider XEN_PROCESSOR_PM_CPPC too.
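The relationship between the internal flag and the public option field can be sketched with C11 static assertions. The 8-bit width comes from the description above. PUBLIC_PM_OPTS_WIDTH and PUBLIC_PM_OPTS_MASK are hypothetical stand-ins, and this is not the patch's actual build-time check (Xen code would typically use BUILD_BUG_ON()).

```c
/* Hypothetical stand-ins: public xen-pm options assumed to fit in 8 bits. */
#define PUBLIC_PM_OPTS_WIDTH  8
#define PUBLIC_PM_OPTS_MASK   ((1U << PUBLIC_PM_OPTS_WIDTH) - 1) /* 0xff */

/* Internal-only flag: the first bit above the public range. */
#define XEN_PROCESSOR_PM_CPPC 0x100U

/* Compile-time checks: no overlap, and it is exactly the next bit. */
_Static_assert(!(XEN_PROCESSOR_PM_CPPC & PUBLIC_PM_OPTS_MASK),
               "internal CPPC flag must not alias a public xen-pm option");
_Static_assert(XEN_PROCESSOR_PM_CPPC == (1U << PUBLIC_PM_OPTS_WIDTH),
               "CPPC flag should be the next bit after the public options");
```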
XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX are first set when Xen parses the
corresponding driver option from the Xen command line, and become mutually
exclusive after cpufreq driver registration. This is because the platform
cannot support both, or mixed-mode (CPPC & legacy Px), operation, and only one
cpufreq driver can be registered in Xen at a time; on AMD, for example, it is
either amd-cppc or the legacy P-states driver.
Xen relies on the XEN_PROCESSOR_PM_CPPC flag to tell that the current cpufreq
driver is in CPPC mode, and then accepts the corresponding hypercall. Px
requests are ignored in that mode, while still returning success.
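The flag handling described above can be modelled as a small pure function. This is a sketch, not the patch code. After registration, the bits are pruned according to which option registered successfully. They are left alone on -EBUSY and cleared entirely on any other failure. The value of PM_PX and all helper and enum names are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define PM_PX   0x002U /* stand-in for XEN_PROCESSOR_PM_PX (value assumed) */
#define PM_CPPC 0x100U /* XEN_PROCESSOR_PM_CPPC, per the description above */

/* Hypothetical mirror of enum cpufreq_xen_opt. */
enum cpufreq_opt { OPT_none, OPT_xen, OPT_hwp, OPT_amd_cppc };

/* ret is the registration result; chosen is the option that was tried. */
static unsigned int prune_pmbits(unsigned int bits, int ret,
                                 enum cpufreq_opt chosen)
{
    if ( !ret )
    {
        switch ( chosen )
        {
        case OPT_amd_cppc:        /* CPPC driver won: drop Px mode */
            bits &= ~PM_PX;
            break;

        case OPT_hwp:
        case OPT_xen:             /* Px-style driver won: drop CPPC mode */
            bits &= ~PM_CPPC;
            break;

        default:                  /* e.g. "none": leave bits untouched */
            break;
        }
    }
    else if ( ret != -EBUSY )     /* no driver registered: drop both */
        bits &= ~(PM_PX | PM_CPPC);

    return bits;
}

/* CPPC data is only consumed in CPPC mode; otherwise it is ignored. */
static bool accepts_cppc_data(unsigned int bits)
{
    return bits & PM_CPPC;
}
```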
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Obey alphabetic sorting and also restrict it to CONFIG_AMD
- Remove unnecessary empty comment line
- Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
- Make new switch-case code apply to Hygon CPUs too
- Change ENOSYS with EOPNOTSUPP
- Blanks around binary operator
- Change all amd_/-pstate defined values to amd_/-cppc
---
v2 -> v3
- refactor too long lines
- Make XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC mutually exclusive flags
after cpufreq driver registration
---
v3 -> v4:
- introduce XEN_PROCESSOR_PM_CPPC in xen internal header
- complement "Hygon" in log message
- remove unnecessary if()
- grow cpufreq_xen_opts[] array
---
v4 -> v5:
- remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
- prefer ! over "== 0" in purely boolean contexts
- Blank line between non-fall-through case blocks
- add build-time checking between internal and public XEN_PROCESSOR_PM_*
values
- define XEN_PROCESSOR_PM_CPPC as 0x100, as it is the next value to use
after the public interface, whose mask SIF_PM_MASK is 8 bits wide.
- as Dom0 will send the CPPC/Px data whenever it can, the return value shall
be 0 instead of -ENOSYS/-EOPNOTSUPP when the platform doesn't require this data.
---
v5 -> v6:
- do not register the driver when all hooks are NULL
- refactor the subject and commit message
- move pruning of xen_processor_pmbits into generic space
- add comment and build-time check for XEN_PROCESSOR_PM_CPPC
---
docs/misc/xen-command-line.pandoc | 7 ++-
xen/arch/x86/acpi/cpufreq/Makefile | 1 +
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 59 +++++++++++++++++++
xen/arch/x86/acpi/cpufreq/cpufreq.c | 72 ++++++++++++++++++++++-
xen/arch/x86/platform_hypercall.c | 14 +++++
xen/drivers/acpi/pm-op.c | 3 +-
xen/drivers/acpi/pmstat.c | 3 +
xen/drivers/cpufreq/cpufreq.c | 11 ++++
xen/include/acpi/cpufreq/cpufreq.h | 6 +-
xen/include/acpi/cpufreq/processor_perf.h | 10 ++++
10 files changed, 180 insertions(+), 6 deletions(-)
create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 6865a61220..03761d9e3c 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
available support.
### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
> Default: `xen`
@@ -526,7 +526,7 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `<maxfreq>` and `<minfreq>` are integers which represent max and min processor frequencies
respectively.
* `verbose` option can be included as a string or also as `verbose=<integer>`
- for `xen`. It is a boolean for `hwp`.
+ for `xen`. It is a boolean for `hwp` and `amd-cppc`.
* `hwp` selects Hardware-Controlled Performance States (HWP) on supported Intel
hardware. HWP is a Skylake+ feature which provides better CPU power
management. The default is disabled. If `hwp` is selected, but hardware
@@ -534,6 +534,9 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC). HDC enables the
processor to autonomously force physical package components into idle state.
The default is enabled, but the option only applies when `hwp` is enabled.
+* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
+ on supported AMD hardware to provide a finer-grained frequency control
+ mechanism. The default is disabled.
There is also support for `;`-separated fallback options:
`cpufreq=hwp;xen,verbose`. This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index e7dbe434a8..a2ba34bda0 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_INTEL) += acpi.o
+obj-$(CONFIG_AMD) += amd-cppc.o
obj-y += cpufreq.o
obj-$(CONFIG_INTEL) += hwp.o
obj-$(CONFIG_AMD) += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
new file mode 100644
index 0000000000..3377783f7e
--- /dev/null
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * amd-cppc.c - AMD Processor CPPC Frequency Driver
+ *
+ * Copyright (C) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Author: Penny Zheng <penny.zheng@amd.com>
+ *
+ * AMD CPPC cpufreq driver introduces a new CPU performance scaling design
+ * for AMD processors using the ACPI Collaborative Performance and Power
+ * Control (CPPC) feature which provides finer grained frequency control range.
+ */
+
+#include <xen/domain.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <acpi/cpufreq/cpufreq.h>
+
+static bool __init amd_cppc_handle_option(const char *s, const char *end)
+{
+ int ret;
+
+ ret = parse_boolean("verbose", s, end);
+ if ( ret >= 0 )
+ {
+ cpufreq_verbose = ret;
+ return true;
+ }
+
+ return false;
+}
+
+int __init amd_cppc_cmdline_parse(const char *s, const char *e)
+{
+ do {
+ const char *end = strpbrk(s, ",;");
+
+ if ( !amd_cppc_handle_option(s, end) )
+ {
+ printk(XENLOG_WARNING
+ "cpufreq/amd-cppc: option '%.*s' not recognized\n",
+ (int)((end ?: e) - s), s);
+
+ return -EINVAL;
+ }
+
+ s = end ? end + 1 : NULL;
+ } while ( s && s < e );
+
+ return 0;
+}
+
+int __init amd_cppc_register_driver(void)
+{
+ if ( !cpu_has_cppc )
+ return -ENODEV;
+
+ return -EOPNOTSUPP;
+}
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 45f301f354..b33699ef13 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
if ( cpufreq_controller == FREQCTL_xen )
{
+ unsigned int i = 0;
+
switch ( boot_cpu_data.x86_vendor )
{
case X86_VENDOR_INTEL:
ret = -ENOENT;
- for ( unsigned int i = 0; i < cpufreq_xen_cnt; i++ )
+ for ( i = 0; i < cpufreq_xen_cnt; i++ )
{
switch ( cpufreq_xen_opts[i] )
{
@@ -148,6 +150,11 @@ static int __init cf_check cpufreq_driver_init(void)
case CPUFREQ_none:
ret = 0;
break;
+
+ default:
+ printk(XENLOG_WARNING
+ "Unsupported cpufreq driver for vendor Intel\n");
+ break;
}
if ( !ret || ret == -EBUSY )
@@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
case X86_VENDOR_AMD:
case X86_VENDOR_HYGON:
- ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
+ if ( !IS_ENABLED(CONFIG_AMD) )
+ {
+ ret = -ENODEV;
+ break;
+ }
+ ret = -ENOENT;
+
+ for ( i = 0; i < cpufreq_xen_cnt; i++ )
+ {
+ switch ( cpufreq_xen_opts[i] )
+ {
+ case CPUFREQ_xen:
+ ret = powernow_register_driver();
+ break;
+
+ case CPUFREQ_amd_cppc:
+ ret = amd_cppc_register_driver();
+ break;
+
+ case CPUFREQ_none:
+ ret = 0;
+ break;
+
+ default:
+ printk(XENLOG_WARNING
+ "Unsupported cpufreq driver for vendor AMD or Hygon\n");
+ break;
+ }
+
+ if ( !ret || ret == -EBUSY )
+ break;
+ }
+
break;
}
+
+ /*
+ * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
+ * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
+ */
+ if ( !ret )
+ {
+ ASSERT(i < cpufreq_xen_cnt);
+ switch ( cpufreq_xen_opts[i] )
+ {
+ case CPUFREQ_amd_cppc:
+ xen_processor_pmbits &= ~XEN_PROCESSOR_PM_PX;
+ break;
+
+ case CPUFREQ_hwp:
+ case CPUFREQ_xen:
+ xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
+ break;
+
+ default:
+ break;
+ }
+ } else if ( ret != -EBUSY )
+ /*
+ * No cpufreq driver gets registered, clear both
+ * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
+ */
+ xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
+ XEN_PROCESSOR_PM_PX);
}
return ret;
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 42b3b8b95a..cf64b8a622 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -546,6 +546,8 @@ ret_t do_platform_op(
ret = 0;
break;
}
+ /* Xen doesn't support mixed mode */
+ ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC));
ret = set_px_pminfo(op->u.set_pminfo.id, &op->u.set_pminfo.u.perf);
break;
@@ -578,6 +580,18 @@ ret_t do_platform_op(
}
case XEN_PM_CPPC:
+ if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC) )
+ {
+ /*
+ * Neglect CPPC-info when registered cpufreq driver
+ * isn't in CPPC mode
+ */
+ ret = 0;
+ break;
+ }
+ /* Xen doesn't support mixed mode */
+ ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_PX));
+
ret = set_cppc_pminfo(op->u.set_pminfo.id,
&op->u.set_pminfo.u.cppc_data);
break;
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 49b4067dec..d10f6db0e4 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -350,7 +350,8 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
switch ( op->cmd & PM_PARA_CATEGORY_MASK )
{
case CPUFREQ_PARA:
- if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
+ if ( !(xen_processor_pmbits & (XEN_PROCESSOR_PM_PX |
+ XEN_PROCESSOR_PM_CPPC)) )
return -ENODEV;
if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
return -EINVAL;
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index da7a1f81e1..e4e62966de 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -107,6 +107,9 @@ int cpufreq_statistic_init(unsigned int cpu)
if ( !pmpt )
return -EINVAL;
+ if ( !(pmpt->init & XEN_PX_INIT) )
+ return 0;
+
spin_lock(cpufreq_statistic_lock);
pxpt = per_cpu(cpufreq_statistic_data, cpu);
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 065fdf4106..cf1fcf1d22 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -98,6 +98,10 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
switch ( option )
{
+ case CPUFREQ_amd_cppc:
+ xen_processor_pmbits |= XEN_PROCESSOR_PM_CPPC;
+ break;
+
case CPUFREQ_hwp:
case CPUFREQ_xen:
xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
@@ -166,6 +170,13 @@ static int __init cf_check setup_cpufreq_option(const char *str)
if ( !ret && arg[0] && arg[1] )
ret = hwp_cmdline_parse(arg + 1, end);
}
+ else if ( IS_ENABLED(CONFIG_AMD) && choice < 0 &&
+ !cmdline_strcmp(str, "amd-cppc") )
+ {
+ ret = handle_cpufreq_cmdline(CPUFREQ_amd_cppc);
+ if ( !ret && arg[0] && arg[1] )
+ ret = amd_cppc_cmdline_parse(arg + 1, end);
+ }
else
ret = -EINVAL;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 7f26b5653a..32cf905fb8 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -26,8 +26,9 @@ enum cpufreq_xen_opt {
CPUFREQ_none,
CPUFREQ_xen,
CPUFREQ_hwp,
+ CPUFREQ_amd_cppc,
};
-#define NR_CPUFREQ_OPTS 2
+#define NR_CPUFREQ_OPTS 3
extern enum cpufreq_xen_opt cpufreq_xen_opts[NR_CPUFREQ_OPTS];
extern unsigned int cpufreq_xen_cnt;
struct cpufreq_governor;
@@ -273,4 +274,7 @@ int set_hwp_para(struct cpufreq_policy *policy,
int acpi_cpufreq_register(void);
+int amd_cppc_cmdline_parse(const char *s, const char *e);
+int amd_cppc_register_driver(void);
+
#endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index f80495fc96..6d8d29d440 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -5,6 +5,16 @@
#include <public/sysctl.h>
#include <xen/acpi.h>
+/*
+ * Internal xen-pm options
+ * They are extensions to the public xen-pm options (XEN_PROCESSOR_PM_xxx) defined
+ * in public/platform.h, guarded by SIF_PM_MASK
+ */
+#define XEN_PROCESSOR_PM_CPPC 0x100
+#if XEN_PROCESSOR_PM_CPPC & MASK_EXTR(~0, SIF_PM_MASK)
+# error "XEN_PROCESSOR_PM_CPPC shall not occupy bits reserved for public xen-pm options"
+#endif
+
#define XEN_CPPC_INIT 0x40000000U
#define XEN_PX_INIT 0x80000000U
--
2.34.1
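The processor_perf.h hunk only works if the public xen-pm options fit in the byte that SIF_PM_MASK reserves, so an internal flag at 0x100 lands outside that byte. Below is a minimal stand-alone sketch of the same compile-time check. It assumes that SIF_PM_MASK, MASK_EXTR() and XEN_PROCESSOR_PM_PX match Xen's public/xen.h, xen/lib.h and public/platform.h, and mirrors them locally rather than including Xen headers. public_pm_bits() is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/*
 * Assumed to mirror Xen's definitions (not included from Xen headers).
 * SIF_PM_MASK reserves one byte (bits 8-15) for public xen-pm options.
 */
#define SIF_PM_MASK          (0xFF << 8)
#define MASK_EXTR(v, m)      (((v) & (m)) / ((m) & -(m)))
#define XEN_PROCESSOR_PM_PX  2

/* Internal option from the patch: must stay outside the public byte. */
#define XEN_PROCESSOR_PM_CPPC 0x100

/* Same shape as the guard added in processor_perf.h. */
#if XEN_PROCESSOR_PM_CPPC & MASK_EXTR(~0, SIF_PM_MASK)
# error "XEN_PROCESSOR_PM_CPPC shall not occupy bits reserved for public xen-pm options"
#endif

/* Hypothetical helper: the unshifted bits usable by public options (0xFF). */
unsigned int public_pm_bits(void)
{
    return MASK_EXTR(~0, SIF_PM_MASK);
}
```

Under these assumptions, moving XEN_PROCESSOR_PM_CPPC to a value such as 0x80 would trip the #error at build time, because 0x80 overlaps the 0xFF range.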
* Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-07-11 3:50 ` [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
@ 2025-07-16 15:59 ` Jan Beulich
2025-08-04 8:09 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-16 15:59 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
>
> if ( cpufreq_controller == FREQCTL_xen )
> {
> + unsigned int i = 0;
Pointless initializer; both for() loops set i to 0. But also see further
down.
> @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
>
> case X86_VENDOR_AMD:
> case X86_VENDOR_HYGON:
> - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
> + if ( !IS_ENABLED(CONFIG_AMD) )
> + {
> + ret = -ENODEV;
> + break;
> + }
> + ret = -ENOENT;
The code structure is sufficiently different from the Intel counterpart for
this to perhaps better move ...
> + for ( i = 0; i < cpufreq_xen_cnt; i++ )
> + {
> + switch ( cpufreq_xen_opts[i] )
> + {
> + case CPUFREQ_xen:
> + ret = powernow_register_driver();
> + break;
> +
> + case CPUFREQ_amd_cppc:
> + ret = amd_cppc_register_driver();
> + break;
> +
> + case CPUFREQ_none:
> + ret = 0;
> + break;
> +
> + default:
> + printk(XENLOG_WARNING
> + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
> + break;
... here.
> + }
> +
> + if ( !ret || ret == -EBUSY )
> + break;
> + }
> +
> break;
> }
> +
> + /*
> + * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
> + * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
> + */
> + if ( !ret )
> + {
> + ASSERT(i < cpufreq_xen_cnt);
> + switch ( cpufreq_xen_opts[i] )
Hmm, this is using the initializer of i that I commented on. I think there's
another default: case missing, where you simply "return 0" (to retain prior
behavior). But again see also yet further down.
> + {
> + case CPUFREQ_amd_cppc:
> + xen_processor_pmbits &= ~XEN_PROCESSOR_PM_PX;
> + break;
> +
> + case CPUFREQ_hwp:
> + case CPUFREQ_xen:
> + xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
> + break;
> +
> + default:
> + break;
> + }
> + } else if ( ret != -EBUSY )
Nit (style): Closing brace wants to be on its own line.
> + /*
> + * No cpufreq driver gets registered, clear both
> + * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
> + */
> + xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
> + XEN_PROCESSOR_PM_PX);
Yet more hmm - this path you want to get through for the case mentioned above.
But only this code; specifically not the "switch ( cpufreq_xen_opts[i] )",
which really is "switch ( cpufreq_xen_opts[0] )" in that case, and that's
pretty clearly wrong to evaluate then.
> --- a/xen/drivers/acpi/pmstat.c
> +++ b/xen/drivers/acpi/pmstat.c
> @@ -107,6 +107,9 @@ int cpufreq_statistic_init(unsigned int cpu)
> if ( !pmpt )
> return -EINVAL;
>
> + if ( !(pmpt->init & XEN_PX_INIT) )
> + return 0;
> +
> spin_lock(cpufreq_statistic_lock);
>
> pxpt = per_cpu(cpufreq_statistic_data, cpu);
This change could do with a code comment, I think.
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -98,6 +98,10 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
> cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
> switch ( option )
> {
> + case CPUFREQ_amd_cppc:
> + xen_processor_pmbits |= XEN_PROCESSOR_PM_CPPC;
> + break;
> +
> case CPUFREQ_hwp:
> case CPUFREQ_xen:
> xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
Unless they're clearly "more important" (tm), please can insertions like
this not be done at the top of a switch() (or whatever else it is)? You
don't do so ...
> @@ -166,6 +170,13 @@ static int __init cf_check setup_cpufreq_option(const char *str)
> if ( !ret && arg[0] && arg[1] )
> ret = hwp_cmdline_parse(arg + 1, end);
> }
> + else if ( IS_ENABLED(CONFIG_AMD) && choice < 0 &&
> + !cmdline_strcmp(str, "amd-cppc") )
> + {
> + ret = handle_cpufreq_cmdline(CPUFREQ_amd_cppc);
> + if ( !ret && arg[0] && arg[1] )
> + ret = amd_cppc_cmdline_parse(arg + 1, end);
> + }
> else
> ret = -EINVAL;
... here, for example.
Jan
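The review points at the control flow of the new AMD/Hygon branch. The flag adjustment should only look at cpufreq_xen_opts[i] when a driver actually registered, and both flags should be dropped when none did. Below is a self-contained model of that flow, not the patch itself. The enum is trimmed, PM_PX/PM_CPPC stand in for XEN_PROCESSOR_PM_PX/XEN_PROCESSOR_PM_CPPC, and try_register(), select_driver() and the fake result variables are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Trimmed copy of the option enum from the patch. */
enum cpufreq_xen_opt { CPUFREQ_none, CPUFREQ_xen, CPUFREQ_hwp, CPUFREQ_amd_cppc };

/* Stand-ins for XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC. */
#define PM_PX   0x2u
#define PM_CPPC 0x100u

/* Hypothetical: results the fake registration calls will return. */
static int px_result, cppc_result;

static int try_register(enum cpufreq_xen_opt opt)
{
    switch ( opt )
    {
    case CPUFREQ_xen:
        return px_result;      /* models powernow_register_driver() */
    case CPUFREQ_amd_cppc:
        return cppc_result;    /* models amd_cppc_register_driver() */
    default:
        return -ENOENT;        /* option not handled on this vendor */
    }
}

/*
 * Try each preferred driver in order, stopping on success or -EBUSY.
 * On success keep only the flag of the driver that registered; on any
 * other failure clear both flags; -EBUSY leaves them untouched.
 * opts[i] is only read when ret == 0, which implies the loop stopped
 * early, so i < cnt and no initializer for i is needed.
 */
int select_driver(const enum cpufreq_xen_opt *opts, unsigned int cnt,
                  unsigned int *pmbits)
{
    int ret = -ENOENT;
    unsigned int i;

    for ( i = 0; i < cnt; i++ )
    {
        ret = try_register(opts[i]);
        if ( !ret || ret == -EBUSY )
            break;
    }

    if ( !ret )
    {
        if ( opts[i] == CPUFREQ_amd_cppc )
            *pmbits &= ~PM_PX;
        else
            *pmbits &= ~PM_CPPC;
    }
    else if ( ret != -EBUSY )
        *pmbits &= ~(PM_PX | PM_CPPC);

    return ret;
}
```

With preferences { CPUFREQ_amd_cppc, CPUFREQ_xen } and CPPC registration failing, the model falls back to the Px driver and keeps only the PX bit.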
* RE: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-07-16 15:59 ` Jan Beulich
@ 2025-08-04 8:09 ` Penny, Zheng
2025-08-04 8:48 ` Jan Beulich
2025-08-04 8:48 ` Jan Beulich
0 siblings, 2 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-04 8:09 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, July 17, 2025 12:00 AM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
> and amd-cppc driver
>
> On 11.07.2025 05:50, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
> >
> > if ( cpufreq_controller == FREQCTL_xen )
> > {
> > + unsigned int i = 0;
>
> Pointless initializer; both for() loops set i to 0. But also see further down.
>
> > @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
> >
> > case X86_VENDOR_AMD:
> > case X86_VENDOR_HYGON:
> > - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
> > + if ( !IS_ENABLED(CONFIG_AMD) )
> > + {
> > + ret = -ENODEV;
> > + break;
> > + }
> > + ret = -ENOENT;
>
> The code structure is sufficiently different from the Intel counterpart for this to
> perhaps better move ...
>
> > + for ( i = 0; i < cpufreq_xen_cnt; i++ )
> > + {
> > + switch ( cpufreq_xen_opts[i] )
> > + {
> > + case CPUFREQ_xen:
> > + ret = powernow_register_driver();
> > + break;
> > +
> > + case CPUFREQ_amd_cppc:
> > + ret = amd_cppc_register_driver();
> > + break;
> > +
> > + case CPUFREQ_none:
> > + ret = 0;
> > + break;
> > +
> > + default:
> > + printk(XENLOG_WARNING
> > + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
> > + break;
>
> ... here.
>
Are you suggesting moving
"
if ( !IS_ENABLED(CONFIG_AMD) )
{
ret = -ENODEV;
break;
}
" here? In that case, when CONFIG_AMD=n and the user doesn't provide "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen, so powernow_register_driver() then gets invoked. The problem is that there is no stub for it, as it is only compiled under CONFIG_AMD.
I suggest wrapping the code in #ifdef CONFIG_AMD instead.
> > + }
> > +
> > + if ( !ret || ret == -EBUSY )
> > + break;
> > + }
> > +
> > break;
> > }
> > +
> > + /*
> > + * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
> > + * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
> > + */
> > + if ( !ret )
> > + {
> > + ASSERT(i < cpufreq_xen_cnt);
> > + switch ( cpufreq_xen_opts[i] )
>
> Hmm, this is using the initializer of i that I commented on. I think there's
> another default: case missing, where you simply "return 0" (to retain prior behavior).
> But again see also yet further down.
>
>
> > + /*
> > + * No cpufreq driver gets registered, clear both
> > + * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
> > + */
> > + xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
> > + XEN_PROCESSOR_PM_PX);
>
> Yet more hmm - this path you want to get through for the case mentioned above.
> But only this code; specifically not the "switch ( cpufreq_xen_opts[i] )", which really
> is "switch ( cpufreq_xen_opts[0] )" in that case, and that's pretty clearly wrong to
> evaluate then.
>
Correct me if I have misunderstood you:
By the "case missing" above, do you mean entering "case CPUFREQ_none"?
IMO, it can never be entered. If the user doesn't provide "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen; that is, the Px-state driver is the default. Even if Px-driver initialization fails, cpufreq_xen_cnt is limited to 1, so we never reach CPUFREQ_none.
CPUFREQ_none can only be set when the user explicitly specifies "cpufreq=disabled/none/0", but in that case cpufreq_controller is set to FREQCTL_none, and the whole of cpufreq_driver_init() is guarded by the "cpufreq_controller == FREQCTL_xen" condition.
Or does "case missing" refer to entering the default case? In that case ret is -ENOENT, since we set ret = -ENOENT at the very beginning.
> Jan
* Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-04 8:09 ` Penny, Zheng
@ 2025-08-04 8:48 ` Jan Beulich
2025-08-05 6:31 ` Penny, Zheng
2025-08-04 8:48 ` Jan Beulich
1 sibling, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-08-04 8:48 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 04.08.2025 10:09, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, July 17, 2025 12:00 AM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>> xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
>> and amd-cppc driver
>>
>> On 11.07.2025 05:50, Penny Zheng wrote:
>>> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>> @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
>>>
>>> if ( cpufreq_controller == FREQCTL_xen )
>>> {
>>> + unsigned int i = 0;
>>
>> Pointless initializer; both for() loops set i to 0. But also see further down.
>>
>>> @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
>>>
>>> case X86_VENDOR_AMD:
>>> case X86_VENDOR_HYGON:
>>> - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
>>> + if ( !IS_ENABLED(CONFIG_AMD) )
>>> + {
>>> + ret = -ENODEV;
>>> + break;
>>> + }
>>> + ret = -ENOENT;
>>
>> The code structure is sufficiently different from the Intel counterpart for this to
>> perhaps better move ...
>>
>>> + for ( i = 0; i < cpufreq_xen_cnt; i++ )
>>> + {
>>> + switch ( cpufreq_xen_opts[i] )
>>> + {
>>> + case CPUFREQ_xen:
>>> + ret = powernow_register_driver();
>>> + break;
>>> +
>>> + case CPUFREQ_amd_cppc:
>>> + ret = amd_cppc_register_driver();
>>> + break;
>>> +
>>> + case CPUFREQ_none:
>>> + ret = 0;
>>> + break;
>>> +
>>> + default:
>>> + printk(XENLOG_WARNING
>>> + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
>>> + break;
>>
>> ... here.
>>
>
> Are you suggesting moving
> "
> if ( !IS_ENABLED(CONFIG_AMD) )
> {
> ret = -ENODEV;
> break;
> }
> " here? In that case, when CONFIG_AMD=n and the user doesn't provide "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen, so powernow_register_driver() then gets invoked. The problem is that there is no stub for it, as it is only compiled under CONFIG_AMD.
> I suggest wrapping the code in #ifdef CONFIG_AMD instead.
>
>>> + }
>>> +
>>> + if ( !ret || ret == -EBUSY )
>>> + break;
>>> + }
>>> +
>>> break;
>>> }
>>> +
>>> + /*
>>> + * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
>>> + * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
>>> + */
>>> + if ( !ret )
>>> + {
>>> + ASSERT(i < cpufreq_xen_cnt);
>>> + switch ( cpufreq_xen_opts[i] )
>>
>> Hmm, this is using the initializer of i that I commented on. I think there's
>> another default: case missing, where you simply "return 0" (to retain prior behavior).
>> But again see also yet further down.
>>
>>
>>> + /*
>>> + * No cpufreq driver gets registered, clear both
>>> + * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
>>> + */
>>> + xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
>>> + XEN_PROCESSOR_PM_PX);
>>
>> Yet more hmm - this path you want to get through for the case mentioned above.
>> But only this code; specifically not the "switch ( cpufreq_xen_opts[i] )", which really
>> is "switch ( cpufreq_xen_opts[0] )" in that case, and that's pretty clearly wrong to
>> evaluate then.
>
> Correct me if I have misunderstood you:
> By the "case missing" above, do you mean entering "case CPUFREQ_none"?
> IMO, it can never be entered. If the user doesn't provide "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen; that is, the Px-state driver is the default. Even if Px-driver initialization fails, cpufreq_xen_cnt is limited to 1, so we never reach CPUFREQ_none.
> CPUFREQ_none can only be set when the user explicitly specifies "cpufreq=disabled/none/0", but in that case cpufreq_controller is set to FREQCTL_none, and the whole of cpufreq_driver_init() is guarded by the "cpufreq_controller == FREQCTL_xen" condition.
> Or does "case missing" refer to entering the default case? In that case ret is -ENOENT, since we set ret = -ENOENT at the very beginning.
Sorry, this is hard to follow. Plus I think I made the main requirement quite
clear: You want to "retain prior behavior" for all cases you don't deliberately
change to accommodate the new driver. Plus you want to watch out for pre-
existing incorrect behavior: Rather than proliferating any, such would want
adjusting.
Jan
* RE: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-04 8:48 ` Jan Beulich
@ 2025-08-05 6:31 ` Penny, Zheng
2025-08-05 7:42 ` Jan Beulich
0 siblings, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-05 6:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 4, 2025 4:48 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
> and amd-cppc driver
>
> On 04.08.2025 10:09, Penny, Zheng wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Thursday, July 17, 2025 12:00 AM
> >> To: Penny, Zheng <penny.zheng@amd.com>
> >> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Anthony PERARD
> >> <anthony.perard@vates.tech>; Orzel, Michal <Michal.Orzel@amd.com>;
> >> Julien Grall <julien@xen.org>; Roger Pau Monné
> >> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> >> xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc"
> >> xen cmdline and amd-cppc driver
> >>
> >> On 11.07.2025 05:50, Penny Zheng wrote:
> >>> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> >>> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> >>> @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
> >>>
> >>> if ( cpufreq_controller == FREQCTL_xen )
> >>> {
> >>> + unsigned int i = 0;
> >>
> >> Pointless initializer; both for() loops set i to 0. But also see further down.
> >>
> >>> @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
> >>>
> >>> case X86_VENDOR_AMD:
> >>> case X86_VENDOR_HYGON:
> >>> - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
> >>> + if ( !IS_ENABLED(CONFIG_AMD) )
> >>> + {
> >>> + ret = -ENODEV;
> >>> + break;
> >>> + }
> >>> + ret = -ENOENT;
> >>
> >> The code structure is sufficiently different from the Intel
> >> counterpart for this to perhaps better move ...
> >>
> >>> + for ( i = 0; i < cpufreq_xen_cnt; i++ )
> >>> + {
> >>> + switch ( cpufreq_xen_opts[i] )
> >>> + {
> >>> + case CPUFREQ_xen:
> >>> + ret = powernow_register_driver();
> >>> + break;
> >>> +
> >>> + case CPUFREQ_amd_cppc:
> >>> + ret = amd_cppc_register_driver();
> >>> + break;
> >>> +
> >>> + case CPUFREQ_none:
> >>> + ret = 0;
> >>> + break;
> >>> +
> >>> + default:
> >>> + printk(XENLOG_WARNING
> >>> + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
> >>> + break;
> >>
> >> ... here.
> >>
> >
> > Are you suggesting moving
> > "
> > if ( !IS_ENABLED(CONFIG_AMD) )
> > {
> > ret = -ENODEV;
> > break;
> > }
> > " here? In that case, when CONFIG_AMD=n and the user doesn't provide
> > "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and
> > cpufreq_xen_opts[0] = CPUFREQ_xen, so powernow_register_driver() then
> > gets invoked. The problem is that there is no stub for it, as it is
> > only compiled under CONFIG_AMD.
> > I suggest wrapping the code in #ifdef CONFIG_AMD instead.
> >
> >>> + }
> >>> +
> >>> + if ( !ret || ret == -EBUSY )
> >>> + break;
> >>> + }
> >>> +
> >>> break;
> >>> }
> >>> +
> >>> + /*
> >>> + * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
> >>> + * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
> >>> + */
> >>> + if ( !ret )
> >>> + {
> >>> + ASSERT(i < cpufreq_xen_cnt);
> >>> + switch ( cpufreq_xen_opts[i] )
> >>
> >> Hmm, this is using the initializer of i that I commented on. I
> >> think there's another default: case missing, where you simply "return 0" (to
> >> retain prior behavior).
> >> But again see also yet further down.
> >>
> >>
> >>> + /*
> >>> + * No cpufreq driver gets registered, clear both
> >>> + * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
> >>> + */
> >>> + xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
> >>> + XEN_PROCESSOR_PM_PX);
> >>
> >> Yet more hmm - this path you want to get through for the case mentioned above.
> >> But only this code; specifically not the "switch ( cpufreq_xen_opts[i] )",
> >> which really is "switch ( cpufreq_xen_opts[0] )" in that case, and that's
> >> pretty clearly wrong to evaluate then.
> >
> > Correct me if I have misunderstood you:
> > By the "case missing" above, do you mean entering "case CPUFREQ_none"?
> > IMO, it can never be entered. If the user doesn't provide "cpufreq=xxx",
> > cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen;
> > that is, the Px-state driver is the default. Even if Px-driver
> > initialization fails, cpufreq_xen_cnt is limited to 1, so we never reach
> > CPUFREQ_none.
> > CPUFREQ_none can only be set when the user explicitly specifies
> > "cpufreq=disabled/none/0", but in that case cpufreq_controller is set to
> > FREQCTL_none, and the whole of cpufreq_driver_init() is guarded by the
> > "cpufreq_controller == FREQCTL_xen" condition.
> > Or does "case missing" refer to entering the default case? In that case
> > ret is -ENOENT, since we set ret = -ENOENT at the very beginning.
>
> Sorry, this is hard to follow. Plus I think I made the main requirement quite
> clear: You want to "retain prior behavior" for all cases you don't deliberately change
> to accommodate the new driver. Plus you want to watch out for pre-existing
> incorrect behavior: Rather than proliferating any, such would want adjusting.
>
I was trying to follow "there's another default: case missing, where you simply "return 0" (to retain prior behavior)".
Does the missing "default:" refer to the one for "switch ( boot_cpu_data.x86_vendor )"? (I thought it referred to "switch ( cpufreq_xen_opts[i] )".)
That is pre-existing incorrect behavior, which I shall fix first in a separate commit.
I'll add an -ENOENT initializer for ret at the very beginning, and add the missing default: entry with an "Unsupported vendor..." error log.
> Jan
* Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-05 6:31 ` Penny, Zheng
@ 2025-08-05 7:42 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-08-05 7:42 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 05.08.2025 08:31, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, August 4, 2025 4:48 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>> xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
>> and amd-cppc driver
>>
>> On 04.08.2025 10:09, Penny, Zheng wrote:
>>> [Public]
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Thursday, July 17, 2025 12:00 AM
>>>> To: Penny, Zheng <penny.zheng@amd.com>
>>>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>>>> <andrew.cooper3@citrix.com>; Anthony PERARD
>>>> <anthony.perard@vates.tech>; Orzel, Michal <Michal.Orzel@amd.com>;
>>>> Julien Grall <julien@xen.org>; Roger Pau Monné
>>>> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>>>> xen-devel@lists.xenproject.org
>>>> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc"
>>>> xen cmdline and amd-cppc driver
>>>>
>>>> On 11.07.2025 05:50, Penny Zheng wrote:
>>>>> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>>>> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>>>> @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
>>>>>
>>>>> if ( cpufreq_controller == FREQCTL_xen )
>>>>> {
>>>>> + unsigned int i = 0;
>>>>
>>>> Pointless initializer; both for() loops set i to 0. But also see further down.
>>>>
>>>>> @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
>>>>>
>>>>> case X86_VENDOR_AMD:
>>>>> case X86_VENDOR_HYGON:
>>>>> - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
>>>>> + if ( !IS_ENABLED(CONFIG_AMD) )
>>>>> + {
>>>>> + ret = -ENODEV;
>>>>> + break;
>>>>> + }
>>>>> + ret = -ENOENT;
>>>>
>>>> The code structure is sufficiently different from the Intel
>>>> counterpart for this to perhaps better move ...
>>>>
>>>>> + for ( i = 0; i < cpufreq_xen_cnt; i++ )
>>>>> + {
>>>>> + switch ( cpufreq_xen_opts[i] )
>>>>> + {
>>>>> + case CPUFREQ_xen:
>>>>> + ret = powernow_register_driver();
>>>>> + break;
>>>>> +
>>>>> + case CPUFREQ_amd_cppc:
>>>>> + ret = amd_cppc_register_driver();
>>>>> + break;
>>>>> +
>>>>> + case CPUFREQ_none:
>>>>> + ret = 0;
>>>>> + break;
>>>>> +
>>>>> + default:
>>>>> + printk(XENLOG_WARNING
>>>>> + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
>>>>> + break;
>>>>
>>>> ... here.
>>>>
>>>
>>> Are you suggesting moving
>>> "
>>> if ( !IS_ENABLED(CONFIG_AMD) )
>>> {
>>> ret = -ENODEV;
>>> break;
>>> }
>>> " here? In that case, when CONFIG_AMD=n and the user doesn't provide
>>> "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and
>>> cpufreq_xen_opts[0] = CPUFREQ_xen, so powernow_register_driver() then
>>> gets invoked. The problem is that there is no stub for it, as it is
>>> only compiled under CONFIG_AMD.
>>> I suggest wrapping the code in #ifdef CONFIG_AMD instead.
>>>
>>>>> + }
>>>>> +
>>>>> + if ( !ret || ret == -EBUSY )
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> break;
>>>>> }
>>>>> +
>>>>> + /*
>>>>> + * After successful cpufreq driver registration, XEN_PROCESSOR_PM_CPPC
>>>>> + * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
>>>>> + */
>>>>> + if ( !ret )
>>>>> + {
>>>>> + ASSERT(i < cpufreq_xen_cnt);
>>>>> + switch ( cpufreq_xen_opts[i] )
>>>>
>>>> Hmm, this is using the initializer of i that I commented on. I
>>>> think there's another default: case missing, where you simply "return 0" (to
>>>> retain prior behavior).
>>>> But again see also yet further down.
>>>>
>>>>
>>>>> + /*
>>>>> + * No cpufreq driver gets registered, clear both
>>>>> + * XEN_PROCESSOR_PM_CPPC and XEN_PROCESSOR_PM_PX
>>>>> + */
>>>>> + xen_processor_pmbits &= ~(XEN_PROCESSOR_PM_CPPC |
>>>>> + XEN_PROCESSOR_PM_PX);
>>>>
>>>> Yet more hmm - this path you want to get through for the case mentioned above.
>>>> But only this code; specifically not the "switch ( cpufreq_xen_opts[i] )",
>>>> which really is "switch ( cpufreq_xen_opts[0] )" in that case, and that's
>>>> pretty clearly wrong to evaluate then.
>>>
>>> Correct me if I have misunderstood you:
>>> By the "case missing" above, do you mean entering "case CPUFREQ_none"?
>>> IMO, it can never be entered. If the user doesn't provide "cpufreq=xxx",
>>> cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen;
>>> that is, the Px-state driver is the default. Even if Px-driver
>>> initialization fails, cpufreq_xen_cnt is limited to 1, so we never reach
>>> CPUFREQ_none.
>>> CPUFREQ_none can only be set when the user explicitly specifies
>>> "cpufreq=disabled/none/0", but in that case cpufreq_controller is set to
>>> FREQCTL_none, and the whole of cpufreq_driver_init() is guarded by the
>>> "cpufreq_controller == FREQCTL_xen" condition.
>>> Or does "case missing" refer to entering the default case? In that case
>>> ret is -ENOENT, since we set ret = -ENOENT at the very beginning.
>>
>> Sorry, this is hard to follow. Plus I think I made the main requirement quite
>> clear: You want to "retain prior behavior" for all cases you don't deliberately change
>> to accommodate the new driver. Plus you want to watch out for pre-existing
>> incorrect behavior: Rather than proliferating any, such would want adjusting.
>>
>
> I was trying to follow "there's another default: case missing, where you simply "return 0" (to retain prior behavior)".
> Does the missing "default:" refer to the one for "switch ( boot_cpu_data.x86_vendor )"? (I thought it referred to "switch ( cpufreq_xen_opts[i] )".)
> That is pre-existing incorrect behavior, which I shall fix first in a separate commit.
> I'll add an -ENOENT initializer for ret at the very beginning, and add the missing default: entry with an "Unsupported vendor..." error log.
Yes, I was referring to pre-existing code which I think wants adjusting in
order to then accommodate your changes there.
Jan
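The adjustment agreed above (initialize ret to -ENOENT and give the vendor switch a default: case that logs) can be sketched as a stand-alone model. The trimmed vendor enum, the *_register_model() stubs, and the use of printf() in place of printk() are assumptions for illustration only, not the eventual Xen code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, trimmed vendor list. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_OTHER };

/* Hypothetical stand-ins for the per-vendor registration paths. */
static int intel_register_model(void) { return 0; }
static int amd_register_model(void)   { return 0; }

int driver_init_model(enum vendor v)
{
    /* Stays -ENOENT unless a vendor case picks a driver. */
    int ret = -ENOENT;

    switch ( v )
    {
    case VENDOR_INTEL:
        ret = intel_register_model();
        break;

    case VENDOR_AMD:
    case VENDOR_HYGON:
        ret = amd_register_model();
        break;

    default:
        /* Per the discussion, the old code effectively returned 0 here. */
        printf("Unsupported CPU vendor for cpufreq\n");
        break;
    }

    return ret;
}
```

With this shape, an unknown vendor reports -ENOENT instead of silently succeeding. The new per-option loop can then sit inside the AMD/Hygon case without needing its own fallback value.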
* Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver
2025-08-04 8:09 ` Penny, Zheng
2025-08-04 8:48 ` Jan Beulich
@ 2025-08-04 8:48 ` Jan Beulich
1 sibling, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-08-04 8:48 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 04.08.2025 10:09, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, July 17, 2025 12:00 AM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>> xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
>> and amd-cppc driver
>>
>> On 11.07.2025 05:50, Penny Zheng wrote:
>>> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
>>> @@ -128,12 +128,14 @@ static int __init cf_check cpufreq_driver_init(void)
>>>
>>> if ( cpufreq_controller == FREQCTL_xen )
>>> {
>>> + unsigned int i = 0;
>>
>> Pointless initializer; both for() loops set i to 0. But also see further down.
>>
>>> @@ -157,9 +164,70 @@ static int __init cf_check cpufreq_driver_init(void)
>>>
>>> case X86_VENDOR_AMD:
>>> case X86_VENDOR_HYGON:
>>> - ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
>>> + if ( !IS_ENABLED(CONFIG_AMD) )
>>> + {
>>> + ret = -ENODEV;
>>> + break;
>>> + }
>>> + ret = -ENOENT;
>>
>> The code structure is sufficiently different from the Intel counterpart for this to
>> perhaps better move ...
>>
>>> + for ( i = 0; i < cpufreq_xen_cnt; i++ )
>>> + {
>>> + switch ( cpufreq_xen_opts[i] )
>>> + {
>>> + case CPUFREQ_xen:
>>> + ret = powernow_register_driver();
>>> + break;
>>> +
>>> + case CPUFREQ_amd_cppc:
>>> + ret = amd_cppc_register_driver();
>>> + break;
>>> +
>>> + case CPUFREQ_none:
>>> + ret = 0;
>>> + break;
>>> +
>>> + default:
>>> + printk(XENLOG_WARNING
>>> + "Unsupported cpufreq driver for vendor AMD or Hygon\n");
>>> + break;
>>
>> ... here.
>>
>
> Are we suggesting moving
> "
> if ( !IS_ENABLED(CONFIG_AMD) )
> {
> ret = -ENODEV;
> break;
> }
> " here?
That's what I said, didn't I?
> In which case, when CONFIG_AMD=n and users don't provide "cpufreq=xxx", cpufreq_xen_cnt is initialized to 1 and cpufreq_xen_opts[0] = CPUFREQ_xen, so powernow_register_driver() gets invoked. The problem is that there is no stub for it, as it is only compiled under CONFIG_AMD.
> I suggest using #ifdef CONFIG_AMD wrapping instead.
Perhaps necessary, yes. As you know, we generally prefer IS_ENABLED() where possible,
but when not possible, #ifdef is certainly okay to use.
Jan
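A minimal sketch of why `#ifdef` is needed here, under stated assumptions: `CONFIG_AMD` is defined by hand, the driver bodies are placeholders, and `amd_register_first()` is a hypothetical helper whose "stop at the first success" loop is assumed rather than taken from the patch. With `IS_ENABLED()`, both branches must still compile, so `powernow_register_driver()` needs a visible declaration and the build relies on dead-code elimination to drop the call. Once the check moves away from the call site, nothing removes the reference in a `CONFIG_AMD=n` build, so the cases themselves have to be preprocessed out.

```c
#include <errno.h>

/* Assumption: pretend this is an AMD-enabled build. */
#define CONFIG_AMD 1

/* Hypothetical subset of the option values discussed in the thread. */
enum cpufreq_xen_opt { CPUFREQ_none, CPUFREQ_xen, CPUFREQ_amd_cppc };

#ifdef CONFIG_AMD
/* Placeholder bodies: the real drivers only exist in CONFIG_AMD builds. */
static int powernow_register_driver(void) { return 0; }
static int amd_cppc_register_driver(void) { return -ENODEV; }
#endif

/* Hypothetical helper: try each option in order until one succeeds. */
static int amd_register_first(const enum cpufreq_xen_opt *opts,
                              unsigned int cnt)
{
    int ret = -ENOENT;
    unsigned int i;

    for ( i = 0; i < cnt; i++ )
    {
        switch ( opts[i] )
        {
#ifdef CONFIG_AMD
        case CPUFREQ_xen:
            ret = powernow_register_driver();
            break;

        case CPUFREQ_amd_cppc:
            ret = amd_cppc_register_driver();
            break;
#endif

        case CPUFREQ_none:
            ret = 0;
            break;

        default:
            /* Without CONFIG_AMD, every driver option lands here. */
            ret = -ENODEV;
            break;
        }

        if ( !ret )
            break;
    }

    return ret;
}
```

With `CONFIG_AMD` undefined, the sketch still compiles and links, because no reference to either driver function survives preprocessing.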
* [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (10 preceding siblings ...)
2025-07-11 3:50 ` [PATCH v6 11/19] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline and amd-cppc driver Penny Zheng
@ 2025-07-11 3:50 ` Penny Zheng
2025-07-17 12:55 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
` (6 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:50 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini
amd-cppc is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism. The new mechanism is based on
Collaborative Processor Performance Control (CPPC), which provides
finer-grained frequency management than legacy ACPI hardware P-states.
Current AMD CPU platforms use the ACPI P-states driver to manage CPU
frequency and clocks, switching among only 3 P-states, while the new
amd-cppc driver provides a more flexible, low-latency interface for Xen
to communicate performance hints directly to hardware.
The "amd-cppc" driver implements CPPC in passive mode, which still
leverages Xen governors such as *ondemand*, *performance*, etc., to
calculate the performance hints. In the future, we will introduce an advanced
active mode to enable autonomous performance level selection.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- re-construct union caps and req to have anonymous struct instead
- avoid "else" when the earlier if() ends in an unconditional control flow statement
- Add check to avoid chopping off set bits from cast
- make pointers pointer-to-const wherever possible
- remove noisy log
- exclude families before 0x17 before CPPC-feature MSR op
- remove useless variable helpers
- use xvzalloc and XVFREE
- refactor error handling as ENABLE bit can only be cleared by reset
---
v2 -> v3:
- Move all MSR definitions to msr-index.h and follow the required style
- Refactor opening figure braces for struct/union
- Sort overlong lines throughout the series
- Make offset/res int covering underflow scenario
- Error out when amd_max_freq_mhz isn't set
- Introduce amd_get_freq(name) macro to decrease redundancy
- Supported CPU family checked ahead of smp-function
- Nominal freq shall be checked between the [min, max]
- Use APERF/MPERF to calculate current frequency
- Use amd_cppc_cpufreq_cpu_exit() to tidy error path
---
v3 -> v4:
- verbose print shall come with a CPU number
- deal with res <= 0 in amd_cppc_khz_to_perf()
- introduce a single helper amd_get_lowest_or_nominal_freq() to cover both
lowest and nominal scenario
- reduce abuse of wrmsr_safe()/rdmsr_safe() with wrmsrl()/rdmsrl()
- move cf_check from amd_cppc_write_request() to amd_cppc_write_request_msrs()
- add comment to explain why setting non_linear_lowest in passive mode
- add check to ensure perf values in
lowest <= non_linear_lowest <= nominal <= highest
- refactor comment for "data->err != 0" scenario
- use "data->err" instead of -ENODEV
- add U suffixes for all msr macro
---
v4 -> v5:
- all freq-values shall be unsigned int type
- remove shortcuts as it is rarely taken
- checking cpc.nominal_mhz and cpc.lowest_mhz are non-zero values is enough
- drop the explicit type cast
- null pointer check is in no need for internal functions
- change amd_get_lowest_or_nominal_freq() to amd_get_cpc_freq()
- clarifying function-wide that the calculated frequency result is to be in kHz
- use array notation
- with cpu_has_cppc check, no need to do cpu family check
---
v5 -> v6
- replace "AMD_CPPC" with "AMD-CPPC" in message
- add equation(mul,div) non-zero check
- replace -EINVAL with -EOPNOTSUPP
- refactor comment
---
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 407 ++++++++++++++++++++++++++-
xen/arch/x86/cpu/amd.c | 8 +-
xen/arch/x86/include/asm/amd.h | 2 +
xen/arch/x86/include/asm/msr-index.h | 5 +
xen/include/public/sysctl.h | 1 +
5 files changed, 418 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 3377783f7e..57fd98d2d9 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -14,7 +14,95 @@
#include <xen/domain.h>
#include <xen/init.h>
#include <xen/param.h>
+#include <xen/percpu.h>
+#include <xen/xvmalloc.h>
#include <acpi/cpufreq/cpufreq.h>
+#include <asm/amd.h>
+#include <asm/msr-index.h>
+
+#define amd_cppc_err(cpu, fmt, args...) \
+ printk(XENLOG_ERR "AMD-CPPC: CPU%u error: " fmt, cpu, ## args)
+#define amd_cppc_warn(cpu, fmt, args...) \
+ printk(XENLOG_WARNING "AMD-CPPC: CPU%u warning: " fmt, cpu, ## args)
+#define amd_cppc_verbose(cpu, fmt, args...) \
+({ \
+ if ( cpufreq_verbose ) \
+ printk(XENLOG_DEBUG "AMD-CPPC: CPU%u " fmt, cpu, ## args); \
+})
+
+/*
+ * Field highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
+ * contain the values read from CPPC capability MSR. They represent the limits
+ * of managed performance range as well as the dynamic capability, which may
+ * change during processor operation.
+ * Field highest_perf represents highest performance, which is the absolute
+ * maximum performance an individual processor may reach, assuming ideal
+ * conditions. This performance level may not be sustainable for long
+ * durations and may only be achievable if other platform components
+ * are in a specific state; for example, it may require other processors be
+ * in an idle state. This would be equivalent to the highest frequencies
+ * supported by the processor.
+ * Field nominal_perf represents maximum sustained performance level of the
+ * processor, assuming ideal operating conditions. All cores/processors are
+ * expected to be able to sustain their nominal performance state\
+ * simultaneously.
+ * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
+ * is the lowest performance level at which nonlinear power savings are
+ * achieved. Above this threshold, lower performance levels should be
+ * generally more energy efficient than higher performance levels. So in
+ * traditional terms, this represents the P-state range of performance levels.
+ * Field lowest_perf represents the absolute lowest performance level of the
+ * platform. Selecting it may cause an efficiency penalty but should reduce
+ * the instantaneous power consumption of the processor. So in traditional
+ * terms, this represents the T-state range of performance levels.
+ *
+ * Field max_perf, min_perf, des_perf store the values for CPPC request MSR.
+ * Software passes performance goals through these fields.
+ * Field max_perf conveys the maximum performance level at which the platform
+ * may run. And it may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive.
+ * Field min_perf conveys the minimum performance level at which the platform
+ * may run. And it may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive but must be less than or equal to
+ * max_perf.
+ * Field des_perf conveys performance level Xen governor is requesting. And it
+ * may be set to any performance value in the range [min_perf, max_perf],
+ * inclusive.
+ */
+struct amd_cppc_drv_data
+{
+ const struct xen_processor_cppc *cppc_data;
+ union {
+ uint64_t raw;
+ struct {
+ unsigned int lowest_perf:8;
+ unsigned int lowest_nonlinear_perf:8;
+ unsigned int nominal_perf:8;
+ unsigned int highest_perf:8;
+ unsigned int :32;
+ };
+ } caps;
+ union {
+ uint64_t raw;
+ struct {
+ unsigned int max_perf:8;
+ unsigned int min_perf:8;
+ unsigned int des_perf:8;
+ unsigned int epp:8;
+ unsigned int :32;
+ };
+ } req;
+
+ int err;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
+ amd_cppc_drv_data);
+/*
+ * Core max frequency read from PstateDef as anchor point
+ * for freq-to-perf transition
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
static bool __init amd_cppc_handle_option(const char *s, const char *end)
{
@@ -50,10 +138,327 @@ int __init amd_cppc_cmdline_parse(const char *s, const char *e)
return 0;
}
+/*
+ * If CPPC lowest_freq and nominal_freq registers are exposed then we can
+ * use them to convert perf to freq and vice versa. The conversion is
+ * extrapolated as a linear function passing through the 2 points:
+ * - (Low perf, Low freq)
+ * - (Nominal perf, Nominal freq)
+ * Parameter freq is always in kHz.
+ */
+static int amd_cppc_khz_to_perf(const struct amd_cppc_drv_data *data,
+ unsigned int freq, uint8_t *perf)
+{
+ const struct xen_processor_cppc *cppc_data = data->cppc_data;
+ unsigned int mul, div;
+ int offset = 0, res;
+
+ if ( cppc_data->cpc.lowest_mhz && cppc_data->cpc.nominal_mhz &&
+ data->caps.nominal_perf != data->caps.lowest_perf &&
+ cppc_data->cpc.nominal_mhz != cppc_data->cpc.lowest_mhz )
+ {
+ mul = data->caps.nominal_perf - data->caps.lowest_perf;
+ div = cppc_data->cpc.nominal_mhz - cppc_data->cpc.lowest_mhz;
+
+ /*
+ * We don't need to convert to kHz for computing offset and can
+ * directly use nominal_mhz and lowest_mhz as the division
+ * will remove the frequency unit.
+ */
+ offset = data->caps.nominal_perf -
+ (mul * cppc_data->cpc.nominal_mhz) / div;
+ }
+ else
+ {
+ /* Read Processor Max Speed(MHz) as anchor point */
+ mul = data->caps.highest_perf;
+ div = this_cpu(pxfreq_mhz);
+ if ( !div )
+ return -EOPNOTSUPP;
+ }
+
+ res = offset + (mul * freq) / (div * 1000);
+ if ( res > UINT8_MAX )
+ {
+ printk_once(XENLOG_WARNING
+ "Perf value exceeds maximum value 255: %d\n", res);
+ *perf = 0xff;
+ return 0;
+ }
+ if ( res < 0 )
+ {
+ printk_once(XENLOG_WARNING
+ "Perf value smaller than minimum value 0: %d\n", res);
+ *perf = 0;
+ return 0;
+ }
+ *perf = res;
+
+ return 0;
+}
+
+/*
+ * _CPC may define nominal frequency and lowest frequency; if not, use
+ * Processor Max Speed as anchor point to calculate.
+ * Output freq stores cpc frequency in kHz
+ */
+static int amd_get_cpc_freq(const struct amd_cppc_drv_data *data,
+ uint32_t cpc_mhz, uint8_t perf, unsigned int *freq)
+{
+ unsigned int mul, div, res;
+
+ if ( cpc_mhz )
+ {
+ /* Switch to kHz */
+ *freq = cpc_mhz * 1000;
+ return 0;
+ }
+
+ /* Read Processor Max Speed(MHz) as anchor point */
+ mul = this_cpu(pxfreq_mhz);
+ if ( !mul )
+ return -EOPNOTSUPP;
+ div = data->caps.highest_perf;
+ res = (mul * perf * 1000) / div;
+ if ( unlikely(!res) )
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+/* Output max_freq stores calculated maximum frequency in kHz */
+static int amd_get_max_freq(const struct amd_cppc_drv_data *data,
+ unsigned int *max_freq)
+{
+ unsigned int nom_freq = 0;
+ int res;
+
+ res = amd_get_cpc_freq(data, data->cppc_data->cpc.nominal_mhz,
+ data->caps.nominal_perf, &nom_freq);
+ if ( res )
+ return res;
+
+ *max_freq = (data->caps.highest_perf * nom_freq) / data->caps.nominal_perf;
+
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_verify(struct cpufreq_policy *policy)
+{
+ cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);
+
+ return 0;
+}
+
+static void cf_check amd_cppc_write_request_msrs(void *info)
+{
+ const struct amd_cppc_drv_data *data = info;
+
+ wrmsrl(MSR_AMD_CPPC_REQ, data->req.raw);
+}
+
+static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
+ uint8_t des_perf, uint8_t max_perf)
+{
+ struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint64_t prev = data->req.raw;
+
+ data->req.min_perf = min_perf;
+ data->req.max_perf = max_perf;
+ data->req.des_perf = des_perf;
+
+ if ( prev == data->req.raw )
+ return;
+
+ on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, data, 1);
+}
+
+static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int cpu = policy->cpu;
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint8_t des_perf;
+ int res;
+
+ if ( unlikely(!target_freq) )
+ return 0;
+
+ res = amd_cppc_khz_to_perf(data, target_freq, &des_perf);
+ if ( res )
+ return res;
+
+ /*
+ * Having a performance level lower than the lowest nonlinear
+ * performance level, such as lowest_perf <= perf <= lowest_nonlinear_perf,
+ * may actually cause an efficiency penalty. So when deciding the min_perf
+ * value, we prefer lowest nonlinear performance over lowest performance.
+ */
+ amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
+ des_perf, data->caps.highest_perf);
+ return 0;
+}
+
+static void cf_check amd_cppc_init_msrs(void *info)
+{
+ struct cpufreq_policy *policy = info;
+ struct amd_cppc_drv_data *data = this_cpu(amd_cppc_drv_data);
+ uint64_t val;
+ unsigned int min_freq = 0, nominal_freq = 0, max_freq;
+
+ /* Package level MSR */
+ rdmsrl(MSR_AMD_CPPC_ENABLE, val);
+ /*
+ * Only when Enable bit is on, the hardware will calculate the processor’s
+ * performance capabilities and initialize the performance level fields in
+ * the CPPC capability registers.
+ */
+ if ( !(val & AMD_CPPC_ENABLE) )
+ {
+ val |= AMD_CPPC_ENABLE;
+ wrmsrl(MSR_AMD_CPPC_ENABLE, val);
+ }
+
+ rdmsrl(MSR_AMD_CPPC_CAP1, data->caps.raw);
+
+ if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 ||
+ data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf == 0 ||
+ data->caps.lowest_perf > data->caps.lowest_nonlinear_perf ||
+ data->caps.lowest_nonlinear_perf > data->caps.nominal_perf ||
+ data->caps.nominal_perf > data->caps.highest_perf )
+ {
+ amd_cppc_err(policy->cpu,
+ "Out of range values: highest(%u), lowest(%u), nominal(%u), lowest_nonlinear(%u)\n",
+ data->caps.highest_perf, data->caps.lowest_perf,
+ data->caps.nominal_perf, data->caps.lowest_nonlinear_perf);
+ goto err;
+ }
+
+ amd_process_freq(&cpu_data[policy->cpu],
+ NULL, NULL, &this_cpu(pxfreq_mhz));
+
+ data->err = amd_get_cpc_freq(data, data->cppc_data->cpc.lowest_mhz,
+ data->caps.lowest_perf, &min_freq);
+ if ( data->err )
+ return;
+
+ data->err = amd_get_cpc_freq(data, data->cppc_data->cpc.nominal_mhz,
+ data->caps.nominal_perf, &nominal_freq);
+ if ( data->err )
+ return;
+
+ data->err = amd_get_max_freq(data, &max_freq);
+ if ( data->err )
+ return;
+
+ if ( min_freq > nominal_freq || nominal_freq > max_freq )
+ {
+ amd_cppc_err(policy->cpu,
+ "min(%u), or max(%u), or nominal(%u) freq value is incorrect\n",
+ min_freq, max_freq, nominal_freq);
+ goto err;
+ }
+
+ policy->min = min_freq;
+ policy->max = max_freq;
+
+ policy->cpuinfo.min_freq = min_freq;
+ policy->cpuinfo.max_freq = max_freq;
+ policy->cpuinfo.perf_freq = nominal_freq;
+ /*
+ * Set after policy->cpuinfo.perf_freq, as we are taking
+ * APERF/MPERF average frequency as current frequency.
+ */
+ policy->cur = cpufreq_driver_getavg(policy->cpu, GOV_GETAVG);
+
+ return;
+
+ err:
+ /*
+ * No fallback scheme is available here; see the explanation at the call
+ * site in amd_cppc_cpufreq_cpu_init().
+ */
+ data->err = -EINVAL;
+}
+
+/*
+ * AMD CPPC driver is different than legacy ACPI hardware P-State,
+ * which has a finer grain frequency range between the highest and lowest
+ * frequency. And boost frequency is actually the frequency which is mapped on
+ * highest performance ratio. The legacy P0 frequency is actually mapped on
+ * nominal performance ratio.
+ */
+static void amd_cppc_boost_init(struct cpufreq_policy *policy,
+ const struct amd_cppc_drv_data *data)
+{
+ if ( data->caps.highest_perf <= data->caps.nominal_perf )
+ return;
+
+ policy->turbo = CPUFREQ_TURBO_ENABLED;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ XVFREE(per_cpu(amd_cppc_drv_data, policy->cpu));
+
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int cpu = policy->cpu;
+ struct amd_cppc_drv_data *data;
+
+ data = xvzalloc(struct amd_cppc_drv_data);
+ if ( !data )
+ return -ENOMEM;
+
+ data->cppc_data = &processor_pminfo[cpu]->cppc_data;
+
+ per_cpu(amd_cppc_drv_data, cpu) = data;
+
+ on_selected_cpus(cpumask_of(cpu), amd_cppc_init_msrs, policy, 1);
+
+ /*
+ * The enable bit is sticky, and we need to set it at the very beginning,
+ * before sanity-checking the CPPC capability values.
+ * If the error path is taken, not only does the amd-cppc cpufreq core fail
+ * to initialize, but we also cannot fall back to the legacy P-states
+ * driver, irrespective of the command line specifying a fallback option.
+ */
+ if ( data->err )
+ {
+ amd_cppc_err(cpu, "Could not initialize cpufreq core in CPPC mode\n");
+ amd_cppc_cpufreq_cpu_exit(policy);
+ return data->err;
+ }
+
+ policy->governor = cpufreq_opt_governor ? : CPUFREQ_DEFAULT_GOVERNOR;
+
+ amd_cppc_boost_init(policy, data);
+
+ amd_cppc_verbose(policy->cpu,
+ "CPU initialized with amd-cppc passive mode\n");
+
+ return 0;
+}
+
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_cpufreq_driver =
+{
+ .name = XEN_AMD_CPPC_DRIVER_NAME,
+ .verify = amd_cppc_cpufreq_verify,
+ .target = amd_cppc_cpufreq_target,
+ .init = amd_cppc_cpufreq_cpu_init,
+ .exit = amd_cppc_cpufreq_cpu_exit,
+};
+
int __init amd_cppc_register_driver(void)
{
if ( !cpu_has_cppc )
return -ENODEV;
- return -EOPNOTSUPP;
+ return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
}
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index eb428f284e..1b9af1270c 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -613,10 +613,10 @@ static unsigned int attr_const amd_parse_freq(unsigned int family,
return freq;
}
-static void amd_process_freq(const struct cpuinfo_x86 *c,
- unsigned int *low_mhz,
- unsigned int *nom_mhz,
- unsigned int *hi_mhz)
+void amd_process_freq(const struct cpuinfo_x86 *c,
+ unsigned int *low_mhz,
+ unsigned int *nom_mhz,
+ unsigned int *hi_mhz)
{
unsigned int idx = 0, h;
uint64_t hi, lo, val;
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 9c9599a622..72df42a6f6 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -173,5 +173,7 @@ extern bool amd_virt_spec_ctrl;
bool amd_setup_legacy_ssbd(void);
void amd_set_legacy_ssbd(bool enable);
void amd_set_cpuid_user_dis(bool enable);
+void amd_process_freq(const struct cpuinfo_x86 *c, unsigned int *low_mhz,
+ unsigned int *nom_mhz, unsigned int *hi_mhz);
#endif /* __AMD_H__ */
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 6f2c3147e3..815f1b9744 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -241,6 +241,11 @@
#define MSR_AMD_CSTATE_CFG 0xc0010296U
+#define MSR_AMD_CPPC_CAP1 0xc00102b0U
+#define MSR_AMD_CPPC_ENABLE 0xc00102b1U
+#define AMD_CPPC_ENABLE (_AC(1, ULL) << 0)
+#define MSR_AMD_CPPC_REQ 0xc00102b3U
+
/*
* Legacy MSR constants in need of cleanup. No new MSRs below this comment.
*/
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index aafa7fcf2b..aa29a5401c 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -453,6 +453,7 @@ struct xen_set_cppc_para {
uint32_t activity_window;
};
+#define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
#define XEN_HWP_DRIVER_NAME "hwp"
/*
--
2.34.1
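The `caps` and `req` unions in this patch rely on the compiler's bitfield layout. The following minimal sketch uses explicit shifts instead, which can help when checking raw MSR values by hand. The byte positions are inferred from the bitfield order in `struct amd_cppc_drv_data`, assuming x86's LSB-first bitfield allocation. The helpers `caps_decode()`, `caps_valid()` and `req_update()` are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layouts (from the bitfield order in struct amd_cppc_drv_data):
 *   CAP1: [7:0] lowest, [15:8] lowest_nonlinear, [23:16] nominal, [31:24] highest
 *   REQ:  [7:0] max,    [15:8] min,              [23:16] des,     [31:24] epp
 */
struct cppc_caps {
    uint8_t lowest, lowest_nonlinear, nominal, highest;
};

static struct cppc_caps caps_decode(uint64_t raw)
{
    struct cppc_caps c = {
        .lowest           = raw & 0xff,
        .lowest_nonlinear = (raw >> 8) & 0xff,
        .nominal          = (raw >> 16) & 0xff,
        .highest          = (raw >> 24) & 0xff,
    };

    return c;
}

/*
 * Equivalent to the checks in amd_cppc_init_msrs(): a non-zero lowest value
 * plus the ordering implies all four values are non-zero.
 */
static bool caps_valid(const struct cppc_caps *c)
{
    return c->lowest && c->lowest <= c->lowest_nonlinear &&
           c->lowest_nonlinear <= c->nominal && c->nominal <= c->highest;
}

/*
 * Replace the max/min/des bytes of a cached REQ value, keeping epp and the
 * upper 32 bits. Returns true when the value changed and an MSR write is
 * needed, mirroring the early return in amd_cppc_write_request().
 */
static bool req_update(uint64_t *req, uint8_t min_perf, uint8_t des_perf,
                       uint8_t max_perf)
{
    uint64_t prev = *req;

    *req &= ~(uint64_t)0xffffff;
    *req |= (uint64_t)max_perf | ((uint64_t)min_perf << 8) |
            ((uint64_t)des_perf << 16);

    return *req != prev;
}
```

For example, a CAP1 value of `0xA6782814` decodes to highest 166, nominal 120, lowest_nonlinear 40 and lowest 20.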
* Re: [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
2025-07-11 3:50 ` [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
@ 2025-07-17 12:55 ` Jan Beulich
2025-08-11 8:23 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-17 12:55 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel
On 11.07.2025 05:50, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> @@ -14,7 +14,95 @@
> #include <xen/domain.h>
> #include <xen/init.h>
> #include <xen/param.h>
> +#include <xen/percpu.h>
> +#include <xen/xvmalloc.h>
> #include <acpi/cpufreq/cpufreq.h>
> +#include <asm/amd.h>
> +#include <asm/msr-index.h>
> +
> +#define amd_cppc_err(cpu, fmt, args...) \
> + printk(XENLOG_ERR "AMD-CPPC: CPU%u error: " fmt, cpu, ## args)
> +#define amd_cppc_warn(cpu, fmt, args...) \
> + printk(XENLOG_WARNING "AMD-CPPC: CPU%u warning: " fmt, cpu, ## args)
> +#define amd_cppc_verbose(cpu, fmt, args...) \
> +({ \
> + if ( cpufreq_verbose ) \
> + printk(XENLOG_DEBUG "AMD-CPPC: CPU%u " fmt, cpu, ## args); \
> +})
> +
> +/*
> + * Field highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
> + * contain the values read from CPPC capability MSR. They represent the limits
> + * of managed performance range as well as the dynamic capability, which may
> + * change during processor operation
> + * Field highest_perf represents highest performance, which is the absolute
> + * maximum performance an individual processor may reach, assuming ideal
> + * conditions. This performance level may not be sustainable for long
> + * durations and may only be achievable if other platform components
> + * are in a specific state; for example, it may require other processors be
> + * in an idle state. This would be equivalent to the highest frequencies
> + * supported by the processor.
> + * Field nominal_perf represents maximum sustained performance level of the
> + * processor, assuming ideal operating conditions. All cores/processors are
> + * expected to be able to sustain their nominal performance state\
Nit: Stray trailing backslash.
> + * simultaneously.
> + * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
> + * is the lowest performance level at which nonlinear power savings are
> + * achieved. Above this threshold, lower performance levels should be
> + * generally more energy efficient than higher performance levels. So in
> + * traditional terms, this represents the P-state range of performance levels.
> + * Field lowest_perf represents the absolute lowest performance level of the
> + * platform. Selecting it may cause an efficiency penalty but should reduce
> + * the instantaneous power consumption of the processor. So in traditional
> + * terms, this represents the T-state range of performance levels.
> + *
> + * Field max_perf, min_perf, des_perf store the values for CPPC request MSR.
> + * Software passes performance goals through these fields.
> + * Field max_perf conveys the maximum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive.
> + * Field min_perf conveys the minimum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive but must be less than or equal to
> + * max_perf.
> + * Field des_perf conveys performance level Xen governor is requesting. And it
> + * may be set to any performance value in the range [min_perf, max_perf],
> + * inclusive.
> + */
> +struct amd_cppc_drv_data
> +{
> + const struct xen_processor_cppc *cppc_data;
> + union {
> + uint64_t raw;
> + struct {
> + unsigned int lowest_perf:8;
> + unsigned int lowest_nonlinear_perf:8;
> + unsigned int nominal_perf:8;
> + unsigned int highest_perf:8;
> + unsigned int :32;
> + };
> + } caps;
> + union {
> + uint64_t raw;
> + struct {
> + unsigned int max_perf:8;
> + unsigned int min_perf:8;
> + unsigned int des_perf:8;
> + unsigned int epp:8;
> + unsigned int :32;
> + };
> + } req;
> +
> + int err;
> +};
> +
> +static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
> + amd_cppc_drv_data);
> +/*
> + * Core max frequency read from PstateDef as anchor point
> + * for freq-to-perf transition
> + */
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
>
> static bool __init amd_cppc_handle_option(const char *s, const char *end)
> {
> @@ -50,10 +138,327 @@ int __init amd_cppc_cmdline_parse(const char *s, const char *e)
> return 0;
> }
>
> +/*
> + * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> + * use them to convert perf to freq and vice versa. The conversion is
> + * extrapolated as a linear function passing through the 2 points:
> + * - (Low perf, Low freq)
> + * - (Nominal perf, Nominal freq)
> + * Parameter freq is always in kHz.
> + */
> +static int amd_cppc_khz_to_perf(const struct amd_cppc_drv_data *data,
> + unsigned int freq, uint8_t *perf)
> +{
> + const struct xen_processor_cppc *cppc_data = data->cppc_data;
> + unsigned int mul, div;
> + int offset = 0, res;
> +
> + if ( cppc_data->cpc.lowest_mhz && cppc_data->cpc.nominal_mhz &&
> + data->caps.nominal_perf != data->caps.lowest_perf &&
> + cppc_data->cpc.nominal_mhz != cppc_data->cpc.lowest_mhz )
While I understand that required relations are being checked elsewhere, if
you used > in place of != here, that would not only serve a doc aspect, but
also allow dropping one part:
if ( cppc_data->cpc.lowest_mhz &&
data->caps.nominal_perf > data->caps.lowest_perf &&
cppc_data->cpc.nominal_mhz > cppc_data->cpc.lowest_mhz )
> + {
> + mul = data->caps.nominal_perf - data->caps.lowest_perf;
> + div = cppc_data->cpc.nominal_mhz - cppc_data->cpc.lowest_mhz;
> +
> + /*
> + * We don't need to convert to kHz for computing offset and can
> + * directly use nominal_mhz and lowest_mhz as the division
> + * will remove the frequency unit.
> + */
> + offset = data->caps.nominal_perf -
> + (mul * cppc_data->cpc.nominal_mhz) / div;
> + }
> + else
> + {
> + /* Read Processor Max Speed(MHz) as anchor point */
> + mul = data->caps.highest_perf;
> + div = this_cpu(pxfreq_mhz);
> + if ( !div )
> + return -EOPNOTSUPP;
> + }
> +
> + res = offset + (mul * freq) / (div * 1000);
> + if ( res > UINT8_MAX )
Why UINT8_MAX here but ...
> + {
> + printk_once(XENLOG_WARNING
> + "Perf value exceeds maximum value 255: %d\n", res);
> + *perf = 0xff;
... 0xff here?
> + return 0;
> + }
> + if ( res < 0 )
> + {
> + printk_once(XENLOG_WARNING
> + "Perf value smaller than minimum value 0: %d\n", res);
> + *perf = 0;
> + return 0;
> + }
> + *perf = res;
Considering that amd_cppc_init_msrs() rejects perf values of 0 as invalid,
is 0 actually valid as an output here?
> +/*
> + * _CPC may define nominal frequency and lowest frequency; if not, use
> + * Processor Max Speed as anchor point to calculate.
> + * Output freq stores cpc frequency in kHz
> + */
> +static int amd_get_cpc_freq(const struct amd_cppc_drv_data *data,
> + uint32_t cpc_mhz, uint8_t perf, unsigned int *freq)
Once again no need for uint32_t when unsigned int will do.
Jan
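A minimal sketch of the perf-to-kHz fallback follows, taking `cpc_mhz` as `unsigned int` as suggested above. As posted, `amd_get_cpc_freq()` appears to compute `res` in the fallback path without storing it through `freq`, so the sketch writes it out explicitly. The helper names, the 64-bit intermediate and the overflow check are assumptions for illustration, not the patch's code.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for amd_get_cpc_freq(): use the _CPC value when
 * present, otherwise scale the PstateDef max frequency (pxfreq_mhz, taken
 * to correspond to highest_perf) linearly down to perf. Result in kHz.
 */
static int cpc_freq_khz(unsigned int cpc_mhz, uint8_t perf,
                        uint8_t highest_perf, unsigned int pxfreq_mhz,
                        unsigned int *freq)
{
    uint64_t res;

    if ( cpc_mhz )
    {
        *freq = cpc_mhz * 1000;
        return 0;
    }

    if ( !pxfreq_mhz || !highest_perf )
        return -EOPNOTSUPP;

    /* 64-bit intermediate so pxfreq_mhz * perf * 1000 cannot wrap. */
    res = (uint64_t)pxfreq_mhz * perf * 1000 / highest_perf;
    if ( !res || res > UINT_MAX )
        return -EOPNOTSUPP;

    *freq = res;
    return 0;
}

/* Mirrors amd_get_max_freq(): extrapolate from nominal to highest perf. */
static int max_freq_khz(unsigned int nominal_khz, uint8_t nominal_perf,
                        uint8_t highest_perf, unsigned int *max_freq)
{
    if ( !nominal_perf )
        return -EOPNOTSUPP;

    *max_freq = (uint64_t)highest_perf * nominal_khz / nominal_perf;
    return 0;
}
```

With an assumed anchor of 4000 MHz at highest_perf 166, a perf of 120 maps to 2891566 kHz (integer division truncates).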
* RE: [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode
2025-07-17 12:55 ` Jan Beulich
@ 2025-08-11 8:23 ` Penny, Zheng
0 siblings, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-11 8:23 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Orzel, Michal, Julien Grall, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, July 17, 2025 8:55 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> Anthony PERARD <anthony.perard@vates.tech>; Orzel, Michal
> <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC
> in passive mode
>
> On 11.07.2025 05:50, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > + if ( res < 0 )
> > + {
> > + printk_once(XENLOG_WARNING
> > + "Perf value smaller than minimum value 0: %d\n", res);
> > + *perf = 0;
> > + return 0;
> > + }
> > + *perf = res;
>
> Considering that amd_cppc_init_msrs() rejects perf values of 0 as invalid, is 0
> actually valid as an output here?
>
Yes, we are rejecting 0 there. Maybe I should return -ERANGE here instead.
> Jan
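Folding the review points into the kHz-to-perf conversion gives the following minimal sketch. It uses the strict `>` condition (which also implies a non-zero nominal_mhz), `UINT8_MAX` consistently for the upper clamp, and `-ERANGE` for non-positive results as proposed above. It assumes zero is treated as out of range, since amd_cppc_init_msrs() rejects a perf of 0. The struct and function names are hypothetical, and the 64-bit arithmetic is a choice made here, not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical inputs: CPPC capability perfs and optional _CPC frequencies. */
struct perf_caps { uint8_t lowest_perf, nominal_perf, highest_perf; };
struct cpc_freqs { unsigned int lowest_mhz, nominal_mhz; };

static int khz_to_perf(const struct perf_caps *caps,
                       const struct cpc_freqs *cpc,
                       unsigned int pxfreq_mhz, unsigned int freq_khz,
                       uint8_t *perf)
{
    uint64_t mul, div;
    int64_t offset = 0, res;

    /* Strict '>' also guarantees nominal_mhz != 0. */
    if ( cpc->lowest_mhz &&
         caps->nominal_perf > caps->lowest_perf &&
         cpc->nominal_mhz > cpc->lowest_mhz )
    {
        /* Line through (lowest_mhz, lowest_perf), (nominal_mhz, nominal_perf). */
        mul = caps->nominal_perf - caps->lowest_perf;
        div = cpc->nominal_mhz - cpc->lowest_mhz;
        offset = caps->nominal_perf - (int64_t)(mul * cpc->nominal_mhz / div);
    }
    else
    {
        /* Line through the origin and (pxfreq_mhz, highest_perf). */
        mul = caps->highest_perf;
        div = pxfreq_mhz;
        if ( !div )
            return -EOPNOTSUPP;
    }

    res = offset + (int64_t)(mul * freq_khz / (div * 1000));
    if ( res <= 0 )
        return -ERANGE;

    *perf = (uint8_t)(res > UINT8_MAX ? UINT8_MAX : res);
    return 0;
}
```

With lowest/nominal perf 20/120 at 400/2000 MHz, 2000000 kHz maps back to 120 and 400000 kHz to 20. A target far below the lowest frequency now yields -ERANGE instead of a silent 0.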
* [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (11 preceding siblings ...)
2025-07-11 3:50 ` [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-17 13:35 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 14/19] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
` (5 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
amd-cppc has 2 operation modes: autonomous (active) mode and
non-autonomous (passive) mode.
In active mode, the Xen governor is not needed to calculate and tune the
CPU frequency; instead, the hardware's built-in CPPC power algorithm
calculates the runtime workload and adjusts core frequencies automatically
according to the power supply, thermal, core voltage and other hardware
conditions.
In active mode, CPPC ignores requests made in the desired performance field,
and takes into account only the values set in the minimum performance, maximum
performance, and energy performance preference registers.
A new field, EPP (energy performance preference), is introduced in the CPPC
request register. It is used by the CCLK DPM controller to drive the frequency
at which a core operates during short periods of activity, called the
minimum active frequency. It can contain values in the range 0 to 0xff.
An EPP of zero sets the minimum active frequency to the maximum frequency,
while an EPP of 0xff sets it to approximately the idle frequency.
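A minimal sketch of reading and updating the EPP byte in a cached request value, in the style of Xen's MASK_EXTR()/MASK_INSR() helpers. It assumes EPP occupies bits 31:24, matching the `epp` bitfield position in `struct amd_cppc_drv_data`. The macro and function names below are hypothetical and are not the patch's `AMD_CPPC_EPP_MASK` definition.

```c
#include <stdint.h>

/* Assumption: EPP lives in bits 31:24 of the CPPC request MSR. */
#define CPPC_REQ_EPP_SHIFT  24
#define CPPC_REQ_EPP_MASK   (0xffULL << CPPC_REQ_EPP_SHIFT)

#define EPP_MAX_PERFORMANCE 0x00  /* min active freq = max freq */
#define EPP_MAX_POWERSAVE   0xff  /* min active freq ~ idle freq */

/* Extract the EPP byte from a request value. */
static uint8_t req_get_epp(uint64_t req)
{
    return (req & CPPC_REQ_EPP_MASK) >> CPPC_REQ_EPP_SHIFT;
}

/* Return req with only the EPP byte replaced. */
static uint64_t req_set_epp(uint64_t req, uint8_t epp)
{
    return (req & ~CPPC_REQ_EPP_MASK) | ((uint64_t)epp << CPPC_REQ_EPP_SHIFT);
}
```

Updating only the EPP byte leaves the max/min/des fields intact, so a single MSR write can change the energy preference without disturbing the performance bounds.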
We implement a new AMD CPU frequency driver, `amd-cppc-epp`, for active mode.
It requires the `active` tag on the Xen command line for users to explicitly
select active mode.
The `amd-cppc-epp` driver hooks ->setpolicy() rather than ->target(), as it
does not depend on a Xen governor for performance tuning.
We also introduce a new field, "policy" (CPUFREQ_POLICY_xxx), to represent the
performance policy. It currently supports three values:
CPUFREQ_POLICY_PERFORMANCE for maximum performance, CPUFREQ_POLICY_POWERSAVE
for the least power consumption, and CPUFREQ_POLICY_ONDEMAND for no
preference. These correspond to the "performance", "powersave" and "ondemand"
Xen governors respectively, so users can keep using the "governor" setting on
the Xen command line to select the performance policy they want to apply.
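The per-policy limits that amd_cppc_epp_set_policy() in this patch programs can be summarized with a minimal, self-contained sketch. The CPUFREQ_POLICY_* values are copied from the patch; `struct caps`, `struct req` and `policy_to_req()` are hypothetical, and the balance EPP of 0x80 is an assumed midpoint, not necessarily the value of CPPC_ENERGY_PERF_BALANCE.

```c
#include <stdint.h>

/* Values copied from the patch's CPUFREQ_POLICY_* definitions. */
#define CPUFREQ_POLICY_UNKNOWN     0
#define CPUFREQ_POLICY_POWERSAVE   1
#define CPUFREQ_POLICY_PERFORMANCE 2
#define CPUFREQ_POLICY_ONDEMAND    3

/* EPP endpoints from the text above; the midpoint is an assumption. */
#define EPP_MAX_PERFORMANCE 0x00
#define EPP_MAX_POWERSAVE   0xff
#define EPP_BALANCE         0x80 /* assumed value */

struct caps { uint8_t highest_perf, lowest_nonlinear_perf; };
struct req  { uint8_t min_perf, max_perf, epp; };

/* Mirrors the switch in amd_cppc_epp_set_policy(); des_perf stays 0. */
static struct req policy_to_req(unsigned int policy, const struct caps *c,
                                uint8_t epp_init)
{
    struct req r = {
        .min_perf = c->lowest_nonlinear_perf,
        .max_perf = c->highest_perf,
        .epp = epp_init,            /* unknown policy keeps the BIOS EPP */
    };

    switch ( policy )
    {
    case CPUFREQ_POLICY_PERFORMANCE:
        r.epp = EPP_MAX_PERFORMANCE;
        r.min_perf = r.max_perf;    /* pin at highest_perf */
        break;

    case CPUFREQ_POLICY_POWERSAVE:
        r.epp = EPP_MAX_POWERSAVE;
        r.max_perf = r.min_perf;    /* pin at lowest_nonlinear_perf */
        break;

    case CPUFREQ_POLICY_ONDEMAND:
        r.epp = EPP_BALANCE;        /* no preference either way */
        break;
    }

    return r;
}
```

Only the "performance" and "powersave" policies collapse the [min_perf, max_perf] window; "ondemand" leaves the full range to the hardware and only changes the EPP hint.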
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Remove redundant epp_mode
- Remove pointless initializer
- Define sole caller read_epp_init_once and epp_init value to read
pre-defined BIOS epp value only once
- Combine the commit "xen/cpufreq: introduce policy type when
cpufreq_driver->setpolicy exists"
---
v2 -> v3:
- Combined with commit "x86/cpufreq: add "cpufreq=amd-cppc,active" para"
- Refactor doc about "active mode"
- Change opt_cpufreq_active to opt_active_mode
- Let caller pass epp_init when unspecified to allow the function parameter
to be of uint8_t
- Make epp_init per-cpu value
---
v3 -> v4:
- doc refinement
- use MASK_EXTR() to get epp value
- fix indentation
- replace if-else() with switch()
- combine successive comments and do refinement
- no need to introduce amd_cppc_epp_update_limit() as a wrapper
- rename cpufreq_parse_policy() with cpufreq_policy_from_governor()
- no need to use case-insensitive comparison
---
v4 -> v5:
- refine doc to state what the default is for "active" sub-option and it's of
boolean nature
- excess blank after << for AMD_CPPC_EPP_MASK
- set max_perf with lowest_perf to get utmost powersave
- refine commit message to include description about relation between "policy"
and "governor"
---
v5 -> v6:
- expand comment for "epp" field
- let min_perf set with lowest_nonlinear_perf, not lowest_perf, to constrain
performance tuning in P-states range
- refactor doc and comments
- blank lines between non-fall-through case blocks
- introduce and add entry for "CPUFREQ_POLICY_ONDEMAND"
---
---
docs/misc/xen-command-line.pandoc | 9 +-
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 143 ++++++++++++++++++++++++++-
xen/arch/x86/include/asm/msr-index.h | 1 +
xen/drivers/cpufreq/utility.c | 14 +++
xen/include/acpi/cpufreq/cpufreq.h | 18 ++++
xen/include/public/sysctl.h | 1 +
6 files changed, 180 insertions(+), 6 deletions(-)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 03761d9e3c..74404ed1e6 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
available support.
### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[active][,verbose]]`
> Default: `xen`
@@ -537,6 +537,13 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
on supported AMD hardware to provide finer grained frequency control
mechanism. The default is disabled.
+* `active` is a boolean to enable the amd-cppc driver in active (autonomous) mode.
+ In this mode, users do not rely on a Xen governor for performance monitoring
+ and tuning. Instead, the hardware's built-in CPPC power algorithm calculates
+ the runtime workload and adjusts core frequencies automatically according to
+ power supply, thermal, core voltage and other hardware conditions.
+ The default is disabled, and the option only applies when `amd-cppc` is
+ enabled.
There is also support for `;`-separated fallback options:
`cpufreq=hwp;xen,verbose`. This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 57fd98d2d9..e4bd990982 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -67,7 +67,14 @@
* max_perf.
* Field des_perf conveys performance level Xen governor is requesting. And it
* may be set to any performance value in the range [min_perf, max_perf],
- * inclusive.
+ * inclusive. In active mode, desf_perf must be zero.
+ * Field epp represents energy performance preference, which only has meaning
+ * when active mode is enabled. The EPP is used in the CCLK DPM controller [1]
+ * to drive the frequency that a core is going to operate during short periods
+ * of activity, called the minimum active frequency. It can take any value
+ * from 0 to 0xff. An EPP of zero sets the min active frequency to the
+ * maximum frequency, while an EPP of 0xff sets the min active frequency to
+ * approximately the idle frequency.
*/
struct amd_cppc_drv_data
{
@@ -104,6 +111,9 @@ static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
+static bool __ro_after_init opt_active_mode;
+static DEFINE_PER_CPU_READ_MOSTLY(uint8_t, epp_init);
+
static bool __init amd_cppc_handle_option(const char *s, const char *end)
{
int ret;
@@ -115,6 +125,13 @@ static bool __init amd_cppc_handle_option(const char *s, const char *end)
return true;
}
+ ret = parse_boolean("active", s, end);
+ if ( ret >= 0 )
+ {
+ opt_active_mode = ret;
+ return true;
+ }
+
return false;
}
@@ -259,11 +276,18 @@ static void cf_check amd_cppc_write_request_msrs(void *info)
}
static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
- uint8_t des_perf, uint8_t max_perf)
+ uint8_t des_perf, uint8_t max_perf,
+ uint8_t epp)
{
struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
uint64_t prev = data->req.raw;
+ if ( !opt_active_mode )
+ data->req.des_perf = des_perf;
+ else
+ data->req.des_perf = 0;
+ data->req.epp = epp;
+
data->req.min_perf = min_perf;
data->req.max_perf = max_perf;
data->req.des_perf = des_perf;
@@ -274,6 +298,14 @@ static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, data, 1);
}
+static void read_epp_init(void)
+{
+ uint64_t val;
+
+ rdmsrl(MSR_AMD_CPPC_REQ, val);
+ this_cpu(epp_init) = MASK_EXTR(val, AMD_CPPC_EPP_MASK);
+}
+
static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
@@ -297,7 +329,10 @@ static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
* value, we prefer lowest nonlinear performance over lowest performance.
*/
amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
- des_perf, data->caps.highest_perf);
+ des_perf, data->caps.highest_perf,
+ /* Pre-defined BIOS value for passive mode */
+ per_cpu(epp_init, policy->cpu));
+
return 0;
}
@@ -373,6 +408,8 @@ static void cf_check amd_cppc_init_msrs(void *info)
*/
policy->cur = cpufreq_driver_getavg(policy->cpu, GOV_GETAVG);
+ read_epp_init();
+
return;
err:
@@ -406,7 +443,7 @@ static int cf_check amd_cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
return 0;
}
-static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+static int amd_cppc_cpufreq_init_perf(struct cpufreq_policy *policy)
{
unsigned int cpu = policy->cpu;
struct amd_cppc_drv_data *data;
@@ -439,12 +476,91 @@ static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
amd_cppc_boost_init(policy, data);
+ return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ ret = amd_cppc_cpufreq_init_perf(policy);
+ if ( ret )
+ return ret;
+
amd_cppc_verbose(policy->cpu,
"CPU initialized with amd-cppc passive mode\n");
return 0;
}
+static int cf_check amd_cppc_epp_cpu_init(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ ret = amd_cppc_cpufreq_init_perf(policy);
+ if ( ret )
+ return ret;
+
+ policy->policy = cpufreq_policy_from_governor(policy->governor);
+
+ amd_cppc_verbose(policy->cpu,
+ "CPU initialized with amd-cppc active mode\n");
+
+ return 0;
+}
+
+static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
+{
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+ policy->cpu);
+ uint8_t max_perf, min_perf, epp;
+
+ /*
+ * By default, set min_perf to lowest_nonlinear_perf and max_perf to
+ * highest_perf, so that performance scales within the P-state range.
+ */
+ max_perf = data->caps.highest_perf;
+ min_perf = data->caps.lowest_nonlinear_perf;
+
+ /*
+ * In policy CPUFREQ_POLICY_PERFORMANCE, increase min_perf to
+ * highest_perf to achieve utmost performance.
+ * In policy CPUFREQ_POLICY_POWERSAVE, decrease max_perf to
+ * lowest_nonlinear_perf to achieve utmost power saving.
+ */
+ switch ( policy->policy )
+ {
+ case CPUFREQ_POLICY_PERFORMANCE:
+ /* Force the epp value to be zero for performance policy */
+ epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+ min_perf = data->caps.highest_perf;
+ break;
+
+ case CPUFREQ_POLICY_POWERSAVE:
+ /* Force the epp value to be 0xff for powersave policy */
+ epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+ max_perf = data->caps.lowest_nonlinear_perf;
+ break;
+
+ case CPUFREQ_POLICY_ONDEMAND:
+ /*
+ * Set epp with medium value to show no preference over performance
+ * or powersave
+ */
+ epp = CPPC_ENERGY_PERF_BALANCE;
+ break;
+
+ default:
+ epp = per_cpu(epp_init, policy->cpu);
+ break;
+ }
+
+ amd_cppc_write_request(policy->cpu, min_perf,
+ 0 /* no des_perf in active mode */,
+ max_perf, epp);
+ return 0;
+}
+
static const struct cpufreq_driver __initconst_cf_clobber
amd_cppc_cpufreq_driver =
{
@@ -455,10 +571,27 @@ amd_cppc_cpufreq_driver =
.exit = amd_cppc_cpufreq_cpu_exit,
};
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_epp_driver =
+{
+ .name = XEN_AMD_CPPC_EPP_DRIVER_NAME,
+ .verify = amd_cppc_cpufreq_verify,
+ .setpolicy = amd_cppc_epp_set_policy,
+ .init = amd_cppc_epp_cpu_init,
+ .exit = amd_cppc_cpufreq_cpu_exit,
+};
+
int __init amd_cppc_register_driver(void)
{
+ int ret;
+
if ( !cpu_has_cppc )
return -ENODEV;
- return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+ if ( opt_active_mode )
+ ret = cpufreq_register_driver(&amd_cppc_epp_driver);
+ else
+ ret = cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+
+ return ret;
}
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 815f1b9744..6f731685e5 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -245,6 +245,7 @@
#define MSR_AMD_CPPC_ENABLE 0xc00102b1U
#define AMD_CPPC_ENABLE (_AC(1, ULL) << 0)
#define MSR_AMD_CPPC_REQ 0xc00102b3U
+#define AMD_CPPC_EPP_MASK (_AC(0xff, ULL) << 24)
/*
* Legacy MSR constants in need of cleanup. No new MSRs below this comment.
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 987c3b5929..64bcc464f6 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -281,3 +281,17 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
return __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
}
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov)
+{
+ if ( !strncmp(gov->name, "performance", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_PERFORMANCE;
+
+ if ( !strncmp(gov->name, "powersave", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_POWERSAVE;
+
+ if ( !strncmp(gov->name, "ondemand", CPUFREQ_NAME_LEN) )
+ return CPUFREQ_POLICY_ONDEMAND;
+
+ return CPUFREQ_POLICY_UNKNOWN;
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 32cf905fb8..b0b22d1c9c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -82,6 +82,7 @@ struct cpufreq_policy {
int8_t turbo; /* tristate flag: 0 for unsupported
* -1 for disable, 1 for enabled
* See CPUFREQ_TURBO_* below for defines */
+ unsigned int policy; /* CPUFREQ_POLICY_* */
};
DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);
@@ -132,6 +133,23 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
extern struct cpufreq_governor *__find_governor(const char *governor);
#define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
+/*
+ * Performance Policy
+ * If cpufreq_driver->target() exists, the ->governor decides what frequency
+ * within the limits is used. If cpufreq_driver->setpolicy() exists, these
+ * following policies are available:
+ * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
+ * CPUFREQ_POLICY_POWERSAVE represents least power consumption
+ * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
+ * powersave
+ */
+#define CPUFREQ_POLICY_UNKNOWN 0
+#define CPUFREQ_POLICY_POWERSAVE 1
+#define CPUFREQ_POLICY_PERFORMANCE 2
+#define CPUFREQ_POLICY_ONDEMAND 3
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
+
/* pass a target to the cpufreq driver */
extern int __cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index aa29a5401c..eb3a23b038 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -454,6 +454,7 @@ struct xen_set_cppc_para {
};
#define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
+#define XEN_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"
#define XEN_HWP_DRIVER_NAME "hwp"
/*
--
2.34.1
* Re: [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode
2025-07-11 3:51 ` [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
@ 2025-07-17 13:35 ` Jan Beulich
2025-08-12 7:40 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-17 13:35 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> @@ -67,7 +67,14 @@
> * max_perf.
> * Field des_perf conveys performance level Xen governor is requesting. And it
> * may be set to any performance value in the range [min_perf, max_perf],
> - * inclusive.
> + * inclusive. In active mode, desf_perf must be zero.
Nit (typo): des_perf
> @@ -259,11 +276,18 @@ static void cf_check amd_cppc_write_request_msrs(void *info)
> }
>
> static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
> - uint8_t des_perf, uint8_t max_perf)
> + uint8_t des_perf, uint8_t max_perf,
> + uint8_t epp)
> {
> struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> uint64_t prev = data->req.raw;
>
> + if ( !opt_active_mode )
> + data->req.des_perf = des_perf;
> + else
> + data->req.des_perf = 0;
In amd_cppc_epp_set_policy() you pass 0 anyway. Why is this needed? With this
change dropped, opt_active_mode can become __initdata. (But of course you may
want to add an assertion instead, in which case the variable needs to stay
where it is at least in debug builds.)
> + data->req.epp = epp;
Ahead of this patch, aren't you mis-handling this field then, in that you
clear it (as you never read the MSR)?
> data->req.min_perf = min_perf;
> data->req.max_perf = max_perf;
> data->req.des_perf = des_perf;
Don't you need to delete this line with the addition above, or alternatively
change the above to
if ( opt_active_mode )
data->req.des_perf = 0;
?
> @@ -274,6 +298,14 @@ static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
> on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, data, 1);
> }
>
> +static void read_epp_init(void)
> +{
> + uint64_t val;
> +
> + rdmsrl(MSR_AMD_CPPC_REQ, val);
> + this_cpu(epp_init) = MASK_EXTR(val, AMD_CPPC_EPP_MASK);
> +}
I'm unconvinced this is worth a separate function.
> +static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
> +{
> + const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
> + policy->cpu);
> + uint8_t max_perf, min_perf, epp;
> +
> + /*
> + * On default, set min_perf with lowest_nonlinear_perf, and max_perf
> + * with the highest, to ensure performance scaling in P-states range.
> + */
> + max_perf = data->caps.highest_perf;
> + min_perf = data->caps.lowest_nonlinear_perf;
> +
> + /*
> + * In policy CPUFREQ_POLICY_PERFORMANCE, increase min_perf to
> + * highest_perf to achieve ultmost performance.
> + * In policy CPUFREQ_POLICY_POWERSAVE, decrease max_perf to
> + * lowest_nonlinear_perf to achieve ultmost power saving.
> + */
> + switch ( policy->policy )
> + {
> + case CPUFREQ_POLICY_PERFORMANCE:
> + /* Force the epp value to be zero for performance policy */
> + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
> + min_perf = data->caps.highest_perf;
Use the local variable you have, i.e. max_perf?
> + break;
> +
> + case CPUFREQ_POLICY_POWERSAVE:
> + /* Force the epp value to be 0xff for powersave policy */
> + epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
> + max_perf = data->caps.lowest_nonlinear_perf;
Use the local variable you have, i.e. min_perf?
> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -245,6 +245,7 @@
> #define MSR_AMD_CPPC_ENABLE 0xc00102b1U
> #define AMD_CPPC_ENABLE (_AC(1, ULL) << 0)
> #define MSR_AMD_CPPC_REQ 0xc00102b3U
> +#define AMD_CPPC_EPP_MASK (_AC(0xff, ULL) << 24)
The reason I noticed the EPP issue in amd_cppc_write_request() is because
I wondered why you would need this, when you have the fields defined in
struct amd_cppc_drv_data.
Jan
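Jan's two points about amd_cppc_write_request() (the later `des_perf` store undoing the active-mode override, and EPP being cleared because the MSR was never read) can be illustrated with a minimal sketch that composes the raw request value with explicit shifts, so each field is written exactly once. It assumes the layout max_perf [7:0], min_perf [15:8], des_perf [23:16], epp [31:24]; only the EPP position is confirmed by the patch's AMD_CPPC_EPP_MASK. `build_req()` is hypothetical and not Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions in the CPPC request value. */
#define REQ_MAX_SHIFT   0
#define REQ_MIN_SHIFT   8
#define REQ_DES_SHIFT   16
#define REQ_EPP_SHIFT   24
#define REQ_FIELD(v, s) ((uint64_t)(uint8_t)(v) << (s))
#define REQ_ALL_FIELDS  UINT64_C(0xffffffff)

/*
 * Hypothetical helper: keep the bits of the previous raw value outside the
 * four fields, and assign des_perf exactly once so the active-mode zero
 * cannot be overwritten by a later store.
 */
static uint64_t build_req(uint64_t prev, uint8_t min_perf, uint8_t des_perf,
                          uint8_t max_perf, uint8_t epp, bool active)
{
    uint64_t req = prev & ~REQ_ALL_FIELDS;

    req |= REQ_FIELD(max_perf, REQ_MAX_SHIFT);
    req |= REQ_FIELD(min_perf, REQ_MIN_SHIFT);
    req |= REQ_FIELD(active ? 0 : des_perf, REQ_DES_SHIFT);
    req |= REQ_FIELD(epp, REQ_EPP_SHIFT);

    return req;
}
```

In passive mode the caller would pass the BIOS EPP read once at init (`epp_init` in the patch), so EPP is no longer silently zeroed.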
* RE: [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode
2025-07-17 13:35 ` Jan Beulich
@ 2025-08-12 7:40 ` Penny, Zheng
0 siblings, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-12 7:40 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Andrew Cooper, Anthony PERARD, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, July 17, 2025 9:36 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC
> in active mode
>
> On 11.07.2025 05:51, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > @@ -259,11 +276,18 @@ static void cf_check
> > amd_cppc_write_request_msrs(void *info) }
> >
> > static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
> > - uint8_t des_perf, uint8_t max_perf)
> > + uint8_t des_perf, uint8_t max_perf,
> > + uint8_t epp)
> > {
> > struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> > uint64_t prev = data->req.raw;
> >
> > + if ( !opt_active_mode )
> > + data->req.des_perf = des_perf;
> > + else
> > + data->req.des_perf = 0;
>
> In amd_cppc_epp_set_policy() you pass 0 anyway. Why is this needed? With this
> change dropped, opt_active_mode can become __initdata. (But of course you may
> want to add an assertion instead, in which case the variable needs to stay where it
> is at least in debug builds.)
>
True, the if-else seems redundant. I'll make opt_active_mode __initdata under NDEBUG
```
#ifndef NDEBUG
static bool __ro_after_init opt_active_mode;
#else
static bool __initdata opt_active_mode;
#endif
```
> > + data->req.epp = epp;
>
> Ahead of this patch, aren't you mis-handling this field then, in that you clear it (as
> you never read the MSR)?
>
Yes, it is always 0 in the previous commit. I shall move reading the pre-defined BIOS EPP value into an earlier patch.
>
> Jan
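If the redundant if-else is replaced by the assertion Jan mentions, the check only exists in debug builds, which is why `opt_active_mode` would need to outlive init only when NDEBUG is undefined. A minimal user-space sketch, using the standard `assert()` in place of Xen's `ASSERT()`; `check_des_perf()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the command-line flag; in Xen it is set while parsing. */
static bool opt_active_mode;

/*
 * Debug-only sanity check: in active mode, callers must already pass 0.
 * With NDEBUG defined, assert() expands to nothing, so the flag is not
 * read here at all and could be discarded after init.
 */
static uint8_t check_des_perf(uint8_t des_perf)
{
    assert(!opt_active_mode || des_perf == 0);
    return des_perf;
}
```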
* [PATCH v6 14/19] xen/cpufreq: get performance policy from governor set via xenpm
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (12 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 13/19] xen/x86: implement amd-cppc-epp driver for CPPC in active mode Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-23 15:44 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters Penny Zheng
` (4 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
Even though no Xen governor is used in amd-cppc active mode, we can deduce
which performance policy (CPUFREQ_POLICY_xxx) the user wants to apply from the
governor they choose:
If the user chooses the performance governor, they want maximum performance,
so the policy shall be CPUFREQ_POLICY_PERFORMANCE.
If the user chooses the powersave governor, they want the least power
consumption, so the policy shall be CPUFREQ_POLICY_POWERSAVE.
Function cpufreq_policy_from_governor() performs the above translation, and it
shall also take effect when users set a new governor through xenpm.
The ondemand and userspace governors are forbidden choices; if users specify
them, we not only print a warning suggesting "xenpm set-cpufreq-cppc", but
also error out.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v4 -> v5:
- new commit
---
v5 -> v6:
- refactor warning message
---
xen/drivers/acpi/pm-op.c | 8 ++++++++
xen/drivers/cpufreq/utility.c | 1 +
2 files changed, 9 insertions(+)
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index d10f6db0e4..e616c3316a 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -205,6 +205,14 @@ static int set_cpufreq_gov(struct xen_sysctl_pm_op *op)
if ( new_policy.governor == NULL )
return -EINVAL;
+ new_policy.policy = cpufreq_policy_from_governor(new_policy.governor);
+ if ( new_policy.policy == CPUFREQ_POLICY_UNKNOWN )
+ {
+ printk("Failed to get performance policy from %s, Try \"xenpm set-cpufreq-cppc\"\n",
+ new_policy.governor->name);
+ return -EINVAL;
+ }
+
return __cpufreq_set_policy(old_policy, &new_policy);
}
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 64bcc464f6..e2cc9ff2af 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -250,6 +250,7 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
data->min = policy->min;
data->max = policy->max;
data->limits = policy->limits;
+ data->policy = policy->policy;
if (cpufreq_driver.setpolicy)
return alternative_call(cpufreq_driver.setpolicy, data);
--
2.34.1
* Re: [PATCH v6 14/19] xen/cpufreq: get performance policy from governor set via xenpm
2025-07-11 3:51 ` [PATCH v6 14/19] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
@ 2025-07-23 15:44 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-23 15:44 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> Even if Xen governor is not used in amd-cppc active mode, we could
> somehow deduce which performance policy (CPUFREQ_POLICY_xxx) user wants to
> apply through which governor they choose, such as:
> If user chooses performance governor, they want maximum performance, then
> the policy shall be CPUFREQ_POLICY_PERFORMANCE
> If user chooses powersave governor, they want the least power consumption,
> then the policy shall be CPUFREQ_POLICY_POWERSAVE
> Function cpufreq_policy_from_governor() is responsible for above transition,
> and it shall be also effective when users setting new governor through xenpm.
>
> ondemand and userspace are forbidden choices, and if users specify
Is this stale? You permit ...
> such options, we shall not only give warning message to suggest using
> "xenpm set-cpufreq-cppc", but also error out.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> ---
> v4 -> v5:
> - new commit
> ---
> v5 -> v6:
> - refactor warning message
> ---
> xen/drivers/acpi/pm-op.c | 8 ++++++++
> xen/drivers/cpufreq/utility.c | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
> index d10f6db0e4..e616c3316a 100644
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -205,6 +205,14 @@ static int set_cpufreq_gov(struct xen_sysctl_pm_op *op)
> if ( new_policy.governor == NULL )
> return -EINVAL;
>
> + new_policy.policy = cpufreq_policy_from_governor(new_policy.governor);
> + if ( new_policy.policy == CPUFREQ_POLICY_UNKNOWN )
... CPUFREQ_POLICY_ONDEMAND here now, aiui (as per patch 13).
> --- a/xen/drivers/cpufreq/utility.c
> +++ b/xen/drivers/cpufreq/utility.c
> @@ -250,6 +250,7 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
> data->min = policy->min;
> data->max = policy->max;
> data->limits = policy->limits;
> + data->policy = policy->policy;
> if (cpufreq_driver.setpolicy)
> return alternative_call(cpufreq_driver.setpolicy, data);
This looks like it would belong in the patch introducing the field. More
generally, is the field left uninitialized in certain cases until here?
That we will want to avoid.
Jan
* [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (13 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 14/19] xen/cpufreq: get performance policy from governor set via xenpm Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-23 15:56 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op Penny Zheng
` (3 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD, Jan Beulich
New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to handle
CPPC-related parameters, so that they can be re-used when the new sub-op
"get-cpufreq-cppc" is exported later.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6:
- new commit
---
tools/misc/xenpm.c | 53 +++++++++++++++++++++-------------------
xen/drivers/acpi/pm-op.c | 16 +++++++++---
2 files changed, 41 insertions(+), 28 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 55b0b0c482..120e9eae22 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -799,6 +799,33 @@ static unsigned int calculate_activity_window(const xc_cppc_para_t *cppc,
return mantissa * multiplier;
}
+/* print out parameters about cpu cppc */
+static void print_cppc_para(unsigned int cpuid,
+ const xc_cppc_para_t *cppc)
+{
+ printf("cppc variables :\n");
+ printf(" hardware limits : lowest [%"PRIu32"] lowest nonlinear [%"PRIu32"]\n",
+ cppc->lowest, cppc->lowest_nonlinear);
+ printf(" : nominal [%"PRIu32"] highest [%"PRIu32"]\n",
+ cppc->nominal, cppc->highest);
+ printf(" configured limits : min [%"PRIu32"] max [%"PRIu32"] energy perf [%"PRIu32"]\n",
+ cppc->minimum, cppc->maximum, cppc->energy_perf);
+
+ if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
+ {
+ unsigned int activity_window;
+ const char *units;
+
+ activity_window = calculate_activity_window(cppc, &units);
+ printf(" : activity_window [%"PRIu32" %s]\n",
+ activity_window, units);
+ }
+
+ printf(" : desired [%"PRIu32"%s]\n",
+ cppc->desired,
+ cppc->desired ? "" : " hw autonomous");
+}
+
/* print out parameters about cpu frequency */
static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
{
@@ -825,31 +852,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf("scaling_driver : %s\n", p_cpufreq->scaling_driver);
if ( hwp )
- {
- const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
-
- printf("cppc variables :\n");
- printf(" hardware limits : lowest [%"PRIu32"] lowest nonlinear [%"PRIu32"]\n",
- cppc->lowest, cppc->lowest_nonlinear);
- printf(" : nominal [%"PRIu32"] highest [%"PRIu32"]\n",
- cppc->nominal, cppc->highest);
- printf(" configured limits : min [%"PRIu32"] max [%"PRIu32"] energy perf [%"PRIu32"]\n",
- cppc->minimum, cppc->maximum, cppc->energy_perf);
-
- if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
- {
- unsigned int activity_window;
- const char *units;
-
- activity_window = calculate_activity_window(cppc, &units);
- printf(" : activity_window [%"PRIu32" %s]\n",
- activity_window, units);
- }
-
- printf(" : desired [%"PRIu32"%s]\n",
- cppc->desired,
- cppc->desired ? "" : " hw autonomous");
- }
+ print_cppc_para(cpuid, &p_cpufreq->u.cppc_para);
else
{
if ( p_cpufreq->gov_num )
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index e616c3316a..acaa33561f 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -77,6 +77,17 @@ static int read_scaling_available_governors(char *scaling_available_governors,
return 0;
}
+static int get_cpufreq_cppc(unsigned int cpu,
+ struct xen_get_cppc_para *cppc_para)
+{
+ int ret = -ENODEV;
+
+ if ( hwp_active() )
+ ret = get_hwp_para(cpu, cppc_para);
+
+ return ret;
+}
+
static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
{
uint32_t ret = 0;
@@ -141,9 +152,8 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
else
strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
- if ( hwp_active() )
- ret = get_hwp_para(policy->cpu, &op->u.get_para.u.cppc_para);
- else
+ ret = get_cpufreq_cppc(op->cpuid, &op->u.get_para.u.cppc_para);
+ if ( ret == -ENODEV )
{
if ( !(scaling_available_governors =
xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
--
2.34.1
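The contract get_cpufreq_para() relies on after this refactor is that get_cpufreq_cppc() returns -ENODEV when no CPPC-style driver is active, and only that value selects the governor-based path. A minimal sketch of the shape; `hwp_is_active`, `fake_get_hwp_para()`, `enum para_kind` and `report_kind()` are hypothetical stand-ins, and treating other non-zero returns as errors is this sketch's choice.

```c
#include <errno.h>
#include <stdbool.h>

enum para_kind { PARA_GOVERNOR, PARA_CPPC, PARA_ERROR };

/* Hypothetical stand-ins for hwp_active() and get_hwp_para(). */
static bool hwp_is_active;
static int fake_get_hwp_para(unsigned int cpu, int *out)
{
    *out = (int)cpu; /* pretend the CPPC parameters were filled in */
    return 0;
}

/* Shape of get_cpufreq_cppc(): -ENODEV means "no CPPC driver active". */
static int get_cppc(unsigned int cpu, int *out)
{
    int ret = -ENODEV;

    if ( hwp_is_active )
        ret = fake_get_hwp_para(cpu, out);

    return ret;
}

/* Shape of the caller: only -ENODEV falls back to governor reporting. */
static enum para_kind report_kind(unsigned int cpu)
{
    int para = 0;
    int ret = get_cppc(cpu, &para);

    if ( ret == -ENODEV )
        return PARA_GOVERNOR;
    return ret ? PARA_ERROR : PARA_CPPC;
}
```

Keeping the driver check inside the helper means a later amd-cppc branch can be added in one place without touching the caller.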
* Re: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters
2025-07-11 3:51 ` [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters Penny Zheng
@ 2025-07-23 15:56 ` Jan Beulich
2025-08-12 9:56 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-23 15:56 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to deal
> with CPPC-related parameters, in order to be re-used when later exporting new
> sub-op "get-cpufreq-cppc".
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
I once again wonder whether this can go in right away, ahead of everything
that wants re-submitting.
Jan
* RE: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters
2025-07-23 15:56 ` Jan Beulich
@ 2025-08-12 9:56 ` Penny, Zheng
2025-08-12 10:11 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-12 9:56 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang, Ray, Anthony PERARD, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 23, 2025 11:56 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-
> related parameters
>
> On 11.07.2025 05:51, Penny Zheng wrote:
> > New helpers print_cppc_para() and get_cpufreq_cppc() are introduced to
> > deal with CPPC-related parameters, in order to be re-used when later
> > exporting new sub-op "get-cpufreq-cppc".
> >
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> I once again wonder whether this can go in right away, ahead of everything that
> wants re-submitting.
Thx, it could.
>
> Jan
^ permalink raw reply [flat|nested] 66+ messages in thread
* RE: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters
2025-08-12 9:56 ` Penny, Zheng
@ 2025-08-12 10:11 ` Penny, Zheng
0 siblings, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-12 10:11 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang, Ray, Anthony PERARD, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Penny, Zheng
> Sent: Tuesday, August 12, 2025 5:56 PM
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; xen-devel@lists.xenproject.org
> Subject: RE: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-
> related parameters
>
>
>
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@suse.com>
> > Sent: Wednesday, July 23, 2025 11:56 PM
> > To: Penny, Zheng <penny.zheng@amd.com>
> > Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> > <anthony.perard@vates.tech>; xen-devel@lists.xenproject.org
> > Subject: Re: [PATCH v6 15/19] tools/cpufreq: introduce helper to deal
> > with CPPC- related parameters
> >
> > On 11.07.2025 05:51, Penny Zheng wrote:
> > > New helpers print_cppc_para() and get_cpufreq_cppc() are introduced
> > > to deal with CPPC-related parameters, in order to be re-used when
> > > later exporting new sub-op "get-cpufreq-cppc".
> > >
> > > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> >
> > Acked-by: Jan Beulich <jbeulich@suse.com>
> >
> > I once again wonder whether this can go in right away, ahead of
> > everything that wants re-submitting.
>
> Thx, it could.
>
Sorry, I have just read the review of the next commit. This commit may still be needed, but its commit message needs rewording, as no new sub-cmd "get-cpufreq-cppc" will be introduced.
> >
> > Jan
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (14 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 15/19] tools/cpufreq: introduce helper to deal with CPPC-related parameters Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-24 13:31 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 17/19] xen/cpufreq: introduce helper cpufreq_in_cppc_passive_mode() Penny Zheng
` (2 subsequent siblings)
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Juergen Gross,
Andrew Cooper, Michal Orzel, Jan Beulich, Julien Grall,
Roger Pau Monné, Stefano Stabellini
In amd-cppc passive mode, the Xen governor is responsible for
performance tuning, so the governor and CPPC can co-exist. That is, both
governor info and CPPC info need to be printed together by the xenpm tool.
If we kept this information in "struct xen_get_cpufreq_para" (e.g. by just
moving it out of the union), "struct xen_get_cpufreq_para" would grow so much
that xen_sysctl.u would exceed 128 bytes.
So we introduce a new sub-op, GET_CPUFREQ_CPPC, specifically to retrieve
CPPC-related parameters.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v4 -> v5:
- new commit
---
v5 -> v6:
- remove the changes for get-cpufreq-para
---
tools/include/xenctrl.h | 2 ++
tools/libs/ctrl/xc_pm.c | 27 +++++++++++++++++++++
tools/misc/xenpm.c | 47 +++++++++++++++++++++++++++++++++++++
xen/drivers/acpi/pm-op.c | 4 ++++
xen/include/public/sysctl.h | 2 ++
5 files changed, 82 insertions(+)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 965d3b585a..699243c4df 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1953,6 +1953,8 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
int ctrl_type, int ctrl_value);
int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
xc_set_cppc_para_t *set_cppc);
+int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
+ xc_cppc_para_t *cppc_para);
int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 6fda973f1f..3f72152617 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -366,6 +366,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
return ret;
}
+int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
+ xc_cppc_para_t *cppc_para)
+{
+ int ret;
+ struct xen_sysctl sysctl = {};
+ struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
+
+ if ( !xch || !cppc_para )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+
+ sysctl.cmd = XEN_SYSCTL_pm_op;
+ sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
+ sysctl.u.pm_op.cpuid = cpuid;
+
+ ret = xc_sysctl(xch, &sysctl);
+ if ( ret )
+ return ret;
+
+ BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
+ memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
+
+ return ret;
+}
+
int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
{
int ret = 0;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 120e9eae22..bdc09f468a 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -69,6 +69,7 @@ void show_help(void)
" set-max-cstate <num>|'unlimited' [<num2>|'unlimited']\n"
" set the C-State limitation (<num> >= 0) and\n"
" optionally the C-sub-state limitation (<num2> >= 0)\n"
+ " get-cpufreq-cppc [cpuid] list cpu cppc parameter of CPU <cpuid> or all\n"
" set-cpufreq-cppc [cpuid] [balance|performance|powersave] <param:val>*\n"
" set Hardware P-State (HWP) parameters\n"
" on CPU <cpuid> or all if omitted.\n"
@@ -996,6 +997,51 @@ void cpufreq_para_func(int argc, char *argv[])
show_cpufreq_para_by_cpuid(xc_handle, cpuid);
}
+/* show cpu cppc parameters information on CPU cpuid */
+static int show_cppc_para_by_cpuid(xc_interface *xc_handle, unsigned int cpuid)
+{
+ int ret;
+ xc_cppc_para_t cppc_para;
+
+ ret = xc_get_cppc_para(xc_handle, cpuid, &cppc_para);
+ if ( !ret )
+ {
+ printf("cpu id : %d\n", cpuid);
+ print_cppc_para(cpuid, &cppc_para);
+ printf("\n");
+ }
+ else if ( errno == ENODEV )
+ {
+ ret = -ENODEV;
+ fprintf(stderr, "CPPC is not available!\n");
+ }
+ else
+ fprintf(stderr, "[CPU%u] failed to get cppc parameter\n", cpuid);
+
+ return ret;
+}
+
+static void cppc_para_func(int argc, char *argv[])
+{
+ int cpuid = -1;
+
+ if ( argc > 0 )
+ parse_cpuid(argv[0], &cpuid);
+
+ if ( cpuid < 0 )
+ {
+ unsigned int i;
+
+ /* show cpu cppc information on all cpus */
+ for ( i = 0; i < max_cpu_nr; i++ )
+ /* Bail out only on unsupported platform */
+ if ( show_cppc_para_by_cpuid(xc_handle, i) == -ENODEV )
+ break;
+ }
+ else
+ show_cppc_para_by_cpuid(xc_handle, cpuid);
+}
+
void scaling_max_freq_func(int argc, char *argv[])
{
int cpuid = -1, freq = -1;
@@ -1576,6 +1622,7 @@ struct {
{ "get-cpufreq-average", cpufreq_func },
{ "start", start_gather_func },
{ "get-cpufreq-para", cpufreq_para_func },
+ { "get-cpufreq-cppc", cppc_para_func },
{ "set-cpufreq-cppc", cppc_set_func },
{ "set-scaling-maxfreq", scaling_max_freq_func },
{ "set-scaling-minfreq", scaling_min_freq_func },
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index acaa33561f..0723cea34c 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -390,6 +390,10 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
ret = set_cpufreq_para(op);
break;
+ case GET_CPUFREQ_CPPC:
+ ret = get_cpufreq_cppc(op->cpuid, &op->u.get_cppc);
+ break;
+
case SET_CPUFREQ_CPPC:
ret = set_cpufreq_cppc(op);
break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index eb3a23b038..2578a63b01 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -523,6 +523,7 @@ struct xen_sysctl_pm_op {
#define SET_CPUFREQ_PARA (CPUFREQ_PARA | 0x03)
#define GET_CPUFREQ_AVGFREQ (CPUFREQ_PARA | 0x04)
#define SET_CPUFREQ_CPPC (CPUFREQ_PARA | 0x05)
+ #define GET_CPUFREQ_CPPC (CPUFREQ_PARA | 0x06)
/* set/reset scheduler power saving option */
#define XEN_SYSCTL_pm_op_set_sched_opt_smt 0x21
@@ -547,6 +548,7 @@ struct xen_sysctl_pm_op {
uint32_t cpuid;
union {
struct xen_get_cpufreq_para get_para;
+ struct xen_get_cppc_para get_cppc;
struct xen_set_cpufreq_gov set_gov;
struct xen_set_cpufreq_para set_para;
struct xen_set_cppc_para set_cppc;
--
2.34.1
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op
2025-07-11 3:51 ` [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op Penny Zheng
@ 2025-07-24 13:31 ` Jan Beulich
2025-07-24 14:17 ` Jason Andryuk
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-24 13:31 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Juergen Gross, Andrew Cooper,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel, Jason Andryuk
On 11.07.2025 05:51, Penny Zheng wrote:
> In amd-cppc passive mode, the Xen governor is responsible for
> performance tuning, so the governor and CPPC can co-exist. That is, both
> governor info and CPPC info need to be printed together by the xenpm tool.
>
> If we kept this information in "struct xen_get_cpufreq_para" (e.g. by just
> moving it out of the union), "struct xen_get_cpufreq_para" would grow so much
> that xen_sysctl.u would exceed 128 bytes.
> So we introduce a new sub-op, GET_CPUFREQ_CPPC, specifically to retrieve
> CPPC-related parameters.
>
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> ---
> v4 -> v5:
> - new commit
> ---
> v5 -> v6:
> - remove the changes for get-cpufreq-para
> ---
> tools/include/xenctrl.h | 2 ++
> tools/libs/ctrl/xc_pm.c | 27 +++++++++++++++++++++
> tools/misc/xenpm.c | 47 +++++++++++++++++++++++++++++++++++++
> xen/drivers/acpi/pm-op.c | 4 ++++
> xen/include/public/sysctl.h | 2 ++
> 5 files changed, 82 insertions(+)
>
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index 965d3b585a..699243c4df 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1953,6 +1953,8 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
> int ctrl_type, int ctrl_value);
> int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
> xc_set_cppc_para_t *set_cppc);
> +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
> + xc_cppc_para_t *cppc_para);
> int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
>
> int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
> diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
> index 6fda973f1f..3f72152617 100644
> --- a/tools/libs/ctrl/xc_pm.c
> +++ b/tools/libs/ctrl/xc_pm.c
> @@ -366,6 +366,33 @@ int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
> return ret;
> }
>
> +int xc_get_cppc_para(xc_interface *xch, unsigned int cpuid,
> + xc_cppc_para_t *cppc_para)
> +{
> + int ret;
> + struct xen_sysctl sysctl = {};
> + struct xen_get_cppc_para *sys_cppc_para = &sysctl.u.pm_op.u.get_cppc;
> +
> + if ( !xch || !cppc_para )
> + {
> + errno = EINVAL;
> + return -1;
> + }
> +
> + sysctl.cmd = XEN_SYSCTL_pm_op;
> + sysctl.u.pm_op.cmd = GET_CPUFREQ_CPPC;
> + sysctl.u.pm_op.cpuid = cpuid;
> +
> + ret = xc_sysctl(xch, &sysctl);
> + if ( ret )
> + return ret;
> +
> + BUILD_BUG_ON(sizeof(*cppc_para) != sizeof(*sys_cppc_para));
> + memcpy(cppc_para, sys_cppc_para, sizeof(*sys_cppc_para));
> +
> + return ret;
> +}
> +
> int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
> {
> int ret = 0;
> diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> index 120e9eae22..bdc09f468a 100644
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -69,6 +69,7 @@ void show_help(void)
> " set-max-cstate <num>|'unlimited' [<num2>|'unlimited']\n"
> " set the C-State limitation (<num> >= 0) and\n"
> " optionally the C-sub-state limitation (<num2> >= 0)\n"
> + " get-cpufreq-cppc [cpuid] list cpu cppc parameter of CPU <cpuid> or all\n"
> " set-cpufreq-cppc [cpuid] [balance|performance|powersave] <param:val>*\n"
> " set Hardware P-State (HWP) parameters\n"
> " on CPU <cpuid> or all if omitted.\n"
> @@ -996,6 +997,51 @@ void cpufreq_para_func(int argc, char *argv[])
> show_cpufreq_para_by_cpuid(xc_handle, cpuid);
> }
>
> +/* show cpu cppc parameters information on CPU cpuid */
> +static int show_cppc_para_by_cpuid(xc_interface *xc_handle, unsigned int cpuid)
> +{
> + int ret;
> + xc_cppc_para_t cppc_para;
> +
> + ret = xc_get_cppc_para(xc_handle, cpuid, &cppc_para);
> + if ( !ret )
> + {
> + printf("cpu id : %d\n", cpuid);
> + print_cppc_para(cpuid, &cppc_para);
> + printf("\n");
> + }
> + else if ( errno == ENODEV )
> + {
> + ret = -ENODEV;
> + fprintf(stderr, "CPPC is not available!\n");
> + }
> + else
> + fprintf(stderr, "[CPU%u] failed to get cppc parameter\n", cpuid);
> +
> + return ret;
> +}
> +
> +static void cppc_para_func(int argc, char *argv[])
> +{
> + int cpuid = -1;
> +
> + if ( argc > 0 )
> + parse_cpuid(argv[0], &cpuid);
> +
> + if ( cpuid < 0 )
> + {
> + unsigned int i;
> +
> + /* show cpu cppc information on all cpus */
> + for ( i = 0; i < max_cpu_nr; i++ )
> + /* Bail out only on unsupported platform */
> + if ( show_cppc_para_by_cpuid(xc_handle, i) == -ENODEV )
> + break;
> + }
> + else
> + show_cppc_para_by_cpuid(xc_handle, cpuid);
> +}
> +
> void scaling_max_freq_func(int argc, char *argv[])
> {
> int cpuid = -1, freq = -1;
> @@ -1576,6 +1622,7 @@ struct {
> { "get-cpufreq-average", cpufreq_func },
> { "start", start_gather_func },
> { "get-cpufreq-para", cpufreq_para_func },
> + { "get-cpufreq-cppc", cppc_para_func },
Didn't Jason also suggest that we would better not introduce a new command, but
rather make get-cpufreq-para invoke GET_CPUFREQ_CPPC as needed? Considering that
as per patch 15 the same information is already printed, I think I'm a little
lost with the need for this separate operation (and command), and then also with
the need for patch 15.
Jan
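For readers following the size argument in the commit message: the outer `xen_sysctl.u` union reserves a fixed 128-byte slot, so a member larger than that would grow the union and change the interface layout. A minimal sketch of how such a limit can be enforced at compile time, assuming C11 `static_assert` from `<assert.h>`; the `demo_*` structures and their field layouts are hypothetical and only illustrate why pulling the CPPC data out of the inner union could push the structure past the limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the public structures; field names and sizes
 * are illustrative only and do not mirror the real Xen headers.
 */
struct demo_cppc_para {
    uint32_t features, lowest, lowest_nonlinear, nominal, highest;
    uint32_t minimum, maximum, desired, energy_perf, activity_window;
};

/* CPPC data shares storage with governor data inside an inner union. */
struct demo_get_cpufreq_para {
    uint32_t cpu_num, freq_num, gov_num;
    uint32_t cpuinfo_cur_freq, cpuinfo_max_freq, cpuinfo_min_freq;
    char scaling_driver[16];
    char scaling_governor[16];
    union {
        struct demo_cppc_para cppc_para;
        uint8_t gov_data[40];   /* governor-specific data (hypothetical) */
    } u;
};

/* Same fields, but with the CPPC data moved out of the inner union. */
struct demo_get_cpufreq_para_flat {
    uint32_t cpu_num, freq_num, gov_num;
    uint32_t cpuinfo_cur_freq, cpuinfo_max_freq, cpuinfo_min_freq;
    char scaling_driver[16];
    char scaling_governor[16];
    struct demo_cppc_para cppc_para;
    uint8_t gov_data[40];
};

/* The outer sysctl union reserves a fixed 128-byte slot via its pad member. */
union demo_sysctl_u {
    struct demo_get_cpufreq_para get_para;
    uint8_t pad[128];
};

/* Compilation fails if a member ever grows the union beyond its slot. */
static_assert(sizeof(union demo_sysctl_u) == 128,
              "sysctl union must stay at 128 bytes");
```

With these illustrative layouts the union-based struct is 96 bytes, while the "flat" variant is 136 bytes; adding the flat variant as a member of `union demo_sysctl_u` would make the assertion fire, which is the situation the commit message describes.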
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op
2025-07-24 13:31 ` Jan Beulich
@ 2025-07-24 14:17 ` Jason Andryuk
2025-07-24 14:47 ` Jan Beulich
2025-08-12 10:15 ` Penny, Zheng
0 siblings, 2 replies; 66+ messages in thread
From: Jason Andryuk @ 2025-07-24 14:17 UTC (permalink / raw)
To: Jan Beulich, Penny Zheng
Cc: ray.huang, Anthony PERARD, Juergen Gross, Andrew Cooper,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 2025-07-24 09:31, Jan Beulich wrote:
> On 11.07.2025 05:51, Penny Zheng wrote:
>> In amd-cppc passive mode, the Xen governor is responsible for
>> performance tuning, so the governor and CPPC can co-exist. That is, both
>> governor info and CPPC info need to be printed together by the xenpm tool.
>>
>> If we kept this information in "struct xen_get_cpufreq_para" (e.g. by just
>> moving it out of the union), "struct xen_get_cpufreq_para" would grow so much
>> that xen_sysctl.u would exceed 128 bytes.
>> So we introduce a new sub-op, GET_CPUFREQ_CPPC, specifically to retrieve
>> CPPC-related parameters.
>>
>> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>> void scaling_max_freq_func(int argc, char *argv[])
>> {
>> int cpuid = -1, freq = -1;
>> @@ -1576,6 +1622,7 @@ struct {
>> { "get-cpufreq-average", cpufreq_func },
>> { "start", start_gather_func },
>> { "get-cpufreq-para", cpufreq_para_func },
>> + { "get-cpufreq-cppc", cppc_para_func },
>
> Didn't Jason also suggest that we would better not introduce a new command, but
> rather make get-cpufreq-para invoke GET_CPUFREQ_CPPC as needed? Considering that
> as per patch 15 the same information is already printed, I think I'm a little
> lost with the need for this separate operation (and command), and then also with
> the need for patch 15.
Yes, but I thought I was repeating your suggestion, Jan :)
xenpm's show_cpufreq_para_by_cpuid() would do something like:
show_cpufreq_para_by_cpuid() {
    xc_get_cpufreq_para()
    hw_auto = HWP || CPPC
    if ( hw_auto ) {
        xc_get_cppc_para()
        print_cppc_para()
    } else
        print_cpufreq_para()
}
Would that work?
That way the single `xenpm get-cpufreq-para` would return the current
cpufreq data without the user needing to know what is running.
Regards,
Jason
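The dispatch in the pseudocode above can be sketched in C as a pure decision function, leaving out the actual libxc calls. This is a hedged illustration, not the xenpm code: it mirrors the `hw_auto` condition from patch 18 (a match on the HWP or amd-cppc-epp driver name), the `DEMO_*` names and string values are hypothetical, and how passive-mode CPPC should be reported is still being discussed in this thread, so the sketch does not cover it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical driver names; the real xenpm/sysctl constants may differ. */
#define DEMO_HWP_DRIVER_NAME          "hwp"
#define DEMO_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"

/* Which block of output `get-cpufreq-para` should print for a CPU. */
enum demo_report {
    DEMO_REPORT_CPPC,      /* would call xc_get_cppc_para() + print_cppc_para() */
    DEMO_REPORT_GOVERNOR,  /* would call print_cpufreq_para() */
};

/* True when the scaling driver tunes performance without a Xen governor. */
static bool demo_hw_auto(const char *scaling_driver)
{
    assert(scaling_driver != NULL);
    return !strcmp(scaling_driver, DEMO_HWP_DRIVER_NAME) ||
           !strcmp(scaling_driver, DEMO_AMD_CPPC_EPP_DRIVER_NAME);
}

/*
 * Given the scaling driver name returned by the first query
 * (xc_get_cpufreq_para() in the proposal), pick the report to print.
 */
static enum demo_report demo_pick_report(const char *scaling_driver)
{
    return demo_hw_auto(scaling_driver) ? DEMO_REPORT_CPPC
                                        : DEMO_REPORT_GOVERNOR;
}
```

Keeping the decision separate from the hypercalls means only one user-visible command is needed, while the second query (`xc_get_cppc_para()`) is issued only when the first result says it is relevant.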
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op
2025-07-24 14:17 ` Jason Andryuk
@ 2025-07-24 14:47 ` Jan Beulich
2025-08-12 10:15 ` Penny, Zheng
1 sibling, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-24 14:47 UTC (permalink / raw)
To: Jason Andryuk
Cc: ray.huang, Anthony PERARD, Juergen Gross, Andrew Cooper,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel, Penny Zheng
On 24.07.2025 16:17, Jason Andryuk wrote:
> On 2025-07-24 09:31, Jan Beulich wrote:
>> On 11.07.2025 05:51, Penny Zheng wrote:
>>> In amd-cppc passive mode, the Xen governor is responsible for
>>> performance tuning, so the governor and CPPC can co-exist. That is, both
>>> governor info and CPPC info need to be printed together by the xenpm tool.
>>>
>>> If we kept this information in "struct xen_get_cpufreq_para" (e.g. by just
>>> moving it out of the union), "struct xen_get_cpufreq_para" would grow so much
>>> that xen_sysctl.u would exceed 128 bytes.
>>> So we introduce a new sub-op, GET_CPUFREQ_CPPC, specifically to retrieve
>>> CPPC-related parameters.
>>>
>>> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
>>> void scaling_max_freq_func(int argc, char *argv[])
>>> {
>>> int cpuid = -1, freq = -1;
>>> @@ -1576,6 +1622,7 @@ struct {
>>> { "get-cpufreq-average", cpufreq_func },
>>> { "start", start_gather_func },
>>> { "get-cpufreq-para", cpufreq_para_func },
>>> + { "get-cpufreq-cppc", cppc_para_func },
>>
>> Didn't Jason also suggest that we would better not introduce a new command, but
>> rather make get-cpufreq-para invoke GET_CPUFREQ_CPPC as needed? Considering that
>> as per patch 15 the same information is already printed, I think I'm a little
>> lost with the need for this separate operation (and command), and then also with
>> the need for patch 15.
>
> Yes, but I thought I was repeating your suggestion, Jan :)
That's what I tried to express using "also" ;-)
Jan
^ permalink raw reply [flat|nested] 66+ messages in thread
* RE: [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op
2025-07-24 14:17 ` Jason Andryuk
2025-07-24 14:47 ` Jan Beulich
@ 2025-08-12 10:15 ` Penny, Zheng
1 sibling, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-12 10:15 UTC (permalink / raw)
To: Andryuk, Jason, Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Juergen Gross, Andrew Cooper,
Orzel, Michal, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jason Andryuk <jason.andryuk@amd.com>
> Sent: Thursday, July 24, 2025 10:17 PM
> To: Jan Beulich <jbeulich@suse.com>; Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Juergen Gross <jgross@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> Julien Grall <julien@xen.org>; Roger Pau Monné <roger.pau@citrix.com>;
> Stefano Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC
> sub-op
>
> On 2025-07-24 09:31, Jan Beulich wrote:
> > On 11.07.2025 05:51, Penny Zheng wrote:
> >> In amd-cppc passive mode, the Xen governor is responsible for
> >> performance tuning, so the governor and CPPC can co-exist. That is,
> >> both governor info and CPPC info need to be printed together by the xenpm tool.
> >>
> >> If we kept this information in "struct xen_get_cpufreq_para" (e.g.
> >> by just moving it out of the union), "struct xen_get_cpufreq_para" would
> >> grow so much that xen_sysctl.u would exceed 128 bytes.
> >> So we introduce a new sub-op, GET_CPUFREQ_CPPC, specifically to retrieve
> >> CPPC-related parameters.
> >>
> >> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>
> >> void scaling_max_freq_func(int argc, char *argv[])
> >> {
> >> int cpuid = -1, freq = -1;
> >> @@ -1576,6 +1622,7 @@ struct {
> >> { "get-cpufreq-average", cpufreq_func },
> >> { "start", start_gather_func },
> >> { "get-cpufreq-para", cpufreq_para_func },
> >> + { "get-cpufreq-cppc", cppc_para_func },
> >
> > Didn't Jason also suggest that we would better not introduce a new
> > command, but rather make get-cpufreq-para invoke GET_CPUFREQ_CPPC as
> > needed? Considering that as per patch 15 the same information is
> > already printed, I think I'm a little lost with the need for this
> > separate operation (and command), and then also with the need for patch 15.
>
> Yes, but I thought I was repeating your suggestion, Jan :)
>
> xenpm's show_cpufreq_para_by_cpuid() would do something like:
>
> show_cpufreq_para_by_cpuid() {
>     xc_get_cpufreq_para()
>     hw_auto = HWP || CPPC
>     if ( hw_auto ) {
>         xc_get_cppc_para()
>         print_cppc_para()
>     } else
>         print_cpufreq_para()
> }
>
> Would that work?
>
Understood, I will rewrite it as you suggest, thx
> That way the single `xenpm get-cpufreq-para` would return the current cpufreq
> data without the user needing to know what is running.
>
> Regards,
> Jason
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v6 17/19] xen/cpufreq: introduce helper cpufreq_in_cppc_passive_mode()
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (15 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 16/19] xen/cpufreq: introduce GET_CPUFREQ_CPPC sub-op Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-24 13:57 ` Jan Beulich
2025-07-11 3:51 ` [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
2025-07-11 3:51 ` [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Jan Beulich
When the cpufreq driver is in CPPC passive mode, it has both CPPC and governor
info. Two sysctl sub-ops ("get-cpufreq-cppc" and "get-cpufreq-para") need to be
invoked to produce both.
A new helper, cpufreq_in_cppc_passive_mode(), is introduced to tell whether the
cpufreq driver is running in CPPC passive mode.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v5 -> v6
- new commit
---
xen/drivers/acpi/pm-op.c | 10 +++++++++-
xen/drivers/cpufreq/cpufreq.c | 6 ++++++
xen/include/acpi/cpufreq/cpufreq.h | 2 ++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 0723cea34c..077efdfc5c 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -152,7 +152,15 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
else
strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
- ret = get_cpufreq_cppc(op->cpuid, &op->u.get_para.u.cppc_para);
+ /*
+ * When cpufreq driver in cppc passive mode, it has both cppc and governor
+ * info. Users could only rely on "get-cpufreq-cppc" to acquire CPPC info.
+ * And it returns governor info in "get-cpufreq-para"
+ */
+ if ( cpufreq_in_cppc_passive_mode(op->cpuid) )
+ ret = -ENODEV;
+ else
+ ret = get_cpufreq_cppc(op->cpuid, &op->u.get_para.u.cppc_para);
if ( ret == -ENODEV )
{
if ( !(scaling_available_governors =
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index cf1fcf1d22..431f2903f8 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -962,3 +962,9 @@ int __init cpufreq_register_driver(const struct cpufreq_driver *driver_data)
return 0;
}
+
+bool cpufreq_in_cppc_passive_mode(unsigned int cpuid)
+{
+ return processor_pminfo[cpuid]->init & XEN_CPPC_INIT &&
+ cpufreq_driver.target;
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index b0b22d1c9c..dd55d268c0 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -295,4 +295,6 @@ int acpi_cpufreq_register(void);
int amd_cppc_cmdline_parse(const char *s, const char *e);
int amd_cppc_register_driver(void);
+bool cpufreq_in_cppc_passive_mode(unsigned int cpuid);
+
#endif /* __XEN_CPUFREQ_PM_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v6 17/19] xen/cpufreq: introduce helper cpufreq_in_cppc_passive_mode()
2025-07-11 3:51 ` [PATCH v6 17/19] xen/cpufreq: introduce helper cpufreq_in_cppc_passive_mode() Penny Zheng
@ 2025-07-24 13:57 ` Jan Beulich
0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2025-07-24 13:57 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -152,7 +152,15 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
> else
> strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
>
> - ret = get_cpufreq_cppc(op->cpuid, &op->u.get_para.u.cppc_para);
> + /*
> + * When cpufreq driver in cppc passive mode, it has both cppc and governor
> + * info. Users could only rely on "get-cpufreq-cppc" to acquire CPPC info.
> + * And it returns governor info in "get-cpufreq-para"
> + */
Which of the two do they need to invoke to obtain a complete picture? Both?
I'm confused by you bypassing get_cpufreq_cppc() (i.e. get_hwp_para())
for yet another reason, when - aiui - some information there is relevant
in both active and passive modes.
> + if ( cpufreq_in_cppc_passive_mode(op->cpuid) )
> + ret = -ENODEV;
> + else
> + ret = get_cpufreq_cppc(op->cpuid, &op->u.get_para.u.cppc_para);
Any reason the extra check isn't put in get_cpufreq_cppc(), alongside the
hwp_active() one?
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -962,3 +962,9 @@ int __init cpufreq_register_driver(const struct cpufreq_driver *driver_data)
>
> return 0;
> }
> +
> +bool cpufreq_in_cppc_passive_mode(unsigned int cpuid)
> +{
> + return processor_pminfo[cpuid]->init & XEN_CPPC_INIT &&
Nit: Please use parentheses when using & and && together.
Also, isn't this function going to become unreachable when PM_OP=n, thus
violating Misra rule 2.1?
Jan
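The parentheses nit above concerns readability rather than correctness: in C, `&` has higher precedence than `&&`, so the expression in the patch already groups as `(init & XEN_CPPC_INIT) && target`. A small self-check of that, with a hypothetical flag value standing in for XEN_CPPC_INIT and a plain boolean standing in for the `cpufreq_driver.target` pointer:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CPPC_INIT 0x4u   /* hypothetical stand-in for XEN_CPPC_INIT */

/* As written in the patch: & binds tighter than &&. */
static bool demo_unparenthesized(unsigned int init, bool has_target)
{
    return init & DEMO_CPPC_INIT && has_target;
}

/* As requested in review: identical grouping, made explicit. */
static bool demo_parenthesized(unsigned int init, bool has_target)
{
    return (init & DEMO_CPPC_INIT) && has_target;
}

/* Exhaustively check that both forms agree for small inputs. */
static void demo_check_forms(void)
{
    for (unsigned int init = 0; init < 16; init++) {
        assert(demo_unparenthesized(init, true) == demo_parenthesized(init, true));
        assert(demo_unparenthesized(init, false) == demo_parenthesized(init, false));
    }
}
```

Both forms behave identically; the explicit parentheses only spare the reader from having to recall the precedence table.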
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (16 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 17/19] xen/cpufreq: introduce helper cpufreq_in_cppc_passive_mode() Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-24 14:09 ` Jan Beulich
2025-07-24 22:36 ` Jason Andryuk
2025-07-11 3:51 ` [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
18 siblings, 2 replies; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel; +Cc: ray.huang, Penny Zheng, Anthony PERARD, Jan Beulich
HWP and amd-cppc-epp are both governor-less drivers, so we introduce a "hw_auto"
flag to bypass governor-related output for both in print_cpufreq_para().
In set_cpufreq_para(), a new helper is introduced to error out when the
cpufreq core is initialized in governor-less mode.
---
v3 -> v4:
- Include validation check fix here
---
v4 -> v5:
- validation check has been moved to where XEN_PROCESSOR_PM_CPPC and
XEN_CPPC_INIT were first introduced
- adding "cpufreq_driver.setpolicy == NULL" check to exclude governor-related
para for amd-cppc-epp driver in get/set_cpufreq_para()
---
v5 -> v6:
- add helper cpufreq_is_governorless() to tell whether cpufreq driver is
governor-less
---
tools/misc/xenpm.c | 10 +++++++---
xen/drivers/acpi/pm-op.c | 4 ++--
xen/drivers/cpufreq/cpufreq.c | 6 ++++++
xen/include/acpi/cpufreq/cpufreq.h | 1 +
4 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index bdc09f468a..9cb30ea9ce 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -830,9 +830,13 @@ static void print_cppc_para(unsigned int cpuid,
/* print out parameters about cpu frequency */
static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
{
- bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
+ bool hw_auto = false;
int i;
+ if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
+ !strcmp(p_cpufreq->scaling_driver, XEN_AMD_CPPC_EPP_DRIVER_NAME) )
+ hw_auto = true;
+
printf("cpu id : %d\n", cpuid);
printf("affected_cpus :");
@@ -840,7 +844,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf(" %d", p_cpufreq->affected_cpus[i]);
printf("\n");
- if ( hwp )
+ if ( hw_auto )
printf("cpuinfo frequency : base [%"PRIu32"] max [%"PRIu32"]\n",
p_cpufreq->cpuinfo_min_freq,
p_cpufreq->cpuinfo_max_freq);
@@ -852,7 +856,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
printf("scaling_driver : %s\n", p_cpufreq->scaling_driver);
- if ( hwp )
+ if ( hw_auto )
print_cppc_para(cpuid, &p_cpufreq->u.cppc_para);
else
{
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 077efdfc5c..54815c444b 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -244,8 +244,8 @@ static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
if ( !policy || !policy->governor )
return -EINVAL;
- if ( hwp_active() )
- return -EOPNOTSUPP;
+ if ( cpufreq_is_governorless(op->cpuid) )
+ return -EOPNOTSUPP;
switch( op->u.set_para.ctrl_type )
{
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 431f2903f8..26aaef6008 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -968,3 +968,9 @@ bool cpufreq_in_cppc_passive_mode(unsigned int cpuid)
return processor_pminfo[cpuid]->init & XEN_CPPC_INIT &&
cpufreq_driver.target;
}
+
+bool cpufreq_is_governorless(unsigned int cpuid)
+{
+ return processor_pminfo[cpuid]->init && (hwp_active() ||
+ cpufreq_driver.setpolicy);
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index dd55d268c0..da0456f46d 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -296,5 +296,6 @@ int amd_cppc_cmdline_parse(const char *s, const char *e);
int amd_cppc_register_driver(void);
bool cpufreq_in_cppc_passive_mode(unsigned int cpuid);
+bool cpufreq_is_governorless(unsigned int cpuid);
#endif /* __XEN_CPUFREQ_PM_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-07-11 3:51 ` [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
@ 2025-07-24 14:09 ` Jan Beulich
2025-08-13 6:57 ` Penny, Zheng
2025-07-24 22:36 ` Jason Andryuk
1 sibling, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-24 14:09 UTC (permalink / raw)
To: Penny Zheng; +Cc: ray.huang, Anthony PERARD, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -968,3 +968,9 @@ bool cpufreq_in_cppc_passive_mode(unsigned int cpuid)
> return processor_pminfo[cpuid]->init & XEN_CPPC_INIT &&
> cpufreq_driver.target;
> }
> +
> +bool cpufreq_is_governorless(unsigned int cpuid)
> +{
> + return processor_pminfo[cpuid]->init && (hwp_active() ||
> + cpufreq_driver.setpolicy);
> +}
The function, by its name, is seemingly generic, yet its implementation
is tailored to the HWP and CPPC drivers. I think such needs calling out
in a comment.
Seeing the XEN_CPPC_INIT check in context, I also wonder why here you
check for ->init just being non-zero.
Jan
^ permalink raw reply [flat|nested] 66+ messages in thread
* RE: [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-07-24 14:09 ` Jan Beulich
@ 2025-08-13 6:57 ` Penny, Zheng
0 siblings, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-13 6:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang, Ray, Anthony PERARD, xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, July 24, 2025 10:09 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp
>
> On 11.07.2025 05:51, Penny Zheng wrote:
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -968,3 +968,9 @@ bool cpufreq_in_cppc_passive_mode(unsigned int cpuid)
> > return processor_pminfo[cpuid]->init & XEN_CPPC_INIT &&
> > cpufreq_driver.target;
> > }
> > +
> > +bool cpufreq_is_governorless(unsigned int cpuid)
> > +{
> > + return processor_pminfo[cpuid]->init && (hwp_active() ||
> > + cpufreq_driver.setpolicy);
> > +}
>
> The function, by its name, is seemingly generic, yet its implementation is tailored to
> the HWP and CPPC drivers. I think such needs calling out in a comment.
>
> Seeing the XEN_CPPC_INIT check in context, I also wonder why here you check
> for ->init just being non-zero.
>
Checking that ->init is non-zero ensures that the cpufreq core was initialized successfully, whether in Px mode or CPPC mode.
A non-NULL cpufreq_driver.setpolicy callback, on its own, only tells us that the registered cpufreq driver is governor-less.
Maybe I should add a comment to explain this.
> Jan
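To illustrate the comment Jan asks for and the ->init reasoning above, here is a minimal, compilable sketch of the helper outside Xen. `struct pm_state`, `PX_INIT_FLAG` and `CPPC_INIT_FLAG` are hypothetical stand-ins for `processor_pminfo[cpu]->init`, `hwp_active()` and a non-NULL `cpufreq_driver.setpolicy`; they are not Xen definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical init flags standing in for the bits kept in ->init. */
#define PX_INIT_FLAG   (1u << 0)
#define CPPC_INIT_FLAG (1u << 1)

/*
 * Hypothetical per-CPU snapshot: ->init, whether hwp_active() holds, and
 * whether the registered driver provides a ->setpolicy() hook.
 */
struct pm_state {
    unsigned int init;
    bool hwp_on;
    bool driver_has_setpolicy;
};

/*
 * Not a generic predicate: it only knows about HWP and about drivers that
 * register ->setpolicy() (amd-cppc-epp in this series). A non-zero ->init
 * means the cpufreq core was initialised for this CPU, in Px or CPPC mode;
 * ->setpolicy() alone only says the registered driver is governor-less.
 */
static bool is_governorless(const struct pm_state *s)
{
    return s->init && (s->hwp_on || s->driver_has_setpolicy);
}
```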
* Re: [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp
2025-07-11 3:51 ` [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
2025-07-24 14:09 ` Jan Beulich
@ 2025-07-24 22:36 ` Jason Andryuk
1 sibling, 0 replies; 66+ messages in thread
From: Jason Andryuk @ 2025-07-24 22:36 UTC (permalink / raw)
To: Penny Zheng, xen-devel; +Cc: ray.huang, Anthony PERARD, Jan Beulich
On 2025-07-10 23:51, Penny Zheng wrote:
> HWP and amd-cppc-epp are both governor-less drivers, so we introduce a "hw_auto"
> flag to bypass governor-related printing for both in print_cpufreq_para().
>
> In set_cpufreq_para(), a new helper is introduced to error out when the
> cpufreq core is initialized in governor-less mode.
> ---
> v3 -> v4:
> - Include validation check fix here
> ---
> v4 -> v5:
> - validation check has been moved to where XEN_PROCESSOR_PM_CPPC and
> XEN_CPPC_INIT were first introduced
> - adding "cpufreq_driver.setpolicy == NULL" check to exclude governor-related
> para for amd-cppc-epp driver in get/set_cpufreq_para()
> ---
> v5 -> v6:
> - add helper cpufreq_is_governorless() to tell whether cpufreq driver is
> governor-less
> ---
> diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
> index 077efdfc5c..54815c444b 100644
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -244,8 +244,8 @@ static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
> if ( !policy || !policy->governor )
> return -EINVAL;
>
> - if ( hwp_active() )
> - return -EOPNOTSUPP;
> + if ( cpufreq_is_governorless(op->cpuid) )
> + return -EOPNOTSUPP;
NIT: return indent off by 1.
Regards,
Jason
>
> switch( op->u.set_para.ctrl_type )
> {
* [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-07-11 3:50 [PATCH v6 00/19] amd-cppc CPU Performance Scaling Driver Penny Zheng
` (17 preceding siblings ...)
2025-07-11 3:51 ` [PATCH v6 18/19] xen/cpufreq: bypass governor-related para for amd-cppc-epp Penny Zheng
@ 2025-07-11 3:51 ` Penny Zheng
2025-07-24 14:44 ` Jan Beulich
18 siblings, 1 reply; 66+ messages in thread
From: Penny Zheng @ 2025-07-11 3:51 UTC (permalink / raw)
To: xen-devel
Cc: ray.huang, Penny Zheng, Anthony PERARD, Andrew Cooper,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Introduce helpers set_amd_cppc_para() and get_amd_cppc_para() to
SET/GET CPPC-related parameters for the amd-cppc/amd-cppc-epp drivers.
In get_cpufreq_cppc()/set_cpufreq_cppc(), we add a
"processor_pminfo[cpuid]->init & XEN_CPPC_INIT" condition check to handle
the amd-cppc cpufreq driver.
Also, a new field "policy" has been added to "struct xen_get_cppc_para"
to describe the performance policy in active mode. It is printed along with
the other CPPC parameters. Move the manifest constants "XEN_CPUFREQ_POLICY_xxx"
to the public header so that user-space tools can use them. Also add a new
anchor "XEN_CPUFREQ_POLICY_xxx" for array overrun checks.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Give the variable des_perf an initializer of 0
- Use the strncmp()s directly in the if()
---
v3 -> v4
- refactor comments
- remove double blank lines
- replace amd_cppc_in_use flag with XEN_PROCESSOR_PM_CPPC
---
v4 -> v5:
- add new field "policy" in "struct xen_cppc_para"
- add new performance policy XEN_CPUFREQ_POLICY_BALANCE
- drop string comparisons with "processor_pminfo[cpuid]->init & XEN_CPPC_INIT"
and "cpufreq.setpolicy == NULL"
- Blank line ahead of the main "return" of a function
- refactor comments, commit message and title
---
v5 -> v6:
- remove duplicated manifest constants, and just move it to public header
- use "else if" to avoid confusion that it looks as if both paths could be taken
- add check for legitimate perf values
- use "unknown" instead of "none"
- introduce "CPUFREQ_POLICY_END" for array overrun check in user space tools
---
tools/misc/xenpm.c | 11 ++
xen/arch/x86/acpi/cpufreq/amd-cppc.c | 180 +++++++++++++++++++++++++++
xen/drivers/acpi/pm-op.c | 10 +-
xen/include/acpi/cpufreq/cpufreq.h | 19 +--
xen/include/public/sysctl.h | 17 +++
5 files changed, 219 insertions(+), 18 deletions(-)
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 9cb30ea9ce..49766c8d35 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -38,6 +38,13 @@
static xc_interface *xc_handle;
static unsigned int max_cpu_nr;
+static const char cpufreq_policy_str[][12] = {
+ [CPUFREQ_POLICY_UNKNOWN] = "unknown",
+ [CPUFREQ_POLICY_POWERSAVE] = "powersave",
+ [CPUFREQ_POLICY_PERFORMANCE] = "performance",
+ [CPUFREQ_POLICY_ONDEMAND] = "ondemand",
+};
+
/* help message */
void show_help(void)
{
@@ -825,6 +832,10 @@ static void print_cppc_para(unsigned int cpuid,
printf(" : desired [%"PRIu32"%s]\n",
cppc->desired,
cppc->desired ? "" : " hw autonomous");
+
+ if ( cppc->policy < CPUFREQ_POLICY_END )
+ printf(" performance policy : %s\n",
+ cpufreq_policy_str[cppc->policy]);
}
/* print out parameters about cpu frequency */
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index e4bd990982..cee948b12f 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -561,6 +561,186 @@ static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
return 0;
}
+int get_amd_cppc_para(const struct cpufreq_policy *policy,
+ struct xen_get_cppc_para *cppc_para)
+{
+ const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+ policy->cpu);
+
+ if ( data == NULL )
+ return -ENODATA;
+
+ cppc_para->policy = policy->policy;
+ cppc_para->lowest = data->caps.lowest_perf;
+ cppc_para->lowest_nonlinear = data->caps.lowest_nonlinear_perf;
+ cppc_para->nominal = data->caps.nominal_perf;
+ cppc_para->highest = data->caps.highest_perf;
+ cppc_para->minimum = data->req.min_perf;
+ cppc_para->maximum = data->req.max_perf;
+ cppc_para->desired = data->req.des_perf;
+ cppc_para->energy_perf = data->req.epp;
+
+ return 0;
+}
+
+int set_amd_cppc_para(struct cpufreq_policy *policy,
+ const struct xen_set_cppc_para *set_cppc)
+{
+ unsigned int cpu = policy->cpu;
+ struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+ uint8_t max_perf, min_perf, des_perf = 0, epp;
+ bool active_mode;
+
+ if ( data == NULL )
+ return -ENOENT;
+
+ if ( cpufreq_is_governorless(cpu) )
+ active_mode = true;
+
+ /* Only allow values if params bit is set. */
+ if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+ set_cppc->desired) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+ set_cppc->minimum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+ set_cppc->maximum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
+ set_cppc->energy_perf) )
+ return -EINVAL;
+
+ /*
+ * Validate all parameters
+ * Maximum performance may be set to any performance value in the range
+ * [Nonlinear Lowest Performance, Highest Performance], inclusive.
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
+ (set_cppc->maximum > data->caps.highest_perf ||
+ set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
+ return -EINVAL;
+ /*
+ * Minimum performance may be set to any performance value in the range
+ * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
+ * be set to a value that is less than or equal to Maximum Performance.
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
+ (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
+ set_cppc->minimum > set_cppc->maximum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+ set_cppc->minimum > data->req.max_perf)) )
+ return -EINVAL;
+ /*
+ * Desired performance may be set to any performance value in the range
+ * [Minimum Performance, Maximum Performance], inclusive.
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+ {
+ if ( active_mode )
+ return -EOPNOTSUPP;
+
+ if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
+ set_cppc->desired > set_cppc->maximum) ||
+ (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
+ set_cppc->desired < set_cppc->minimum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+ set_cppc->desired > data->req.max_perf) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+ set_cppc->desired < data->req.min_perf) )
+ return -EINVAL;
+ }
+ /*
+ * Energy Performance Preference may be set with a range of values
+ * from 0 to 0xFF
+ */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
+ {
+ if ( !active_mode )
+ return -EOPNOTSUPP;
+
+ if ( set_cppc->energy_perf > UINT_MAX )
+ return -EINVAL;
+ }
+
+ /* Activity window not supported in MSR */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
+ return -EOPNOTSUPP;
+
+ /* Return if there is nothing to do. */
+ if ( set_cppc->set_params == 0 )
+ return 0;
+
+ epp = per_cpu(epp_init, cpu);
+ /*
+ * Apply presets:
+ * XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE/PERFORMANCE/ONDEMAND are
+ * only available when CPPC in active mode
+ */
+ switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
+ {
+ case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_POWERSAVE;
+ min_perf = data->caps.lowest_nonlinear_perf;
+ /*
+ * Lower max_perf to nonlinear_lowest to achieve
+ * utmost power savings
+ */
+ max_perf = data->caps.lowest_nonlinear_perf;
+ epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_PERFORMANCE;
+ /* Increase min_perf to highest to achieve utmost performance */
+ min_perf = data->caps.highest_perf;
+ max_perf = data->caps.highest_perf;
+ epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND:
+ if ( !active_mode )
+ return -EINVAL;
+ policy->policy = CPUFREQ_POLICY_ONDEMAND;
+ min_perf = data->caps.lowest_nonlinear_perf;
+ max_perf = data->caps.highest_perf;
+ /*
+ * Take medium value to show no preference over
+ * performance or powersave
+ */
+ epp = CPPC_ENERGY_PERF_BALANCE;
+ break;
+
+ case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
+ policy->policy = CPUFREQ_POLICY_UNKNOWN;
+ min_perf = data->caps.lowest_nonlinear_perf;
+ max_perf = data->caps.highest_perf;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ /* Further customize presets if needed */
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM )
+ min_perf = set_cppc->minimum;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM )
+ max_perf = set_cppc->maximum;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
+ epp = set_cppc->energy_perf;
+
+ if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+ des_perf = set_cppc->desired;
+
+ amd_cppc_write_request(cpu, min_perf, des_perf, max_perf, epp);
+
+ return 0;
+}
+
static const struct cpufreq_driver __initconst_cf_clobber
amd_cppc_cpufreq_driver =
{
diff --git a/xen/drivers/acpi/pm-op.c b/xen/drivers/acpi/pm-op.c
index 54815c444b..164290397e 100644
--- a/xen/drivers/acpi/pm-op.c
+++ b/xen/drivers/acpi/pm-op.c
@@ -84,6 +84,8 @@ static int get_cpufreq_cppc(unsigned int cpu,
if ( hwp_active() )
ret = get_hwp_para(cpu, cppc_para);
+ else if ( processor_pminfo[cpu]->init & XEN_CPPC_INIT )
+ ret = get_amd_cppc_para(per_cpu(cpufreq_cpu_policy, cpu), cppc_para);
return ret;
}
@@ -325,10 +327,12 @@ static int set_cpufreq_cppc(struct xen_sysctl_pm_op *op)
if ( !policy || !policy->governor )
return -ENOENT;
- if ( !hwp_active() )
- return -EOPNOTSUPP;
+ if ( hwp_active() )
+ return set_hwp_para(policy, &op->u.set_cppc);
+ else if ( processor_pminfo[op->cpuid]->init & XEN_CPPC_INIT )
+ return set_amd_cppc_para(policy, &op->u.set_cppc);
- return set_hwp_para(policy, &op->u.set_cppc);
+ return -EOPNOTSUPP;
}
int do_pm_op(struct xen_sysctl_pm_op *op)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index da0456f46d..2bb10dc233 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -133,21 +133,6 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
extern struct cpufreq_governor *__find_governor(const char *governor);
#define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
-/*
- * Performance Policy
- * If cpufreq_driver->target() exists, the ->governor decides what frequency
- * within the limits is used. If cpufreq_driver->setpolicy() exists, these
- * following policies are available:
- * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
- * CPUFREQ_POLICY_POWERSAVE represents least power consumption
- * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
- * powersave
- */
-#define CPUFREQ_POLICY_UNKNOWN 0
-#define CPUFREQ_POLICY_POWERSAVE 1
-#define CPUFREQ_POLICY_PERFORMANCE 2
-#define CPUFREQ_POLICY_ONDEMAND 3
-
unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
/* pass a target to the cpufreq driver */
@@ -294,6 +279,10 @@ int acpi_cpufreq_register(void);
int amd_cppc_cmdline_parse(const char *s, const char *e);
int amd_cppc_register_driver(void);
+int get_amd_cppc_para(const struct cpufreq_policy *policy,
+ struct xen_get_cppc_para *cppc_para);
+int set_amd_cppc_para(struct cpufreq_policy *policy,
+ const struct xen_set_cppc_para *set_cppc);
bool cpufreq_in_cppc_passive_mode(unsigned int cpuid);
bool cpufreq_is_governorless(unsigned int cpuid);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 2578a63b01..a6d7aedbad 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -336,8 +336,25 @@ struct xen_ondemand {
uint32_t up_threshold;
};
+/*
+ * Performance Policy
+ * If cpufreq_driver->target() exists, the ->governor decides what frequency
+ * within the limits is used. If cpufreq_driver->setpolicy() exists, these
+ * following policies are available:
+ * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
+ * CPUFREQ_POLICY_POWERSAVE represents least power consumption
+ * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
+ * powersave
+ */
+#define CPUFREQ_POLICY_UNKNOWN 0
+#define CPUFREQ_POLICY_POWERSAVE 1
+#define CPUFREQ_POLICY_PERFORMANCE 2
+#define CPUFREQ_POLICY_ONDEMAND 3
+#define CPUFREQ_POLICY_END 4
+
struct xen_get_cppc_para {
/* OUT */
+ uint32_t policy; /* CPUFREQ_POLICY_xxx */
/* activity_window supported if set */
#define XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW (1 << 0)
uint32_t features; /* bit flags for features */
--
2.34.1
* Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-07-11 3:51 ` [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver Penny Zheng
@ 2025-07-24 14:44 ` Jan Beulich
2025-08-14 3:13 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-07-24 14:44 UTC (permalink / raw)
To: Penny Zheng
Cc: ray.huang, Anthony PERARD, Andrew Cooper, Michal Orzel,
Julien Grall, Roger Pau Monné, Stefano Stabellini, xen-devel
On 11.07.2025 05:51, Penny Zheng wrote:
> Introduce helper set_amd_cppc_para() and get_amd_cppc_para() to
> SET/GET CPPC-related para for amd-cppc/amd-cppc-epp driver.
>
> In get_cpufreq_cppc()/set_cpufreq_cppc(), we include
> "processor_pminfo[cpuid]->init & XEN_CPPC_INIT" condition check to deal with
> cpufreq driver in amd-cppc.
>
> Also, a new field "policy" has also been added in "struct xen_get_cppc_para"
> to describe performance policy in active mode. It gets printed with other
> cppc paras. Move manifest constants "XEN_CPUFREQ_POLICY_xxx" to public header
> to let it be used in user space tools. Also add a new anchor
> "XEN_CPUFREQ_POLICY_xxx" for array overrun check.
If only they indeed had XEN_ prefixes.
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> ---
> v1 -> v2:
> - Give the variable des_perf an initializer of 0
> - Use the strncmp()s directly in the if()
> ---
> v3 -> v4
> - refactor comments
> - remove double blank lines
> - replace amd_cppc_in_use flag with XEN_PROCESSOR_PM_CPPC
> ---
> v4 -> v5:
> - add new field "policy" in "struct xen_cppc_para"
> - add new performance policy XEN_CPUFREQ_POLICY_BALANCE
> - drop string comparisons with "processor_pminfo[cpuid]->init & XEN_CPPC_INIT"
> and "cpufreq.setpolicy == NULL"
> - Blank line ahead of the main "return" of a function
> - refactor comments, commit message and title
> ---
> v5 -> v6:
> - remove duplicated manifest constants, and just move it to public header
> - use "else if" to avoid confusion that it looks as if both paths could be taken
> - add check for legitimate perf values
> - use "unknown" instead of "none"
> - introduce "CPUFREQ_POLICY_END" for array overrun check in user space tools
Please don't; use ARRAY_SIZE() (and if necessary further checking) instead.
In fact I think ...
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -38,6 +38,13 @@
> static xc_interface *xc_handle;
> static unsigned int max_cpu_nr;
>
> +static const char cpufreq_policy_str[][12] = {
> + [CPUFREQ_POLICY_UNKNOWN] = "unknown",
> + [CPUFREQ_POLICY_POWERSAVE] = "powersave",
> + [CPUFREQ_POLICY_PERFORMANCE] = "performance",
> + [CPUFREQ_POLICY_ONDEMAND] = "ondemand",
> +};
> +
> /* help message */
> void show_help(void)
> {
> @@ -825,6 +832,10 @@ static void print_cppc_para(unsigned int cpuid,
> printf(" : desired [%"PRIu32"%s]\n",
> cppc->desired,
> cppc->desired ? "" : " hw autonomous");
> +
> + if ( cppc->policy < CPUFREQ_POLICY_END )
> + printf(" performance policy : %s\n",
> + cpufreq_policy_str[cppc->policy]);
... you would want to print "unknown" in all other cases as well.
It's not clear to me though how the printing is avoided for passive mode.
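Following Jan's suggestion, the overrun guard can be derived from the table itself instead of a public `CPUFREQ_POLICY_END` anchor, with any out-of-range value falling back to "unknown". A minimal sketch: `policy_name()` is a hypothetical helper, and `ARRAY_SIZE()` is defined locally only to keep the snippet self-contained (xenpm.c would use the tools' existing definition, assuming one is available there).

```c
#include <assert.h>
#include <stdint.h>

/* Values as in the patch's public-header constants. */
#define CPUFREQ_POLICY_UNKNOWN     0
#define CPUFREQ_POLICY_POWERSAVE   1
#define CPUFREQ_POLICY_PERFORMANCE 2
#define CPUFREQ_POLICY_ONDEMAND    3

/* Local definition for self-containment only. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char cpufreq_policy_str[][12] = {
    [CPUFREQ_POLICY_UNKNOWN]     = "unknown",
    [CPUFREQ_POLICY_POWERSAVE]   = "powersave",
    [CPUFREQ_POLICY_PERFORMANCE] = "performance",
    [CPUFREQ_POLICY_ONDEMAND]    = "ondemand",
};

/* Map a policy value to its name; anything out of range reads "unknown". */
static const char *policy_name(uint32_t policy)
{
    if ( policy >= ARRAY_SIZE(cpufreq_policy_str) )
        policy = CPUFREQ_POLICY_UNKNOWN;

    return cpufreq_policy_str[policy];
}
```

print_cppc_para() could then print `policy_name(cppc->policy)` unconditionally; this sketch does not address Jan's separate question about suppressing the line in passive mode.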
> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> @@ -561,6 +561,186 @@ static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
> return 0;
> }
>
> +int get_amd_cppc_para(const struct cpufreq_policy *policy,
> + struct xen_get_cppc_para *cppc_para)
> +{
> + const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
> + policy->cpu);
> +
> + if ( data == NULL )
> + return -ENODATA;
> +
> + cppc_para->policy = policy->policy;
> + cppc_para->lowest = data->caps.lowest_perf;
> + cppc_para->lowest_nonlinear = data->caps.lowest_nonlinear_perf;
> + cppc_para->nominal = data->caps.nominal_perf;
> + cppc_para->highest = data->caps.highest_perf;
> + cppc_para->minimum = data->req.min_perf;
> + cppc_para->maximum = data->req.max_perf;
> + cppc_para->desired = data->req.des_perf;
> + cppc_para->energy_perf = data->req.epp;
> +
> + return 0;
> +}
> +
> +int set_amd_cppc_para(struct cpufreq_policy *policy,
> + const struct xen_set_cppc_para *set_cppc)
> +{
> + unsigned int cpu = policy->cpu;
> + struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> + uint8_t max_perf, min_perf, des_perf = 0, epp;
> + bool active_mode;
> +
> + if ( data == NULL )
> + return -ENOENT;
> +
> + if ( cpufreq_is_governorless(cpu) )
> + active_mode = true;
Without "else" the variable will be left uninitialized. I'm surprised
the compiler allowed you to get away with this. Why is the function
call not simply the variable's initializer?
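A minimal sketch of the initializer form Jan asks about: the flag takes the helper's result directly, so it has a defined value on every path. `stub_is_governorless()` and `check_desired_allowed()` are hypothetical stand-ins for `cpufreq_is_governorless()` and the desired-perf branch of `set_amd_cppc_para()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stub for cpufreq_is_governorless(): even CPUs report active mode. */
static bool stub_is_governorless(unsigned int cpu)
{
    return (cpu & 1) == 0;
}

/*
 * Initialising the flag from the helper gives it a defined value on every
 * path; "if ( helper ) active_mode = true;" leaves it indeterminate whenever
 * the helper returns false.
 */
static int check_desired_allowed(unsigned int cpu)
{
    bool active_mode = stub_is_governorless(cpu);

    /* As in the patch: a desired perf value is rejected in active mode. */
    return active_mode ? -EOPNOTSUPP : 0;
}
```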
> + /* Only allow values if params bit is set. */
> + if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
> + set_cppc->desired) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
> + set_cppc->minimum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> + set_cppc->maximum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
> + set_cppc->energy_perf) )
> + return -EINVAL;
> +
> + /*
> + * Validate all parameters
> + * Maximum performance may be set to any performance value in the range
> + * [Nonlinear Lowest Performance, Highest Performance], inclusive.
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
Nit: Missing parentheses again. Which is particularly odd since in the
immediately preceding if() they're there. More of this further down.
> + (set_cppc->maximum > data->caps.highest_perf ||
> + set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
> + return -EINVAL;
> + /*
> + * Minimum performance may be set to any performance value in the range
> + * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> + * be set to a value that is less than or equal to Maximum Performance.
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
> + (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
> + set_cppc->minimum > set_cppc->maximum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
Hmm, I find this confusing to read, and was first thinking the ! was wrong
here. Imo such is better expressed with the conditional operator:
set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
? set_cppc->maximum
: data->req.max_perf)
Which also makes it easier to spot that here you use data->req, when
in the minimum check you use data->caps. Why this difference?
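The minimum and desired bound checks can be written with the conditional operator Jan suggests. The sketch below hoists that expression into two locals holding the effective maximum and minimum (the newly supplied value if its bit is set, otherwise the current request) and parenthesises the `&` operands as the review asks. Flag values and struct layouts are hypothetical simplifications of `xen_set_cppc_para` and `amd_cppc_drv_data`. Like the patch, it bounds the minimum by the effective maximum rather than by `caps.highest_perf`, relying on the maximum itself having been checked against `highest_perf`, which is the reasoning Penny gives later in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SET_DESIRED (1u << 0)  /* hypothetical flag values */
#define SET_MINIMUM (1u << 1)
#define SET_MAXIMUM (1u << 2)

struct cppc_caps { uint8_t lowest_nonlinear, highest; };
struct cppc_req  { uint8_t min, max; };
struct cppc_set  { uint32_t params, minimum, maximum, desired; };

static int validate(const struct cppc_caps *c, const struct cppc_req *r,
                    const struct cppc_set *s)
{
    /* Effective bounds: the new value if supplied, else the current request. */
    uint32_t max = (s->params & SET_MAXIMUM) ? s->maximum : r->max;
    uint32_t min = (s->params & SET_MINIMUM) ? s->minimum : r->min;

    /* Maximum: within [lowest nonlinear, highest]. */
    if ( (s->params & SET_MAXIMUM) &&
         (s->maximum > c->highest || s->maximum < c->lowest_nonlinear) )
        return -EINVAL;

    /* Minimum: at least lowest nonlinear, at most the effective maximum. */
    if ( (s->params & SET_MINIMUM) &&
         (s->minimum < c->lowest_nonlinear || s->minimum > max) )
        return -EINVAL;

    /* Desired: within [effective minimum, effective maximum]. */
    if ( (s->params & SET_DESIRED) &&
         (s->desired > max || s->desired < min) )
        return -EINVAL;

    return 0;
}
```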
> + set_cppc->minimum > data->req.max_perf)) )
> + return -EINVAL;
> + /*
> + * Desired performance may be set to any performance value in the range
> + * [Minimum Performance, Maximum Performance], inclusive.
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
> + {
> + if ( active_mode )
> + return -EOPNOTSUPP;
> +
> + if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
> + set_cppc->desired > set_cppc->maximum) ||
> + (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
> + set_cppc->desired < set_cppc->minimum) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> + set_cppc->desired > data->req.max_perf) ||
> + (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
> + set_cppc->desired < data->req.min_perf) )
> + return -EINVAL;
All of the above applies here as well.
> + }
> + /*
> + * Energy Performance Preference may be set with a range of values
> + * from 0 to 0xFF
> + */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
> + {
> + if ( !active_mode )
> + return -EOPNOTSUPP;
> +
> + if ( set_cppc->energy_perf > UINT_MAX )
> + return -EINVAL;
> + }
> +
> + /* Activity window not supported in MSR */
> + if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
> + return -EOPNOTSUPP;
> +
> + /* Return if there is nothing to do. */
> + if ( set_cppc->set_params == 0 )
> + return 0;
Why did this move so far down (I think it was sitting earlier)? The
earlier 5 if()s are all needlessly carried out in this case (unless the
compiler can reason about moving this check ahead).
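A minimal sketch of the ordering Jan describes: the "nothing to do" return sits directly after the "values only with their bit set" check, so none of the bit-guarded range checks run for an empty request, and an empty request carrying stray values is still rejected. Flag values and `struct cppc_set` are hypothetical, and `range_checks_run` exists only to make the short-circuit observable. The EPP bound here uses `UINT8_MAX`, matching the "0 to 0xFF" range stated in the patch's own comment; the patch compares against `UINT_MAX`, which a 32-bit field (assumed here) can never exceed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SET_MAXIMUM     (1u << 0)  /* hypothetical flag values */
#define SET_ENERGY_PERF (1u << 1)

struct cppc_set { uint32_t params, maximum, energy_perf; };

/* Counts how often the per-field range checks are reached. */
static unsigned int range_checks_run;

static int set_para(const struct cppc_set *s, uint8_t highest_perf)
{
    /* Values may only be non-zero when their corresponding bit is set. */
    if ( (!(s->params & SET_MAXIMUM) && s->maximum) ||
         (!(s->params & SET_ENERGY_PERF) && s->energy_perf) )
        return -EINVAL;

    /*
     * Nothing requested: every range check below is guarded by a
     * set_params bit and would pass anyway, so return before them.
     */
    if ( s->params == 0 )
        return 0;

    range_checks_run++;

    if ( (s->params & SET_MAXIMUM) && s->maximum > highest_perf )
        return -EINVAL;

    /* The EPP field is 8 bits wide: 0 to 0xFF. */
    if ( (s->params & SET_ENERGY_PERF) && s->energy_perf > UINT8_MAX )
        return -EINVAL;

    return 0;
}
```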
> + epp = per_cpu(epp_init, cpu);
> + /*
> + * Apply presets:
> + * XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE/PERFORMANCE/ONDEMAND are
> + * only available when CPPC in active mode
> + */
> + switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
> + {
> + case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> + min_perf = data->caps.lowest_nonlinear_perf;
> + /*
> + * Lower max_perf to nonlinear_lowest to achieve
> + * utmost power savings
> + */
> + max_perf = data->caps.lowest_nonlinear_perf;
Why not use the shorter "min_perf" here?
> + epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_PERFORMANCE;
> + /* Increase min_perf to highest to achieve utmost performance */
> + min_perf = data->caps.highest_perf;
> + max_perf = data->caps.highest_perf;
And "max_perf" here? Furthermore if you moved ...
> + epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_ONDEMAND:
> + if ( !active_mode )
> + return -EINVAL;
> + policy->policy = CPUFREQ_POLICY_ONDEMAND;
> + min_perf = data->caps.lowest_nonlinear_perf;
> + max_perf = data->caps.highest_perf;
... these two ahead of the switch(), you could further reduce the number
of assignments (overrides) done, including ...
> + /*
> + * Take medium value to show no preference over
> + * performance or powersave
> + */
> + epp = CPPC_ENERGY_PERF_BALANCE;
> + break;
> +
> + case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
> + policy->policy = CPUFREQ_POLICY_UNKNOWN;
> + min_perf = data->caps.lowest_nonlinear_perf;
> + max_perf = data->caps.highest_perf;
... the dropping of these two lines.
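With the defaults hoisted ahead of the switch() as suggested above, the preset handling might look like the sketch below. NONE and ONDEMAND keep the defaults, POWERSAVE pins the maximum to the minimum, and PERFORMANCE pins the minimum to the maximum. The enum names and the EPP values (0x00 performance, 0x80 balance, 0xFF powersave) are assumptions for illustration, not the `XEN_SYSCTL_CPPC_SET_PRESET_*` or `CPPC_ENERGY_PERF_*` definitions, and the active-mode checks and `policy->policy` updates are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical preset selector and assumed EPP values. */
enum preset { PRESET_NONE, PRESET_POWERSAVE, PRESET_PERFORMANCE, PRESET_ONDEMAND };
#define EPP_MAX_PERFORMANCE 0x00
#define EPP_BALANCE         0x80
#define EPP_MAX_POWERSAVE   0xff

struct cppc_caps { uint8_t lowest_nonlinear, highest; };
struct perf_req  { uint8_t min, max, epp; };

static int apply_preset(enum preset p, const struct cppc_caps *c,
                        uint8_t epp_init, struct perf_req *out)
{
    /* Defaults shared by NONE and ONDEMAND; the other presets narrow them. */
    uint8_t min_perf = c->lowest_nonlinear;
    uint8_t max_perf = c->highest;
    uint8_t epp = epp_init;

    switch ( p )
    {
    case PRESET_POWERSAVE:
        max_perf = min_perf;          /* pin to lowest nonlinear perf */
        epp = EPP_MAX_POWERSAVE;
        break;

    case PRESET_PERFORMANCE:
        min_perf = max_perf;          /* pin to highest perf */
        epp = EPP_MAX_PERFORMANCE;
        break;

    case PRESET_ONDEMAND:
        epp = EPP_BALANCE;            /* no preference either way */
        break;

    case PRESET_NONE:
        break;

    default:
        return -EINVAL;
    }

    out->min = min_perf;
    out->max = max_perf;
    out->epp = epp;

    return 0;
}
```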
> --- a/xen/drivers/acpi/pm-op.c
> +++ b/xen/drivers/acpi/pm-op.c
> @@ -84,6 +84,8 @@ static int get_cpufreq_cppc(unsigned int cpu,
>
> if ( hwp_active() )
> ret = get_hwp_para(cpu, cppc_para);
> + else if ( processor_pminfo[cpu]->init & XEN_CPPC_INIT )
> + ret = get_amd_cppc_para(per_cpu(cpufreq_cpu_policy, cpu), cppc_para);
>
> return ret;
> }
> @@ -325,10 +327,12 @@ static int set_cpufreq_cppc(struct xen_sysctl_pm_op *op)
> if ( !policy || !policy->governor )
> return -ENOENT;
>
> - if ( !hwp_active() )
> - return -EOPNOTSUPP;
> + if ( hwp_active() )
> + return set_hwp_para(policy, &op->u.set_cppc);
> + else if ( processor_pminfo[op->cpuid]->init & XEN_CPPC_INIT )
As before, please can you avoid the use of "else" in such cases. Strictly
speaking that's "dead code" in Misra's nomenclature.
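The dispatch in set_cpufreq_cppc() without the redundant "else" could be sketched as follows: because the first branch always returns, the second if() needs no else. `struct cpu_state`, `set_hwp()` and `set_amd_cppc()` are hypothetical stubs for `hwp_active()`, the `XEN_CPPC_INIT` test and the two driver-specific setters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: HWP active, CPPC initialised. */
struct cpu_state { bool hwp; bool cppc_init; };

/* Hypothetical setters returning distinct markers. */
static int set_hwp(void)      { return 1; }
static int set_amd_cppc(void) { return 2; }

static int set_cppc_dispatch(const struct cpu_state *s)
{
    if ( s->hwp )
        return set_hwp();

    /* No "else" needed: the branch above always returns. */
    if ( s->cppc_init )
        return set_amd_cppc();

    return -EOPNOTSUPP;
}
```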
> --- a/xen/include/acpi/cpufreq/cpufreq.h
> +++ b/xen/include/acpi/cpufreq/cpufreq.h
> @@ -133,21 +133,6 @@ extern int cpufreq_register_governor(struct cpufreq_governor *governor);
> extern struct cpufreq_governor *__find_governor(const char *governor);
> #define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
>
> -/*
> - * Performance Policy
> - * If cpufreq_driver->target() exists, the ->governor decides what frequency
> - * within the limits is used. If cpufreq_driver->setpolicy() exists, these
> - * following policies are available:
> - * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
> - * CPUFREQ_POLICY_POWERSAVE represents least power consumption
> - * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
> - * powersave
> - */
> -#define CPUFREQ_POLICY_UNKNOWN 0
> -#define CPUFREQ_POLICY_POWERSAVE 1
> -#define CPUFREQ_POLICY_PERFORMANCE 2
> -#define CPUFREQ_POLICY_ONDEMAND 3
> -
> unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
>
> /* pass a target to the cpufreq driver */
> @@ -294,6 +279,10 @@ int acpi_cpufreq_register(void);
>
> int amd_cppc_cmdline_parse(const char *s, const char *e);
> int amd_cppc_register_driver(void);
> +int get_amd_cppc_para(const struct cpufreq_policy *policy,
> + struct xen_get_cppc_para *cppc_para);
> +int set_amd_cppc_para(struct cpufreq_policy *policy,
> + const struct xen_set_cppc_para *set_cppc);
>
> bool cpufreq_in_cppc_passive_mode(unsigned int cpuid);
> bool cpufreq_is_governorless(unsigned int cpuid);
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -336,8 +336,25 @@ struct xen_ondemand {
> uint32_t up_threshold;
> };
>
> +/*
> + * Performance Policy
> + * If cpufreq_driver->target() exists, the ->governor decides what frequency
> + * within the limits is used. If cpufreq_driver->setpolicy() exists, these
> + * following policies are available:
> + * CPUFREQ_POLICY_PERFORMANCE represents maximum performance
> + * CPUFREQ_POLICY_POWERSAVE represents least power consumption
> + * CPUFREQ_POLICY_ONDEMAND represents no preference over performance or
> + * powersave
> + */
I appreciate that you want to retain the comment, but implementation details
of the hypervisor imo don't belong in public headers. That internal part may
want to move to e.g. the respective (internal) struct field.
Jan
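One way to follow Jan's suggestion is to keep only interface-level meanings next to the public constants and move the driver-specific explanation (->target() versus ->setpolicy()) to the internal struct field. A hypothetical, trimmed-down layout; `struct cpufreq_policy_sketch` is not the real `struct cpufreq_policy`, and the split into two header sections is only indicated by comments.

```c
#include <assert.h>

/* --- public header part (hypothetical layout) --- */
/* Performance policy reported in xen_get_cppc_para.policy. */
#define CPUFREQ_POLICY_UNKNOWN     0
#define CPUFREQ_POLICY_POWERSAVE   1  /* least power consumption */
#define CPUFREQ_POLICY_PERFORMANCE 2  /* maximum performance */
#define CPUFREQ_POLICY_ONDEMAND    3  /* no preference either way */

/* --- internal header part (hypothetical, trimmed-down struct) --- */
struct cpufreq_policy_sketch {
    /*
     * Only meaningful for drivers providing ->setpolicy(); with ->target(),
     * the governor decides the frequency within the limits instead.
     */
    unsigned int policy;  /* CPUFREQ_POLICY_xxx */
};
```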
* RE: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-07-24 14:44 ` Jan Beulich
@ 2025-08-14 3:13 ` Penny, Zheng
2025-08-14 6:40 ` Jan Beulich
0 siblings, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-14 3:13 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, July 24, 2025 10:44 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
>
> On 11.07.2025 05:51, Penny Zheng wrote:
> > Introduce helper set_amd_cppc_para() and get_amd_cppc_para() to
> > SET/GET CPPC-related para for amd-cppc/amd-cppc-epp driver.
> >
> > In get_cpufreq_cppc()/set_cpufreq_cppc(), we include
> > "processor_pminfo[cpuid]->init & XEN_CPPC_INIT" condition check to
> > deal with cpufreq driver in amd-cppc.
> >
> > Also, a new field "policy" has also been added in "struct xen_get_cppc_para"
> > to describe performance policy in active mode. It gets printed with
> > other cppc paras. Move manifest constants "XEN_CPUFREQ_POLICY_xxx" to
> > public header to let it be used in user space tools. Also add a new
> > anchor "XEN_CPUFREQ_POLICY_xxx" for array overrun check.
>
> If only they indeed had XEN_ prefixes.
>
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> > ---
> > v1 -> v2:
> > - Give the variable des_perf an initializer of 0
> > - Use the strncmp()s directly in the if()
> > ---
> > v3 -> v4
> > - refactor comments
> > - remove double blank lines
> > - replace amd_cppc_in_use flag with XEN_PROCESSOR_PM_CPPC
> > ---
> > v4 -> v5:
> > - add new field "policy" in "struct xen_cppc_para"
> > - add new performance policy XEN_CPUFREQ_POLICY_BALANCE
> > - drop string comparisons with "processor_pminfo[cpuid]->init & XEN_CPPC_INIT"
> > and "cpufreq.setpolicy == NULL"
> > - Blank line ahead of the main "return" of a function
> > - refactor comments, commit message and title
> > ---
> > v5 -> v6:
> > - remove duplicated manifest constants, and just move it to public
> > header
> > - use "else if" to avoid confusion that it looks as if both paths
> > could be taken
> > - add check for legitimate perf values
> > - use "unknown" instead of "none"
> > - introduce "CPUFREQ_POLICY_END" for array overrun check in user space
> > tools
> > +         (set_cppc->maximum > data->caps.highest_perf ||
> > +          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
> > +        return -EINVAL;
> > +    /*
> > +     * Minimum performance may be set to any performance value in the range
> > +     * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> > +     * be set to a value that is less than or equal to Maximum Performance.
> > +     */
> > +    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
> > +         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
> > +          (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
> > +           set_cppc->minimum > set_cppc->maximum) ||
> > +          (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
>
> Hmm, I find this confusing to read, and was first thinking the ! was wrong here. Imo
> such is better expressed with the conditional operator:
>
>
>          set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
>                               ? set_cppc->maximum
>                               : data->req.max_perf)
>
Thx, understood!
> Which also makes it easier to spot that here you use data->req, when in the
> minimum check you use data->caps. Why this difference?
>
The minimum check has two boundary checks.
The lower bound is checked against data->caps.lowest_nonlinear_perf, and the upper bound against data->req.max_perf: the minimum must be no larger than caps.highest_perf, but also no larger than req.max_perf. The relation between max_perf and highest_perf is already validated in the maximum check (max_perf <= caps.highest_perf), so min_perf <= max_perf implies min_perf <= caps.highest_perf. Hence only max_perf needs to be considered here.
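For illustration only, a standalone sketch of the two range checks with the conditional-operator form folded into the minimum check. The struct layouts and the flag values are simplified stand-ins assumed for this example (only the field names follow the quoted hunk); they are not the actual Xen definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed flag values for illustration; check xen/include/public/sysctl.h. */
#define XEN_SYSCTL_CPPC_SET_MINIMUM (1U << 0)
#define XEN_SYSCTL_CPPC_SET_MAXIMUM (1U << 1)

/* Hypothetical, simplified stand-ins for the driver's per-CPU data. */
struct cppc_caps { uint8_t lowest_nonlinear_perf, highest_perf; };
struct cppc_req  { uint8_t min_perf, max_perf; };   /* last values written */
struct cppc_data { struct cppc_caps caps; struct cppc_req req; };
struct set_cppc  { uint32_t set_params, minimum, maximum; };

static int check_min_max(const struct set_cppc *set_cppc,
                         const struct cppc_data *data)
{
    assert(data->caps.lowest_nonlinear_perf <= data->caps.highest_perf);

    /* A new maximum must lie in [lowest_nonlinear_perf, highest_perf]. */
    if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
         (set_cppc->maximum > data->caps.highest_perf ||
          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
        return -EINVAL;

    /*
     * A new minimum must not drop below lowest_nonlinear_perf, nor exceed
     * the maximum in effect: the one passed in the same call if any,
     * otherwise the last value written (req.max_perf).
     */
    if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
          set_cppc->minimum > (set_cppc->set_params &
                               XEN_SYSCTL_CPPC_SET_MAXIMUM
                               ? set_cppc->maximum
                               : data->req.max_perf)) )
        return -EINVAL;

    return 0;
}
```

With caps of [30, 200] and req.max_perf = 100, a lone request for minimum = 150 is rejected even though it is within the capabilities, because it exceeds the maximum currently in effect. Like the v6 hunk, this sketch never compares a lone new maximum against req.min_perf.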
> Jan
* Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-14 3:13 ` Penny, Zheng
@ 2025-08-14 6:40 ` Jan Beulich
2025-08-14 7:34 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-08-14 6:40 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 14.08.2025 05:13, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, July 24, 2025 10:44 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
>> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
>> xen_sysctl_pm_op for amd-cppc driver
>>
>> On 11.07.2025 05:51, Penny Zheng wrote:
>>> +         (set_cppc->maximum > data->caps.highest_perf ||
>>> +          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
>>> +        return -EINVAL;
>>> +    /*
>>> +     * Minimum performance may be set to any performance value in the range
>>> +     * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
>>> +     * be set to a value that is less than or equal to Maximum Performance.
>>> +     */
>>> +    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
>>> +         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
>>> +          (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
>>> +           set_cppc->minimum > set_cppc->maximum) ||
>>> +          (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
>>
>> Hmm, I find this confusing to read, and was first thinking the ! was wrong here. Imo
>> such is better expressed with the conditional operator:
>>
>>
>>          set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
>>                               ? set_cppc->maximum
>>                               : data->req.max_perf)
>>
>
> Thx, understood!
>
>> Which also makes it easier to spot that here you use data->req, when in the
>> minimum check you use data->caps. Why this difference?
>>
>
> The minimum check has two boundary checks.
> The lower bound is checked against data->caps.lowest_nonlinear_perf, and the upper bound against data->req.max_perf: the minimum must be no larger than caps.highest_perf, but also no larger than req.max_perf. The relation between max_perf and highest_perf is already validated in the maximum check (max_perf <= caps.highest_perf), so min_perf <= max_perf implies min_perf <= caps.highest_perf. Hence only max_perf needs to be considered here.
I still don't get why one check is against the capabilities (permitted values) while
the other is against what's currently set.
Jan
* RE: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-14 6:40 ` Jan Beulich
@ 2025-08-14 7:34 ` Penny, Zheng
2025-08-14 8:29 ` Jan Beulich
0 siblings, 1 reply; 66+ messages in thread
From: Penny, Zheng @ 2025-08-14 7:34 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 14, 2025 2:40 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> xen_sysctl_pm_op for amd-cppc driver
>
> On 14.08.2025 05:13, Penny, Zheng wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Thursday, July 24, 2025 10:44 PM
> >> To: Penny, Zheng <penny.zheng@amd.com>
> >> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> >> <anthony.perard@vates.tech>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> >> Julien Grall <julien@xen.org>; Roger Pau Monné
> >> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> >> xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt
> >> SET/GET_CPUFREQ_CPPC
> >> xen_sysctl_pm_op for amd-cppc driver
> >>
> >> On 11.07.2025 05:51, Penny Zheng wrote:
> >>> +         (set_cppc->maximum > data->caps.highest_perf ||
> >>> +          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
> >>> +        return -EINVAL;
> >>> +    /*
> >>> +     * Minimum performance may be set to any performance value in the range
> >>> +     * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> >>> +     * be set to a value that is less than or equal to Maximum Performance.
> >>> +     */
> >>> +    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
> >>> +         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
> >>> +          (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
> >>> +           set_cppc->minimum > set_cppc->maximum) ||
> >>> +          (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> >>
> >> Hmm, I find this confusing to read, and was first thinking the ! was
> >> wrong here. Imo such is better expressed with the conditional operator:
> >>
> >>
> >>          set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
> >>                               ? set_cppc->maximum
> >>                               : data->req.max_perf)
> >>
> >
> > Thx, understood!
> >
> >> Which also makes it easier to spot that here you use data->req, when
> >> in the minimum check you use data->caps. Why this difference?
> >>
> >
> > The minimum check has two boundary checks. The lower bound is checked
> > against data->caps.lowest_nonlinear_perf, and the upper bound against
> > data->req.max_perf: the minimum must be no larger than caps.highest_perf,
> > but also no larger than req.max_perf. The relation between max_perf and
> > highest_perf is already validated in the maximum check (max_perf <=
> > caps.highest_perf), so min_perf <= max_perf implies min_perf <=
> > caps.highest_perf. Hence only max_perf needs to be considered here.
>
> I still don't get why one check is against the capabilities (permitted values) while
> the other is against what's currently set.
It needs to meet the following two criteria:
1. caps.lowest_nonlinear <= min_perf <= caps.highest_perf
2. min_perf <= max_perf. If the user doesn't set max_perf in the same call, we use the value stored in req.max_perf, i.e. the last one set.
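Written as a single condition (the symbols below are shorthand for the fields named above, introduced only for this summary), where the effective maximum is the one passed in the same call if present and req.max_perf otherwise:

```latex
\[
\mathit{lowest\_nonlinear} \le \mathit{min} \le \mathit{highest},
\qquad
\mathit{min} \le \mathit{max}_{\mathrm{eff}} =
\begin{cases}
\mathit{maximum} & \text{if SET\_MAXIMUM is also requested,}\\
\mathit{req.max\_perf} & \text{otherwise.}
\end{cases}
\]
```

For example, with lowest_nonlinear = 30, highest = 200 and req.max_perf = 100, a lone min = 150 satisfies the first condition but fails the second.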
>
> Jan
* Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-14 7:34 ` Penny, Zheng
@ 2025-08-14 8:29 ` Jan Beulich
2025-08-14 8:32 ` Penny, Zheng
0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2025-08-14 8:29 UTC (permalink / raw)
To: Penny, Zheng
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 14.08.2025 09:34, Penny, Zheng wrote:
> [Public]
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, August 14, 2025 2:40 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
>> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
>> xen_sysctl_pm_op for amd-cppc driver
>>
>> On 14.08.2025 05:13, Penny, Zheng wrote:
>>> [Public]
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Thursday, July 24, 2025 10:44 PM
>>>> To: Penny, Zheng <penny.zheng@amd.com>
>>>> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
>>>> <anthony.perard@vates.tech>; Andrew Cooper
>>>> <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
>>>> Julien Grall <julien@xen.org>; Roger Pau Monné
>>>> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
>>>> xen-devel@lists.xenproject.org
>>>> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt
>>>> SET/GET_CPUFREQ_CPPC
>>>> xen_sysctl_pm_op for amd-cppc driver
>>>>
>>>> On 11.07.2025 05:51, Penny Zheng wrote:
>>>>> +         (set_cppc->maximum > data->caps.highest_perf ||
>>>>> +          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
>>>>> +        return -EINVAL;
>>>>> +    /*
>>>>> +     * Minimum performance may be set to any performance value in the range
>>>>> +     * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
>>>>> +     * be set to a value that is less than or equal to Maximum Performance.
>>>>> +     */
>>>>> +    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
>>>>> +         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
>>>>> +          (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
>>>>> +           set_cppc->minimum > set_cppc->maximum) ||
>>>>> +          (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
>>>>
>>>> Hmm, I find this confusing to read, and was first thinking the ! was
>>>> wrong here. Imo such is better expressed with the conditional operator:
>>>>
>>>>
>>>>          set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
>>>>                               ? set_cppc->maximum
>>>>                               : data->req.max_perf)
>>>>
>>>
>>> Thx, understood!
>>>
>>>> Which also makes it easier to spot that here you use data->req, when
>>>> in the minimum check you use data->caps. Why this difference?
>>>>
>>>
>>> The minimum check has two boundary checks. The lower bound is checked
>>> against data->caps.lowest_nonlinear_perf, and the upper bound against
>>> data->req.max_perf: the minimum must be no larger than caps.highest_perf,
>>> but also no larger than req.max_perf. The relation between max_perf and
>>> highest_perf is already validated in the maximum check (max_perf <=
>>> caps.highest_perf), so min_perf <= max_perf implies min_perf <=
>>> caps.highest_perf. Hence only max_perf needs to be considered here.
>>
>> I still don't get why one check is against the capabilities (permitted values) while
>> the other is against what's currently set.
>
> It needs to meet the following two criteria:
>
> 1. caps.lowest_nonlinear <= min_perf <= caps.highest_perf
> 2. min_perf <= max_perf. If the user doesn't set max_perf in the same call, we use the value stored in req.max_perf, i.e. the last one set.
Hmm, I see. Yet then what about the case of max being set without also setting
min? Overall I'm expecting full symmetry in the checking that's being done.
Jan
* RE: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver
2025-08-14 8:29 ` Jan Beulich
@ 2025-08-14 8:32 ` Penny, Zheng
0 siblings, 0 replies; 66+ messages in thread
From: Penny, Zheng @ 2025-08-14 8:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Huang, Ray, Anthony PERARD, Andrew Cooper, Orzel, Michal,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
xen-devel@lists.xenproject.org
[Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 14, 2025 4:30 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> <anthony.perard@vates.tech>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt SET/GET_CPUFREQ_CPPC
> xen_sysctl_pm_op for amd-cppc driver
>
> On 14.08.2025 09:34, Penny, Zheng wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Thursday, August 14, 2025 2:40 PM
> >> To: Penny, Zheng <penny.zheng@amd.com>
> >> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> >> <anthony.perard@vates.tech>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> >> Julien Grall <julien@xen.org>; Roger Pau Monné
> >> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> >> xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt
> >> SET/GET_CPUFREQ_CPPC
> >> xen_sysctl_pm_op for amd-cppc driver
> >>
> >> On 14.08.2025 05:13, Penny, Zheng wrote:
> >>> [Public]
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: Thursday, July 24, 2025 10:44 PM
> >>>> To: Penny, Zheng <penny.zheng@amd.com>
> >>>> Cc: Huang, Ray <Ray.Huang@amd.com>; Anthony PERARD
> >>>> <anthony.perard@vates.tech>; Andrew Cooper
> >>>> <andrew.cooper3@citrix.com>; Orzel, Michal <Michal.Orzel@amd.com>;
> >>>> Julien Grall <julien@xen.org>; Roger Pau Monné
> >>>> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>;
> >>>> xen-devel@lists.xenproject.org
> >>>> Subject: Re: [PATCH v6 19/19] xen/cpufreq: Adapt
> >>>> SET/GET_CPUFREQ_CPPC
> >>>> xen_sysctl_pm_op for amd-cppc driver
> >>>>
> >>>> On 11.07.2025 05:51, Penny Zheng wrote:
> >>>>> +         (set_cppc->maximum > data->caps.highest_perf ||
> >>>>> +          set_cppc->maximum < data->caps.lowest_nonlinear_perf) )
> >>>>> +        return -EINVAL;
> >>>>> +    /*
> >>>>> +     * Minimum performance may be set to any performance value in the range
> >>>>> +     * [Nonlinear Lowest Performance, Highest Performance], inclusive but must
> >>>>> +     * be set to a value that is less than or equal to Maximum Performance.
> >>>>> +     */
> >>>>> +    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM &&
> >>>>> +         (set_cppc->minimum < data->caps.lowest_nonlinear_perf ||
> >>>>> +          (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM &&
> >>>>> +           set_cppc->minimum > set_cppc->maximum) ||
> >>>>> +          (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
> >>>>
> >>>> Hmm, I find this confusing to read, and was first thinking the !
> >>>> was wrong here. Imo such is better expressed with the conditional operator:
> >>>>
> >>>>
> >>>>          set_cppc->minimum > (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM
> >>>>                               ? set_cppc->maximum
> >>>>                               : data->req.max_perf)
> >>>>
> >>>
> >>> Thx, understood!
> >>>
> >>>> Which also makes it easier to spot that here you use data->req,
> >>>> when in the minimum check you use data->caps. Why this difference?
> >>>>
> >>>
> >>> The minimum check has two boundary checks. The lower bound is checked
> >>> against data->caps.lowest_nonlinear_perf, and the upper bound against
> >>> data->req.max_perf: the minimum must be no larger than caps.highest_perf,
> >>> but also no larger than req.max_perf. The relation between max_perf and
> >>> highest_perf is already validated in the maximum check (max_perf <=
> >>> caps.highest_perf), so min_perf <= max_perf implies min_perf <=
> >>> caps.highest_perf. Hence only max_perf needs to be considered here.
> >>
> >> I still don't get why one check is against the capabilities (permitted
> >> values) while the other is against what's currently set.
> >
> > It needs to meet the following two criteria:
> >
> > 1. caps.lowest_nonlinear <= min_perf <= caps.highest_perf
> > 2. min_perf <= max_perf. If the user doesn't set max_perf in the same call,
> >    we use the value stored in req.max_perf, i.e. the last one set.
>
> Hmm, I see. Yet then what about the case of max being set without also setting
> min? Overall I'm expecting full symmetry in the checking that's being done.
>
Oh, true, I forgot the symmetric scenario; will add.
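A minimal sketch of what a symmetric check could look like, for illustration only: every value being set is range-checked against the capabilities, and the resulting min/max pair (falling back to req.min_perf / req.max_perf for whichever side is not being set) must stay ordered. The structures and flag values are simplified stand-ins assumed for this example; req.min_perf in particular is an assumed field, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for illustration; check xen/include/public/sysctl.h. */
#define XEN_SYSCTL_CPPC_SET_MINIMUM (1U << 0)
#define XEN_SYSCTL_CPPC_SET_MAXIMUM (1U << 1)

/* Hypothetical, simplified stand-ins; req.min_perf is an assumed field. */
struct cppc_caps { uint8_t lowest_nonlinear_perf, highest_perf; };
struct cppc_req  { uint8_t min_perf, max_perf; };   /* last values written */
struct cppc_data { struct cppc_caps caps; struct cppc_req req; };
struct set_cppc  { uint32_t set_params, minimum, maximum; };

/* Permitted values are [lowest_nonlinear_perf, highest_perf], inclusive. */
static bool perf_in_caps(uint32_t perf, const struct cppc_caps *caps)
{
    return perf >= caps->lowest_nonlinear_perf && perf <= caps->highest_perf;
}

static int check_min_max_symmetric(const struct set_cppc *set_cppc,
                                   const struct cppc_data *data)
{
    bool set_min = set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM;
    bool set_max = set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM;
    /* Bounds in effect after this call: new value if given, else current. */
    uint32_t min = set_min ? set_cppc->minimum : data->req.min_perf;
    uint32_t max = set_max ? set_cppc->maximum : data->req.max_perf;

    assert(data->caps.lowest_nonlinear_perf <= data->caps.highest_perf);

    /* Every value being set must be within the capabilities ... */
    if ( (set_min && !perf_in_caps(min, &data->caps)) ||
         (set_max && !perf_in_caps(max, &data->caps)) )
        return -EINVAL;

    /* ... and the resulting pair must stay ordered, whichever side moved. */
    if ( (set_min || set_max) && min > max )
        return -EINVAL;

    return 0;
}
```

With req.min_perf = 80, a lone maximum of 50 is now rejected, whereas a check that only compares the minimum against the maximum would have accepted it.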
> Jan