The Linux Kernel Mailing List
* Re: [PATCH v2 2/2] ACPI: CPPC: Add ospm_nominal_perf support
       [not found]   ` <8516aeea-f20b-4afa-a737-1dff636f5c2d@arm.com>
@ 2026-05-07 21:03     ` Sumit Gupta
  0 siblings, 0 replies; 3+ messages in thread
From: Sumit Gupta @ 2026-05-07 21:03 UTC (permalink / raw)
  To: Pierre Gondois, rafael, viresh.kumar, lenb, zhenglifeng1,
	zhanjie9, mario.limonciello, saket.dumbre, linux-acpi,
	linux-kernel, linux-pm, acpica-devel
  Cc: treding, jonathanh, vsethi, ksitaraman, sanjayc, bbasu, sumitg


On 30/04/26 21:55, Pierre Gondois wrote:
> Hello Sumit,
>
> On 4/30/26 16:24, Sumit Gupta wrote:
>> Add acpi_cppc/ospm_nominal_perf sysfs attribute (read-write) and
>> cppc_set_ospm_nominal_perf() API for the OSPM Nominal Performance
>> register (ACPI 6.6, Section 8.4.6.1.2.6).
>>
>> The register conveys the desired nominal performance level at which
>> the platform may run. OSPM can request a lower level than platform
>> nominal. Valid range is [Lowest Performance, Nominal Performance].
>> The value tells the platform what OSPM considers nominal. The
>> platform classifies performance above this as boosted and below as
>> throttled. It uses that for its power/thermal decisions.
>>
>> Although the register is write-only per spec, cache the OSPM-written
>> value in cpc_desc so userspace can observe it via sysfs, and to
>> skip redundant writes.
>>
>> Initialize to platform nominal at policy init. Override via sysfs
>> if needed.
>>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>>   drivers/acpi/cppc_acpi.c       | 69 ++++++++++++++++++++++++++++++++++
>>   drivers/cpufreq/cppc_cpufreq.c | 10 +++++
>>   include/acpi/cppc_acpi.h       |  6 +++
>>   3 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>> index a1c91ce20cc8..fbc620adafad 100644
>> --- a/drivers/acpi/cppc_acpi.c
>> +++ b/drivers/acpi/cppc_acpi.c
>> @@ -155,6 +155,10 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
>>   static struct kobj_attribute _name =                \
>>   __ATTR(_name, 0444, show_##_name, NULL)
>>
>> +#define define_one_cppc_rw(_name)            \
>> +static struct kobj_attribute _name =         \
>> +__ATTR(_name, 0644, show_##_name, store_##_name)
>> +
>>   #define to_cpc_desc(a) container_of(a, struct cpc_desc, kobj)
>>
>>   #define show_cppc_data(access_fn, struct_name, member_name)         \
>> @@ -211,6 +215,38 @@ static ssize_t show_feedback_ctrs(struct kobject *kobj,
>>   }
>>   define_one_cppc_ro(feedback_ctrs);
>>
>> +static ssize_t show_ospm_nominal_perf(struct kobject *kobj,
>> +                                   struct kobj_attribute *attr, char *buf)
>> +{
>> +     struct cpc_desc *cpc_ptr = to_cpc_desc(kobj);
>> +     u64 val = READ_ONCE(cpc_ptr->ospm_nominal_perf);
>> +
>> +     if (!val)
>> +             return -ENODATA;
>> +
>> +     return sysfs_emit(buf, "%llu\n", val);
>> +}
>> +
>> +static ssize_t store_ospm_nominal_perf(struct kobject *kobj,
>> +                                    struct kobj_attribute *attr,
>> +                                    const char *buf, size_t count)
>> +{
>> +     struct cpc_desc *cpc_ptr = to_cpc_desc(kobj);
>> +     u64 val;
>> +     int ret;
>> +
>> +     ret = kstrtou64(buf, 0, &val);
>> +     if (ret)
>> +             return ret;
>> +
>> +     ret = cppc_set_ospm_nominal_perf(cpc_ptr->cpu_id, val);
>> +     if (ret)
>> +             return ret;
>> +
>> +     return count;
>> +}
>> +define_one_cppc_rw(ospm_nominal_perf);
>> +
>>   static struct attribute *cppc_attrs[] = {
>>       &feedback_ctrs.attr,
>>       &reference_perf.attr,
>> @@ -222,6 +258,7 @@ static struct attribute *cppc_attrs[] = {
>>       &nominal_perf.attr,
>>       &nominal_freq.attr,
>>       &lowest_freq.attr,
>> +     &ospm_nominal_perf.attr,
>>       NULL
>>   };
>>   ATTRIBUTE_GROUPS(cppc);
>> @@ -1683,6 +1720,38 @@ int cppc_set_epp(int cpu, u64 epp_val)
>>   }
>>   EXPORT_SYMBOL_GPL(cppc_set_epp);
>>
>> +/**
>> + * cppc_set_ospm_nominal_perf() - Write OSPM Nominal Performance register.
>> + * @cpu: CPU on which to write register.
>> + * @ospm_nominal_perf: Value to write to the OSPM Nominal Performance register.
>> + *
>> + * OSPM Nominal Performance allows OSPM to inform the platform of the nominal
>> + * performance level it intends to maintain.
>> + *
>> + * Return: 0 for success, -EINVAL on invalid input, -EOPNOTSUPP if not
>> + * supported, -EIO otherwise.
>> + */
>> +int cppc_set_ospm_nominal_perf(int cpu, u64 ospm_nominal_perf)
>> +{
>> +     struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
>> +     int ret;
>> +
>> +     if (!ospm_nominal_perf || ospm_nominal_perf > U32_MAX)
>> +             return -EINVAL;
> I think the spec also requires the value to be in the range
> [lowest:nominal]. As these registers are read-only, it should
> be OK to read the values here?

Will add the [lowest_perf, nominal_perf] range check in v3,
fetching the bounds via cppc_get_perf_caps().
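
A minimal userspace sketch of that range check, for illustration only.
The struct perf_caps_model type and the ospm_nominal_perf_check() helper
are hypothetical stand-ins; in the driver the bounds would come from
cppc_get_perf_caps() as described above, and the exact v3 code may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the lowest/nominal performance fields that
 * the driver would obtain via cppc_get_perf_caps().
 */
struct perf_caps_model {
	uint32_t lowest_perf;
	uint32_t nominal_perf;
};

/*
 * Return 0 if val lies within [lowest_perf, nominal_perf], the valid
 * range for the OSPM Nominal Performance register, -EINVAL otherwise.
 */
static int ospm_nominal_perf_check(const struct perf_caps_model *caps,
				   uint64_t val)
{
	assert(caps != NULL);

	if (val < caps->lowest_perf || val > caps->nominal_perf)
		return -EINVAL;

	return 0;
}
```

Because the check is inclusive at both ends, writing the platform nominal
value itself stays valid, and 0 is rejected whenever lowest_perf is
non-zero.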


>
>> +
>> +     if (cpc_desc &&
>> +         READ_ONCE(cpc_desc->ospm_nominal_perf) == ospm_nominal_perf)
>> +             return 0;
>> +
>> +     ret = cppc_set_reg_val(cpu, OSPM_NOMINAL_PERF, ospm_nominal_perf);
>> +     if (ret)
>> +             return ret;
>> +
>
> Shouldn't we have some protection against concurrent accesses?
>
>> +     WRITE_ONCE(cpc_desc->ospm_nominal_perf, ospm_nominal_perf);
>> +     return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(cppc_set_ospm_nominal_perf);
>> +
>>   /**
>>    * cppc_get_auto_act_window() - Read autonomous activity window register.
>>    * @cpu: CPU from which to read register.
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index 7e7f9dfb7a24..d06cba963550 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -715,6 +715,16 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>               goto out;
>>       }
>>
>> +     /*
>> +      * Initialize OSPM Nominal Performance to inform firmware of
>> +      * OSPM's nominal level. Performance above this value = boost;
>> +      * below = throttle. Uses platform nominal by default.
>> +      */
>> +     ret = cppc_set_ospm_nominal_perf(cpu, caps->nominal_perf);
>> +     if (ret && ret != -EOPNOTSUPP)
>> +             pr_debug("Failed to set ospm_nominal_perf for CPU%d: %d\n",
>> +                      cpu, ret);
>> +
>
> IIUC, if (ospm_nominal_perf == nominal_perf), the firmware should
> not behave differently. Is this really useful ?
>

Right, it's a no-op from the firmware's side. The init was only so that
sysfs would show a value (platform nominal) before any userspace write.
Will drop it in v3 and return 0 from sysfs until userspace writes a value.


> ------------
>
> Also, it seems some synchronization mechanism will be needed
> to keep up with the boost state.
>
> If the ospm_nominal_perf is lowered and boost is disabled,
> a freq. update should happen. IMO it looks like this could
> be handled with (another) freq_qos_request.
>
> This new freq_qos_request, if we name it ospm_nominal_freq_req,
> should only be taken into account if boost is disabled.
> Otherwise, if boost is enabled, ospm_nominal_freq_req
> should be ignored.
>

Agreed, will add the new freq_qos_request in a follow-up patch.
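
To illustrate the rule Pierre describes, here is a small userspace model
of how the effective maximum could be derived. It is only a sketch: the
model_effective_max_khz() helper and MODEL_NO_LIMIT_KHZ are hypothetical
names, and the follow-up patch would express this through a
freq_qos_request rather than a helper like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker meaning "userspace has not set a value yet". */
#define MODEL_NO_LIMIT_KHZ 0u

/*
 * Model of the proposed behaviour: the OSPM nominal frequency only caps
 * the policy maximum while boost is disabled. With boost enabled, or
 * with no value set, the constraint is ignored.
 */
static unsigned int model_effective_max_khz(unsigned int policy_max_khz,
					    unsigned int ospm_nominal_khz,
					    bool boost_enabled)
{
	assert(policy_max_khz > 0);

	if (boost_enabled || ospm_nominal_khz == MODEL_NO_LIMIT_KHZ)
		return policy_max_khz;

	/* Boost disabled: the lower of the two limits wins. */
	return ospm_nominal_khz < policy_max_khz ?
	       ospm_nominal_khz : policy_max_khz;
}
```

Toggling boost would then re-evaluate this limit, which is what triggers
the frequency update mentioned above.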

> ------------
>
> Also, the function seems to set the ospm_nominal_perf for
> a single CPU when the policy might be common to multiple
> CPUs, right?

In v3, after dropping the change from cppc_cpufreq_cpu_init(),
this problem won't arise in this specific instance.


>
> The issues this field raises seem similar to the auto_sel
> ones, i.e.:
>
> - concurrent accesses + need for a scratch value
>
> - what should happen when unloading the driver
>
> - the value can be set for single CPUs but we might
> want to have the same value for the whole policy
>
> Maybe a common solution should be found.
> (I'm not suggesting anything right now unfortunately).
>

One way to address this is to move the sysfs attribute from the per-CPU
acpi_cppc directory to a per-policy node under cpufreq
(ospm_nominal_perf_freq, in kHz).
In the sysfs callback, we can convert kHz to perf and write the register
on every CPU in policy->cpus.
Concurrency is already covered by policy->rwsem at the cpufreq layer.
This is similar to how we handled min/max_perf in an earlier version.
Does this approach make sense?
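
A rough userspace model of that per-policy store path, as a sketch only.
All names here (struct policy_model, model_reg, model_khz_to_perf(),
model_store_ospm_nominal_freq()) are hypothetical, the kHz-to-perf
conversion is simplified to a linear scale, and locking is left out on
the assumption that the caller already holds the policy lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Stand-in for the per-CPU OSPM Nominal Performance registers. */
static uint32_t model_reg[MODEL_NR_CPUS];

/* Hypothetical, reduced view of a cpufreq policy and its CPPC caps. */
struct policy_model {
	const int *cpus;		/* CPUs sharing this policy */
	size_t nr_cpus;
	uint32_t lowest_perf;
	uint32_t nominal_perf;
	uint32_t nominal_freq_khz;
};

/* Simplified linear kHz -> perf conversion (an assumption). */
static uint64_t model_khz_to_perf(const struct policy_model *p, uint64_t khz)
{
	return khz * p->nominal_perf / p->nominal_freq_khz;
}

/*
 * Convert the requested frequency to a perf value, validate it against
 * [lowest_perf, nominal_perf], then write it to every CPU in the policy.
 */
static int model_store_ospm_nominal_freq(const struct policy_model *p,
					 uint64_t khz)
{
	uint64_t perf;
	size_t i;

	assert(p != NULL && p->nominal_freq_khz != 0);

	/* Above nominal is invalid; this also keeps the multiply in range. */
	if (khz > p->nominal_freq_khz)
		return -EINVAL;

	perf = model_khz_to_perf(p, khz);
	if (perf < p->lowest_perf || perf > p->nominal_perf)
		return -EINVAL;

	for (i = 0; i < p->nr_cpus; i++) {
		assert(p->cpus[i] >= 0 && p->cpus[i] < MODEL_NR_CPUS);
		model_reg[p->cpus[i]] = (uint32_t)perf;
	}

	return 0;
}
```

Validating before the loop means a rejected request leaves every CPU in
the policy untouched, so the CPUs of one policy never end up with
different values because of a bad input.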

Thank you,
Sumit Gupta

....




* Re: [PATCH v2 0/2] ACPI: CPPC: Add CPPC v4 support (ACPI 6.6)
       [not found] <20260430142430.755437-1-sumitg@nvidia.com>
       [not found] ` <20260430142430.755437-3-sumitg@nvidia.com>
@ 2026-05-08 19:01 ` Rafael J. Wysocki
  2026-05-11 21:20   ` Sumit Gupta
  1 sibling, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2026-05-08 19:01 UTC (permalink / raw)
  To: Sumit Gupta
  Cc: rafael, viresh.kumar, lenb, pierre.gondois, zhenglifeng1,
	zhanjie9, mario.limonciello, saket.dumbre, linux-acpi,
	linux-kernel, linux-pm, acpica-devel, treding, jonathanh, vsethi,
	ksitaraman, sanjayc, bbasu

On Thu, Apr 30, 2026 at 4:25 PM Sumit Gupta <sumitg@nvidia.com> wrote:
>
> Add initial kernel support for CPPC v4 (ACPI 6.6, Section 8.4.6),
> which extends the _CPC package from 23 to 25 entries with two
> optional fields:
>
>   - OSPM Nominal Performance (8.4.6.1.2.6): register used by OSPM
>     to tell the platform what it considers nominal. The platform
>     classifies performance above this as boost and below as
>     throttle for power/thermal decisions.
>
>   - Resource Priority (8.4.6.1.2.7): Package of Resource Priority
>     Register Descriptor sub-packages. Full parsing is not yet
>     implemented; such entries are marked as unsupported.
>
> Patch 1: Add v4 _CPC parsing - validate the 25-entry layout,
> handle the Resource Priority package, and mark the two new
> registers optional.
>
> Patch 2: Add acpi_cppc/ospm_nominal_perf as a read-write sysfs
> attribute, and initialize it to the platform nominal value
> during cppc_cpufreq policy init.
>
> ---
> v1[1] -> v2:
> - Patch 1: added Reviewed-by from Mario Limonciello.
> - Patch 2:
>   - Make ospm_nominal_perf sysfs read-write; cache last write in
>     cpc_desc and skip redundant register writes.
>   - Validate input in cppc_set_ospm_nominal_perf.
>
> Sumit Gupta (2):
>   ACPI: CPPC: Add support for CPPC v4
>   ACPI: CPPC: Add ospm_nominal_perf support
>
>  drivers/acpi/cppc_acpi.c       | 93 +++++++++++++++++++++++++++++++---
>  drivers/cpufreq/cppc_cpufreq.c | 10 ++++
>  include/acpi/cppc_acpi.h       | 14 ++++-
>  3 files changed, 109 insertions(+), 8 deletions(-)
>
> [1] https://lore.kernel.org/lkml/20260427051823.280419-1-sumitg@nvidia.com/
>
> --

Can you please see the sashiko.dev feedback on this set:

https://sashiko.dev/#/patchset/20260430142430.755437-1-sumitg%40nvidia.com

and let me know what you think?  Especially regarding the second patch?


* Re: [PATCH v2 0/2] ACPI: CPPC: Add CPPC v4 support (ACPI 6.6)
  2026-05-08 19:01 ` [PATCH v2 0/2] ACPI: CPPC: Add CPPC v4 support (ACPI 6.6) Rafael J. Wysocki
@ 2026-05-11 21:20   ` Sumit Gupta
  0 siblings, 0 replies; 3+ messages in thread
From: Sumit Gupta @ 2026-05-11 21:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: viresh.kumar, lenb, pierre.gondois, zhenglifeng1, zhanjie9,
	mario.limonciello, saket.dumbre, linux-acpi, linux-kernel,
	linux-pm, acpica-devel, treding, jonathanh, vsethi, ksitaraman,
	sanjayc, bbasu, sumitg

Hi Rafael,


On 09/05/26 00:31, Rafael J. Wysocki wrote:
> On Thu, Apr 30, 2026 at 4:25 PM Sumit Gupta <sumitg@nvidia.com> wrote:
>> Add initial kernel support for CPPC v4 (ACPI 6.6, Section 8.4.6),
>> which extends the _CPC package from 23 to 25 entries with two
>> optional fields:
>>
>>    - OSPM Nominal Performance (8.4.6.1.2.6): register used by OSPM
>>      to tell the platform what it considers nominal. The platform
>>      classifies performance above this as boost and below as
>>      throttle for power/thermal decisions.
>>
>>    - Resource Priority (8.4.6.1.2.7): Package of Resource Priority
>>      Register Descriptor sub-packages. Full parsing is not yet
>>      implemented; such entries are marked as unsupported.
>>
>> Patch 1: Add v4 _CPC parsing - validate the 25-entry layout,
>> handle the Resource Priority package, and mark the two new
>> registers optional.
>>
>> Patch 2: Add acpi_cppc/ospm_nominal_perf as a read-write sysfs
>> attribute, and initialize it to the platform nominal value
>> during cppc_cpufreq policy init.
>>
>> ---
>> v1[1] -> v2:
>> - Patch 1: added Reviewed-by from Mario Limonciello.
>> - Patch 2:
>>    - Make ospm_nominal_perf sysfs read-write; cache last write in
>>      cpc_desc and skip redundant register writes.
>>    - Validate input in cppc_set_ospm_nominal_perf.
>>
>> Sumit Gupta (2):
>>    ACPI: CPPC: Add support for CPPC v4
>>    ACPI: CPPC: Add ospm_nominal_perf support
>>
>>   drivers/acpi/cppc_acpi.c       | 93 +++++++++++++++++++++++++++++++---
>>   drivers/cpufreq/cppc_cpufreq.c | 10 ++++
>>   include/acpi/cppc_acpi.h       | 14 ++++-
>>   3 files changed, 109 insertions(+), 8 deletions(-)
>>
>> [1] https://lore.kernel.org/lkml/20260427051823.280419-1-sumitg@nvidia.com/
>>
>> --
> Can you please see the sashiko.dev feedback on this set:
>
> https://sashiko.dev/#/patchset/20260430142430.755437-1-sumitg%40nvidia.com
>
> and let me know what you think?  Especially regarding the second patch?


Thank you for sharing this.

Patch 1:
- Comments #1 and #2 describe pre-existing issues that occur rarely.
   I will address them in a separate hardening patch.

- Comment #3: In v3, I will limit the ACPI_TYPE_PACKAGE handling to the
   RESOURCE_PRIORITY entry. A Package in any other slot will then be
   treated as invalid and abort the probe, as it did before this patch.

----------------

Patch 2:
The v3 changes already discussed in some detail on this thread address
most of the points (please see my reply to Pierre [1]).

A summary of how each point will be addressed follows:


 > The commit message states the valid range is [Lowest Performance,
 > Nominal Performance]. Does this code allow writing arbitrary values
 > outside that range by only checking against U32_MAX, without fetching
 > the CPU's capabilities to validate the input?
Will fetch the bounds via cppc_get_perf_caps() and reject values
outside [lowest_perf, nominal_perf] in v3.


 > If the hardware loses state during a logical CPU hotplug or system
 > suspend, but the software cache is not invalidated, will this check
 > prevent the register from being correctly re-initialized when the CPU
 > comes back online?
The redundant write check will be removed in v3, so the stale cache
failure mode won't be possible.


 > Can concurrent sysfs writes permanently desynchronize the software
 > cache from the hardware register?
 > ...
 > Is a lock needed around the read-modify-write cycle?
This will not occur in v3 since concurrent calls for the same
policy are serialized by policy->rwsem at the cpufreq layer (see [1]).


 > Additionally, can a time-of-check to time-of-use race lead to a NULL
 > pointer dereference if cpc_desc_ptr is initialized concurrently?
 > ...
 > Would this cause the WRITE_ONCE() to dereference the locally fetched NULL
 > cpc_desc pointer? Should this explicitly return -ENODEV early if !cpc_desc?
Will add the early -ENODEV return at the top of the function in v3,
eliminating the NULL cpc_desc race.


 > For shared cpufreq policies where policy->cpus contains multiple
 > logical cores (such as CPUFREQ_SHARED_TYPE_ANY), does this skip
 > initializing the secondary CPUs in the domain?
 >
 > If they are uninitialized, will their local cache remain 0, causing
 > sysfs reads for those secondary CPUs to incorrectly return -ENODATA?
Will move the read-write sysfs attribute from the per-CPU acpi_cppc
interface to a per-policy cpufreq interface in v3, and write the
register on every CPU in policy->cpus/domain.
The -ENODATA return on reads before any write will go away along with
the per-CPU node, and the per-policy show callback will return 0 until
userspace writes a value. See [1].


 > Also, since the sysfs attribute is tied to the physical CPU device
 > lifetime and persists independently of cpufreq policy teardowns, will
 > unconditionally setting the nominal performance here silently clobber
 > any persistent userspace configurations when a CPU is taken offline
 > and online?
Will drop the unconditional cpu_init write in v3, so the user-set
value won't be overwritten on CPU hotplug.

[1] https://lore.kernel.org/all/9c32f75a-294f-4cea-810e-c011c4dd91ab@nvidia.com/

Thank you,
Sumit Gupta


