From: Sumit Gupta <sumitg@nvidia.com>
To: "zhenglifeng (A)" <zhenglifeng1@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>, <lenb@kernel.org>,
<robert.moore@intel.com>, <corbet@lwn.net>,
<linux-pm@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <acpica-devel@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, <linux-tegra@vger.kernel.org>,
<treding@nvidia.com>, <jonathanh@nvidia.com>, <sashal@nvidia.com>,
<vsethi@nvidia.com>, <ksitaraman@nvidia.com>,
<sanjayc@nvidia.com>, <bbasu@nvidia.com>,
Sumit Gupta <sumitg@nvidia.com>
Subject: Re: [Patch 0/5] Support Autonomous Selection mode in cppc_cpufreq
Date: Wed, 30 Apr 2025 20:30:20 +0530 [thread overview]
Message-ID: <076c199c-a081-4a7f-956c-f395f4d5e156@nvidia.com> (raw)
In-Reply-To: <d12a2ee3-0ea0-48bf-9f75-b29fc0039d9e@huawei.com>
>
>> Hi Sumit,
>>
>> May I resend the patch 8 in [1] first? Because I really need this new
>> feature.
>>
>> After that patch being merged, you can resend this series base on that,
>> change the paths of the sysfs files, add a new cppc_cpufreq instance or do
>> anything in that series. Then we can continue this discussion.
>>
>> Is that all right?
>
> Hi Sumit,
>
> Please let me know if you are OK with it.
>
Hi Zhenglifeng,
Both the ways do the same job i.e. set CPC registers.
To move this work forward, I am Ok with adding sysfs entries under
cpufreq syfs node and not under acpi_cppc if the maintainers are fine.
I will later send my updated patch to add more entries under cpufreq
sysfs for updating more CPC registers as done in patch 8 in [1].
Thank you,
Sumit Gupta
>>
>> On 2025/4/1 21:56, zhenglifeng (A) wrote:
>>
>>> Sorry for the delay.
>>>
>>> On 2025/3/14 20:48, Sumit Gupta wrote:
>>>>
>>>>
>>>>>>>
>>>>>>> There seems to be some quite fundamental disagreement on how this
>>>>>>> should be done, so I'm afraid I cannot do much about it ATM.
>>>>>>>
>>>>>>> Please agree on a common approach and come back to me when you are ready.
>>>>>>>
>>>>>>> Sending two concurrent patchsets under confusingly similar names again
>>>>>>> and again isn't particularly helpful.
>>>>>>>
>>>>>>> Thanks!
>>>>>>
>>>>>> Hi Rafael,
>>>>>>
>>>>>> Thank you for looking into this.
>>>>>>
>>>>>> Hi Lifeng,
>>>>>>
>>>>>> As per the discussion, we can make the driver future extensible and
>>>>>> also can optimize the register read/write access.
>>>>>>
>>>>>> I gave some thought and below is my proposal.
>>>>>>
>>>>>> 1) Pick 'Patch 1-7' from your patch series [1] which optimize API's
>>>>>> to read/write a cpc register.
>>>>>>
>>>>>> 2) Pick my patches in [2]:
>>>>>> - Patch 1-4: Keep all cpc registers together under acpi_cppc sysfs.
>>>>>> Also, update existing API's to read/write regs in batch.
>>>>>> - Patch 5: Creates 'cppc_cpufreq_epp_driver' instance for booting
>>>>>> all CPU's in Auto mode and set registers with right values.
>>>>>> They can be updated after boot from sysfs to change hints to HW.
>>>>>> I can use the optimized API's from [1] where required in [2].
>>>>>>
>>>>>> Let me know if you are okay with this proposal.
>>>>>> I can also send an updated patch series with all the patches combined?
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
>>>>>> [2] https://lore.kernel.org/lkml/20250211103737.447704-1-sumitg@nvidia.com/
>>>>>>
>>>>>> Regards,
>>>>>> Sumit Gupta
>>>>>>
>>>>>
>>>>> Hi Sumit,
>>>>>
>>>>> Over the past few days, I've been thinking about your proposal and
>>>>> scenario.
>>>>>
>>>>> I think we both agree that PATCH 1-7 in [1] doesn't conflicts with [2], so
>>>>> the rest of the discussion focuses on the differences between [2] and the
>>>>> PATCH 8 in [1].
>>>>>
>>>>> We both tried to support autonomous selection mode in cppc_cpufreq but on
>>>>> different ways. I think the differences between these two approaches can be
>>>>> summarized into three questions:
>>>>>
>>>>> 1. Which sysfs files to expose? I think this is not a problem, we can keep
>>>>> all of them.
>>>>>
>>>>> 2. Where to expose these sysfs files? I understand your willing to keep all
>>>>> cpc registers together under acpi_cppc sysfs. But in my opinion, it is more
>>>>> suitable to expose them under cppc_cpufreq_attr, for these reasons:
>>>>>
>>>>> 1) It may probably introduce concurrency and data consistency issues, as
>>>>> I mentioned before.
>>>>>
>>>>
>>>> As explained in previous reply, this will be solved with the ifdef
>>>> check to enable the attributes for only those CPUFREQ drivers which want
>>>> to use the generic nodes.
>>>> e.g: '#ifdef CONFIG_ACPI_CPPC_CPUFREQ' for 'cppc_cpufreq'.
>>>>
>>>> These CPC register read/write sysfs nodes are generic as per the ACPI
>>>> specification and without any vendor specific logic.
>>>> So, adding them in the lib file 'cppc_acpi.c'(CONFIG_ACPI_CPPC_LIB) will
>>>> avoid code duplication if a different or new ACPI based CPUFREQ driver
>>>> also wants to use them just by adding their macro check. Such ifdef check is also used in other places for attributes creation like below.
>>>> So, don't look like a problem.
>>>> $ grep -A4 "acpi_cpufreq_attr\[" drivers/cpufreq/acpi-cpufreq.c
>>>> static struct freq_attr *acpi_cpufreq_attr[] = {
>>>> &freqdomain_cpus,
>>>> #ifdef CONFIG_X86_ACPI_CPUFREQ_CPB
>>>> &cpb,
>>>> #endif
>>>
>>> So in the future, we will see:
>>>
>>> static struct attribute *cppc_attrs[] = {
>>> ...
>>> #ifdef CONFIG_XXX
>>> &xxx.attr,
>>> &xxx.attr,
>>> #endif
>>> #ifdef CONFIG_XXX
>>> &xxx.attr,
>>> #endif
>>> #ifdef CONFIG_XXX
>>> &xxx.attr,
>>> ...
>>> };
>>>
>>> I think you are making things more complicated.
>>>
>>>>
>>>>> 2) The store functions call cpufreq_cpu_get() to get policy and update
>>>>> the driver_data which is a cppc_cpudata. Only the driver_data in
>>>>> cppc_cpufreq's policy is a cppc_cpudata! These operations are inappropriate
>>>>> in cppc_acpi. This file currently provides interfaces for cpufreq drivers
>>>>> to use. Reverse calls might mess up call relationships, break code
>>>>> structures, and cause problems that are hard to pinpoint the root cause!
>>>>>
>>>>
>>>> If we don't want to update the cpufreq policy from 'cppc_acpi.c' and only update it from within the cpufreq, then this could be one valid
>>>> point to not add the write syfs nodes in 'cppc_acpi.c' lib file.
>>>>
>>>> @Rafael, @Viresh : Do you have any comments on this?
>>>
>>> I think updating cpufreq policy from 'cppc_acpi.c' should be forbidden.
>>>
>>>>
>>>>> 3) Difficult to extend. Different cpufreq drivers may have different
>>>>> processing logic when reading from and writing to these CPC registers.
>>>>> Limiting all sysfs here makes it difficult for each cpufreq driver to
>>>>> extend. I think this is why there are only read-only interfaces under
>>>>> cppc_attrs before.
>>>>>
>>>>
>>>> We are updating the CPC registers as per the generic ACPI specification.
>>>> So, any ACPI based CPUFREQ driver can use these generic nodes to
>>>> read/write reg's until they have a vendor specific requirement or
>>>> implementation.
>>>> As explained above, If someone wants to update in different way and use
>>>> their own CPUFREQ driver then these generic attributes won't be created
>>>> due to the CPUFREQ driver macro check.
>>>> I think AMD and Intel are doing more than just reading/updating the registers. That's why they needed their driver specific implementations.
>>>>
>>>>> Adding a 'ifdef' is not a good way to solve these problems. Defining this
>>>>> config does not necessarily mean that the cpufreq driver is cppc_cpufreq.
>>>>>
>>>>
>>>> It means that only.
>>>> ./drivers/cpufreq/Makefile:obj-$(CONFIG_ACPI_CPPC_CPUFREQ) += cppc_cpufreq.o
>>>
>>> Compile this file does not mean that the cpufreq driver is cppc_cpufreq.
>>> Driver registration may fail, and the actually loaded driver may be
>>> another. It'll be dangerous to expose these sysfs files for users to update
>>> registers' value in this case.
>>>
>>>>
>>>>> 3. Is it necessary to add a new driver instance? [1] exposed the sysfs
>>>>> files to support users dynamically change the auto selection mode of each
>>>>> policy. Each policy can be operated seperately. It seems to me that if you
>>>>> want to boot all CPUs in auto mode, it should be sufficient to set all
>>>>> relevant registers to the correct values at boot time. I can't see why the
>>>>> new instance is necessary unless you explain it further. Could you explain
>>>>> more about why you add a new instance starting from answer these questions:
>>>>>
>>>>> For a specific CPU, what is the difference between using the two instances
>>>>> when auto_sel is 1? And what is the difference when auto_sel is 0?
>>>>>
>>>>
>>>> Explained this in previous reply. Let me elaborate more.
>>>>
>>>> For hundred's of CPU's, we don't need to explicitly set multiple sysfs
>>>> after boot to enable and configure Auto mode with right params. That's why an easy option is to pass boot argument or module param for enabling
>>>> and configuration.
>>>> A separate instance 'cppc_cpufreq_epp' of the 'cppc_cpufreq' driver is
>>>> added because policy min/max need to be updated to the min/max_perf
>>>> and not nominal/lowest nonlinear perf which is done by the default
>>>> init hook. Min_perf value can be lower than lowest nonlinear perf and Max_perf can be higher than nominal perf.
>>>> If some CPU is booted with epp instance and later the auto mode is disabled or min/max_perf is changed from sysfs then also the policy
>>>> min/max need to be updated accordingly.
>>>>
>>>> Another is that in Autonomous mode the freq selection and setting is
>>>> done by HW. So, cpufreq_driver->target() hook is not needed.
>>>> These are few reasons which I am aware of as of now.
>>>> I think in future there can be more. Having a separate instance
>>>> reflecting a HW based Autonomous frequency selection will make it easy
>>>> for any future changes.
>>>
>>> So CPUs will act totally differently under these two instance. But what if
>>> I want part of the CPUs in HW mode and others in SW mode? Should I boot on
>>> HW mode and set some policies' auto_set to false or the other way? It seems
>>> like the effects of theses two approaches are completely different. In my
>>> opinion, this new instance is more like a completely different driver than
>>> cppc_cpufreq.
>>>
>>>>
>>>>> If it turns out that the new instance is necessary, I think we can reach a
>>>>> common approach by adding this new cpufreq driver instance and place the
>>>>> attributes in 'cppc_cpufreq_epp_attr', like amd-pstate did.
>>>>>
>>>>> What do you think?
>>>>
>>>> I initially thought about this but there was a problem.
>>>> What if we boot with non-epp instance which doesn't have these attributes and later want to enable Auto mode for few CPU's from sysfs.
>>>
>>> That's the problem. CPUs can be set to Auto mode with or without this new
>>> instance. So what's the point of it?
>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Sumit Gupta
>>>
>>>
>>>
>>
>>
>
prev parent reply other threads:[~2025-04-30 15:00 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-11 10:37 [Patch 0/5] Support Autonomous Selection mode in cppc_cpufreq Sumit Gupta
2025-02-11 10:37 ` [Patch 1/5] ACPI: CPPC: add read perf ctrls api and rename few existing Sumit Gupta
2025-02-12 8:03 ` kernel test robot
2025-02-12 8:25 ` kernel test robot
2025-02-11 10:37 ` [Patch 2/5] ACPI: CPPC: expand macro to create store acpi_cppc sysfs node Sumit Gupta
2025-02-11 10:37 ` [Patch 3/5] ACPI: CPPC: support updating epp, auto_sel and {min|max_perf} from sysfs Sumit Gupta
2025-02-24 10:24 ` Pierre Gondois
2025-03-14 13:11 ` Sumit Gupta
2025-02-11 10:37 ` [Patch 4/5] Documentation: ACPI: add autonomous mode ctrls info in cppc_sysfs.txt Sumit Gupta
2025-02-11 10:37 ` [Patch 5/5] cpufreq: CPPC: Add cppc_cpufreq_epp instance for Autonomous mode Sumit Gupta
2025-02-12 9:27 ` kernel test robot
2025-02-11 10:44 ` [Patch 0/5] Support Autonomous Selection mode in cppc_cpufreq Viresh Kumar
2025-02-11 12:01 ` zhenglifeng (A)
2025-02-11 14:08 ` Sumit Gupta
2025-02-12 10:52 ` zhenglifeng (A)
2025-02-14 7:08 ` Sumit Gupta
2025-02-18 19:23 ` Rafael J. Wysocki
2025-02-21 13:14 ` Sumit Gupta
2025-02-22 10:06 ` zhenglifeng (A)
2025-02-26 10:22 ` zhenglifeng (A)
2025-03-14 12:48 ` Sumit Gupta
2025-04-01 13:56 ` zhenglifeng (A)
2025-04-19 7:44 ` zhenglifeng (A)
2025-04-27 6:23 ` zhenglifeng (A)
2025-04-30 15:00 ` Sumit Gupta [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=076c199c-a081-4a7f-956c-f395f4d5e156@nvidia.com \
--to=sumitg@nvidia.com \
--cc=acpica-devel@lists.linux.dev \
--cc=bbasu@nvidia.com \
--cc=corbet@lwn.net \
--cc=jonathanh@nvidia.com \
--cc=ksitaraman@nvidia.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=robert.moore@intel.com \
--cc=sanjayc@nvidia.com \
--cc=sashal@nvidia.com \
--cc=treding@nvidia.com \
--cc=viresh.kumar@linaro.org \
--cc=vsethi@nvidia.com \
--cc=zhenglifeng1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox