From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Linux PM <linux-pm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Lukasz Luba <lukasz.luba@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
Pierre Gondois <pierre.gondois@arm.com>
Subject: Re: [RFC][PATCH v021 5/9] PM: EM: Introduce em_dev_expand_perf_domain()
Date: Mon, 6 Jan 2025 13:59:59 +0100 [thread overview]
Message-ID: <8f26fac2-787d-44a9-a0cd-c3035a91149c@arm.com> (raw)
In-Reply-To: <CAJZ5v0jus4bzeZhUK4WC7uypQkh-_MMuU1M54figsGV3+5OhUg@mail.gmail.com>
On 17/12/2024 21:40, Rafael J. Wysocki wrote:
> On Tue, Dec 17, 2024 at 10:38 AM Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
>>
>> On 29/11/2024 17:02, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Introduce a helper function for adding a CPU to an existing EM perf
>>> domain.
>>>
>>> Subsequently, this will be used by the intel_pstate driver to add new
>>> CPUs to existing perf domains when those CPUs go online for the first
>>> time after the initialization of the driver.
>>>
>>> No intentional functional impact.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>>
>>> v0.1 -> v0.2: No changes
>>
>> Could you add information why this new EM interface is needed?
>
> There is some of it in the changelog already.
>
> In fact, it is only needed in a corner case when the system starts
> with some CPUs offline and they only go online later (as a result of
> an explicit request from user space). That is the only case when a
> new CPU may need to be added to an existing perf domain.
OK, I see. Arm doesn't need this since we derive the masks from the
CPUfreq policies so far.
I just verified, we both keep hotplugged-out CPU within the PD. That's
why we mask the PD cpus with cpu_online_mask in:
find_energy_efficient_cpu()
...
for (; pd; pd = pd->next)
cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
>> IIRC, you can't use the existing way (cpufreq_driver::register_em) since
>> it gets called to early (3) for the PD cpumasks to be ready. This issue
>> will be there for any system in which uarch domains are not congruent
>> with clock domains which we hadn't have to deal with Arm's heterogeneous
>> CPUs so far.
>
> Basically, yes.
>
>> __init intel_pstate_init()
>>
>> intel_pstate_register_driver()
>>
>> cpufreq_register_driver()
>>
>> subsys_interface_register()
>>
>> sif->add_dev() -> cpufreq_add_dev()
>>
>> cpufreq_online()
>>
>> if (!new_policy && cpufreq_driver->online)
>>
>> else
>>
>> cpufreq_driver->init() -> intel_pstate_cpu_init()
>>
>> __intel_pstate_cpu_init()
>>
>> intel_pstate_init_cpu()
>>
>> intel_pstate_get_cpu_pstates()
>>
>> hybrid_add_to_domain()
>>
>> em_dev_expand_perf_domain() <-- (1)
>>
>> intel_pstate_init_acpi_perf_limits()
>>
>> intel_pstate_set_itmt_prio() <-- (2)
>>
>> if (new_policy)
>>
>> cpufreq_driver->register_em() <-- (3)
>>
>> hybrid_init_cpu_capacity_scaling()
>>
>> hybrid_refresh_cpu_capacity_scaling()
>>
>> __hybrid_refresh_cpu_capacity_scaling() <-- (4)
>>
>> hybrid_register_all_perf_domains()
>>
>> hybrid_register_perf_domain()
>>
>> em_dev_register_perf_domain() <-- (5)
>>
>> /* Enable EAS */
>> sched_clear_itmt_support() <-- (6)
>>
>> Debugging this on a 'nosmt' i7-13700K (online mask =
>> [0,2,4,6,8,10,12,14,16-23]
>>
>> (1) Add CPU to existing hybrid PD or create new hybrid PD.
>
> Not exactly.
>
> (1) is just to expand an existing perf domain if the CPU is new (was
> offline all the time previously).
OK.
>
> Likewise, the direct hybrid_register_perf_domain() call in
> hybrid_add_to_domain() is to add a new perf domain if the given CPU is
> new (was offline all the time previously) and is the first one of the
> given type (say, the system is starting with all E-cores offline).
> It won't succeed before (4).
>
> For CPUs that are online to start with, hybrid_add_to_domain() assigns
> them to hybrid domains and PDs are created for them in
> hybrid_register_all_perf_domains().
Understood.
>
>> (2) Triggers sched domain rebuild (+ enabling EAS) already here during
>> startup ?
>
> This is just to enable ITMT which is the default mechanism for Intel
> hybrid platforms. It also requires a rebuild of sched domains to be
> enabled.
>
>> IMHO, reason is that max_highest_perf > min_highest_perf because of
>> different itmt prio
>
> Yes (which means that the platform is at least not homogenous).
>
> This really has been introduced for the handling of favored cores on
> otherwise non-hybrid platforms (say Tiger Lake).
>
>> Happens for CPU8 on my machine (after CPU8 is added to hybrid PD
>> 0,2,4,6,8) (itmt prio for CPU8=69 (1024) instead of 68 (1009)).
>> So it looks like EAS is enabled before (6) ?
>
> No, it is ITMT because CPU8 is a favored core.
>
>> (3) ARM's way to do (5)
>> (4) Setting hybrid_max_perf_cpu
>> (5) Register EM here
>> (6) Actual call to initially triggers sched domain rebuild (+ enable
>> EAS) (done already in (2) on my machine)
>
> This is the second rebuild of sched domains to turn off ITMT and
> enable EAS. The previous one is to enable ITMT.
>
> The earlier enabling of ITMT could be avoided I think, but that would
> be a complication on platforms that contain favored cores but
> otherwise are not hybrid.
OK, the fact that EAS will already be enabled in (2) is not an issue IMHO.
>
>> So (3) is not possible for Intel hybrid since the policy's cpumask(s)
>
> It is possible in principle, but not particularly convenient because
> at that point it is not clear whether or not the platform is really
> hybrid and SMT is off and so whether or not EAS is to be used.
>
>> contain only one CPUs, i.e. CPUs are not sharing clock.
>> And those cpumasks have to be build under (1) to be used in (5)?
>
> They are built by the caller of (1) to be precise, but yes.
OK.
I still see an issue in putting all performance CPUs in one PD.
find_energy_efficient_cpu() assumes that all PD CPUs has the same CPU
capacity value.
find_energy_efficient_cpu()
...
for (; pd; pd = pd->next) {
...
/* Account external pressure for the energy estimation */
cpu = cpumask_first(cpus);
cpu_actual_cap = get_actual_cpu_capacity(cpu); --> (1)
Even though X86 does not implement hw_load_avg() or
cpufreq_get_pressure() we would still assume that all PD CPUs have the
same cpu_capacity:
get_actual_cpu_capacity() <-- (1)
capacity = arch_scale_cpu_capacity(cpu)
Now the error introduced is small (1024 versus 1009) on my i7-13700K but
it's there. How big can those diffs based om itmt prio be?
next prev parent reply other threads:[~2025-01-06 13:00 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-29 15:55 [RFC][PATCH v021 0/9] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT Rafael J. Wysocki
2024-11-29 15:56 ` [RFC][PATCH v021 1/9] cpufreq: intel_pstate: Use CPPC to get scaling factors Rafael J. Wysocki
2024-11-29 15:56 ` [RFC][PATCH v021 2/9] cpufreq: intel_pstate: Drop Arrow Lake from "scaling factor" list Rafael J. Wysocki
2024-11-29 15:59 ` [RFC][PATCH v021 3/9] PM: EM: Move perf rebuilding function from schedutil to EM Rafael J. Wysocki
2024-12-11 10:32 ` Christian Loehle
2024-11-29 16:00 ` [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS Rafael J. Wysocki
2024-12-11 10:33 ` Christian Loehle
2024-12-11 11:29 ` Rafael J. Wysocki
2024-12-11 11:43 ` Christian Loehle
2024-12-11 11:59 ` Rafael J. Wysocki
2024-12-11 13:25 ` Vincent Guittot
2024-12-11 16:37 ` Rafael J. Wysocki
2024-12-11 17:07 ` Vincent Guittot
2024-12-11 17:55 ` Rafael J. Wysocki
2024-12-12 8:34 ` Vincent Guittot
2024-12-16 14:49 ` Dietmar Eggemann
2024-12-16 14:58 ` Rafael J. Wysocki
2024-11-29 16:02 ` [RFC][PATCH v021 5/9] PM: EM: Introduce em_dev_expand_perf_domain() Rafael J. Wysocki
2024-12-17 9:38 ` Dietmar Eggemann
2024-12-17 20:40 ` Rafael J. Wysocki
2025-01-06 12:59 ` Dietmar Eggemann [this message]
2024-11-29 16:06 ` [RFC][PATCH v021 6/9] PM: EM: Call em_compute_costs() from em_create_perf_table() Rafael J. Wysocki
2024-11-29 16:15 ` [RFC][PATCH v021 7/9] PM: EM: Register perf domains with ho :active_power() callbacks Rafael J. Wysocki
2024-12-16 10:59 ` Dietmar Eggemann
2024-12-16 11:58 ` Rafael J. Wysocki
2024-11-29 16:21 ` [RFC][PATCH v021 8/9] cpufreq: intel_pstate: Introduce hybrid domains Rafael J. Wysocki
2024-12-12 17:04 ` Christian Loehle
2024-11-29 16:28 ` [RFC][PATCH v021 9/9] cpufreq: intel_pstate: Add basic EAS support on hybrid platforms Rafael J. Wysocki
2025-01-25 11:18 ` [RFC][PATCH v021 0/9] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT Dietmar Eggemann
2025-01-27 13:57 ` Rafael J. Wysocki
2025-02-01 12:43 ` Christian Loehle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f26fac2-787d-44a9-a0cd-c3035a91149c@arm.com \
--to=dietmar.eggemann@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=lukasz.luba@arm.com \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=pierre.gondois@arm.com \
--cc=rafael@kernel.org \
--cc=ricardo.neri-calderon@linux.intel.com \
--cc=rjw@rjwysocki.net \
--cc=srinivas.pandruvada@linux.intel.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox