From: Juri Lelli <juri.lelli@redhat.com>
To: Quentin Perret <quentin.perret@arm.com>
Cc: peterz@infradead.org, rjw@rjwysocki.net,
gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-pm@vger.kernel.org, mingo@redhat.com,
dietmar.eggemann@arm.com, morten.rasmussen@arm.com,
chris.redpath@arm.com, patrick.bellasi@arm.com,
valentin.schneider@arm.com, vincent.guittot@linaro.org,
thara.gopinath@linaro.org, viresh.kumar@linaro.org,
tkjos@google.com, joelaf@google.com, smuckle@google.com,
adharmap@quicinc.com, skannan@quicinc.com,
pkondeti@codeaurora.org, edubezval@gmail.com,
srinivas.pandruvada@linux.intel.com, currojerez@riseup.net,
javi.merino@kernel.org
Subject: Re: [RFC PATCH v3 03/10] PM: Introduce an Energy Model management framework
Date: Thu, 7 Jun 2018 16:44:09 +0200
Message-ID: <20180607144409.GB3311@localhost.localdomain>
In-Reply-To: <20180521142505.6522-4-quentin.perret@arm.com>
On 21/05/18 15:24, Quentin Perret wrote:
> Several subsystems in the kernel (scheduler and/or thermal at the time
> of writing) can benefit from knowing about the energy consumed by CPUs.
> Yet, this information can come from different sources (DT or firmware for
> example), in different formats, hence making it hard to exploit without
> a standard API.
>
> This patch attempts to solve this issue by introducing a centralized
> Energy Model (EM) framework which can be used to interface the data
> providers with the client subsystems. This framework standardizes the
> API to expose power costs, and to access them from multiple locations.
>
> The current design assumes that all CPUs in a frequency domain share the
> same micro-architecture. As such, the EM data is structured in a
> per-frequency-domain fashion. Drivers aware of frequency domains
> (typically, but not limited to, CPUFreq drivers) are expected to register
> data in the EM framework using the em_register_freq_domain() API. To do
> so, the drivers must provide a callback function that will be called by
> the EM framework to populate the tables. As of today, only the active
> power of the CPUs is considered. For each frequency domain, the EM
> includes a list of <frequency, power, capacity> tuples for the capacity
> states of the domain alongside a cpumask covering the involved CPUs.
>
> The EM framework also provides an API to re-scale the capacity values
> of the model asynchronously, after it has been created. This is required
> for architectures where the capacity scale factor of CPUs can change at
> run-time. This is the case for Arm/Arm64 for example where the
> arch_topology driver recomputes the capacity scale factors of the CPUs
> after the maximum frequency of all CPUs has been discovered. Although
> complex, the process of creating and re-scaling the EM has to be kept in
> two separate steps to fulfill the needs of the different users. The thermal
> subsystem doesn't use the capacity values and shouldn't have dependencies
> on subsystems providing them. On the other hand, the task scheduler needs
> the capacity values, and it will benefit from seeing them up-to-date when
> applicable.
>
> Because of this need for asynchronous update, the capacity state table
> of each frequency domain is protected by RCU, hence guaranteeing a safe
> modification of the table and a fast access to readers in latency-sensitive
> code paths.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
OK, I think I'll start with a few comments while I get more into
understanding the set. :)
> +static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu)
> +{
> +	unsigned long cmax = arch_scale_cpu_capacity(NULL, cpu);
> +	int max_cap_state = cs_table->nr_cap_states - 1;
You don't need max_cap_state on the stack, right?
> +	unsigned long fmax = cs_table->state[max_cap_state].frequency;
> +	int i;
> +
> +	for (i = 0; i < cs_table->nr_cap_states; i++)
> +		cs_table->state[i].capacity = cmax *
> +					cs_table->state[i].frequency / fmax;
> +}
> +
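To illustrate the remark on the extra local above, here is a minimal
userspace sketch of the same re-scaling. It assumes simplified stand-ins
for the structures in the patch: demo_cap_state, demo_cs_table and
demo_update_cs_table() are hypothetical names, and cmax is passed in
rather than read from arch_scale_cpu_capacity(). The last table entry is
indexed directly, so no index variable is kept around.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct demo_cap_state {
	unsigned long frequency;
	unsigned long power;
	unsigned long capacity;
};

struct demo_cs_table {
	int nr_cap_states;
	struct demo_cap_state *state;
};

/*
 * Scale each state's capacity linearly with its frequency so that the
 * highest state ends up with cmax. Assumes the states are sorted by
 * ascending frequency, as the quoted code does.
 */
static void demo_update_cs_table(struct demo_cs_table *t, unsigned long cmax)
{
	unsigned long fmax;
	int i;

	assert(t->nr_cap_states > 0);
	fmax = t->state[t->nr_cap_states - 1].frequency;
	assert(fmax != 0);

	for (i = 0; i < t->nr_cap_states; i++)
		t->state[i].capacity = cmax * t->state[i].frequency / fmax;
}
```

With cmax = 1024 and frequencies of 500000, 1000000 and 2000000, the
capacities come out as 256, 512 and 1024.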
> +static struct em_freq_domain *em_create_fd(cpumask_t *span, int nr_states,
> +						struct em_data_callback *cb)
> +{
> +	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
> +	int i, ret, cpu = cpumask_first(span);
> +	struct em_freq_domain *fd;
> +	unsigned long power, freq;
> +
> +	if (!cb->active_power)
> +		return NULL;
> +
> +	fd = kzalloc(sizeof(*fd), GFP_KERNEL);
> +	if (!fd)
> +		return NULL;
> +
> +	fd->cs_table = alloc_cs_table(nr_states);
Mmm, don't you need to rcu_assign_pointer this first one as well?
> +	if (!fd->cs_table)
> +		goto free_fd;
> +
> +	/* Copy the span of the frequency domain */
> +	cpumask_copy(&fd->cpus, span);
> +
> +	/* Build the list of capacity states for this freq domain */
> +	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
The fact that this loop (freq starts at 0 and is bumped with freq++)
relies on active_power() to use the ceil OPP for a given freq might
deserve a comment. Also, is this behaviour of active_power()
standardized?
> +		ret = cb->active_power(&power, &freq, cpu);
> +		if (ret)
> +			goto free_cs_table;
> +
> +		fd->cs_table->state[i].power = power;
> +		fd->cs_table->state[i].frequency = freq;
> +
> +		/*
> +		 * The hertz/watts efficiency ratio should decrease as the
> +		 * frequency grows on sane platforms. If not, warn the user
> +		 * that some high OPPs are more power efficient than some
> +		 * of the lower ones.
> +		 */
> +		opp_eff = freq / power;
> +		if (opp_eff >= prev_opp_eff)
> +			pr_warn("%*pbl: hz/watt efficiency: OPP %d >= OPP%d\n",
> +					cpumask_pr_args(span), i, i - 1);
> +		prev_opp_eff = opp_eff;
> +	}
> +	fd_update_cs_table(fd->cs_table, cpu);
> +
> +	return fd;
> +
> +free_cs_table:
> +	free_cs_table(fd->cs_table);
> +free_fd:
> +	kfree(fd);
> +
> +	return NULL;
> +}
> +
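To make the question about the OPP enumeration loop concrete, here is a
small userspace sketch of how such a loop can walk every OPP if the
callback rounds the requested frequency up to the next available one.
That rounding is an assumption about the contract, not something the
patch documents. demo_opps, demo_active_power() and demo_enumerate() are
hypothetical names and the table values are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up OPP table, sorted by ascending frequency. */
struct demo_opp {
	unsigned long freq;
	unsigned long power;
};

static const struct demo_opp demo_opps[] = {
	{  500000, 100 },
	{ 1000000, 300 },
	{ 1500000, 700 },
};

#define DEMO_NR_OPPS (sizeof(demo_opps) / sizeof(demo_opps[0]))

/*
 * Assumed contract: round *freq up to the lowest OPP at or above it and
 * report that OPP's frequency and power. Returns 0 on success, -1 if
 * *freq is above the highest OPP.
 */
static int demo_active_power(unsigned long *power, unsigned long *freq,
			     int cpu)
{
	size_t i;

	(void)cpu;	/* single frequency domain in this sketch */
	assert(power && freq);

	for (i = 0; i < DEMO_NR_OPPS; i++) {
		if (demo_opps[i].freq >= *freq) {
			*freq = demo_opps[i].freq;
			*power = demo_opps[i].power;
			return 0;
		}
	}
	return -1;
}

/*
 * Same loop shape as in em_create_fd(): after each call freq holds an
 * exact OPP frequency, so freq + 1 rounds up to the next OPP.
 * Returns the number of states written to out[].
 */
static int demo_enumerate(struct demo_opp *out, int nr_states)
{
	unsigned long power, freq;
	int i;

	for (i = 0, freq = 0; i < nr_states; i++, freq++) {
		if (demo_active_power(&power, &freq, 0))
			break;
		out[i].freq = freq;
		out[i].power = power;
	}
	return i;
}
```

If a callback rounded down instead, the loop would keep returning the
same OPP, which is why the expected behaviour seems worth spelling out.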
> +static void rcu_free_cs_table(struct rcu_head *rp)
> +{
> +	struct em_cs_table *table;
> +
> +	table = container_of(rp, struct em_cs_table, rcu);
> +	free_cs_table(table);
> +}
> +
> +/**
> + * em_rescale_cpu_capacity() - Re-scale capacity values of the Energy Model
> + *
> + * This re-scales the capacity values for all capacity states of all frequency
> + * domains of the Energy Model. This should be used when the capacity values
> + * of the CPUs are updated at run-time, after the EM was registered.
> + */
> +void em_rescale_cpu_capacity(void)
So, is this also meant to be called eventually after thermal capping
events and such?
> +{
> +	struct em_cs_table *old_table, *new_table;
> +	struct em_freq_domain *fd;
> +	unsigned long flags;
> +	int nr_states, cpu;
> +
> +	read_lock_irqsave(&em_data_lock, flags);
Don't you need write_lock_ here, since you are going to exchange the
em tables?
> +	for_each_cpu(cpu, cpu_possible_mask) {
> +		fd = per_cpu(em_data, cpu);
> +		if (!fd || cpu != cpumask_first(&fd->cpus))
> +			continue;
> +
> +		/* Copy the existing table. */
> +		old_table = rcu_dereference(fd->cs_table);
> +		nr_states = old_table->nr_cap_states;
> +		new_table = alloc_cs_table(nr_states);
> +		if (!new_table) {
> +			read_unlock_irqrestore(&em_data_lock, flags);
> +			return;
> +		}
> +		memcpy(new_table->state, old_table->state,
> +				nr_states * sizeof(*new_table->state));
> +
> +		/* Re-scale the capacity values on the copy. */
> +		fd_update_cs_table(new_table, cpumask_first(&fd->cpus));
> +
> +		/* Replace the table with the rescaled version. */
> +		rcu_assign_pointer(fd->cs_table, new_table);
> +		call_rcu(&old_table->rcu, rcu_free_cs_table);
> +	}
> +	read_unlock_irqrestore(&em_data_lock, flags);
> +	pr_debug("Re-scaled CPU capacities\n");
> +}
> +EXPORT_SYMBOL_GPL(em_rescale_cpu_capacity);
> +
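For the locking question above, a userspace sketch of the copy, update,
publish sequence that em_rescale_cpu_capacity() follows. It is only an
analogy: C11 release/acquire atomics stand in for rcu_assign_pointer()
and rcu_dereference(), the old table is freed immediately because the
demo is single-threaded (the patch defers that via call_rcu()), and all
demo_* names are hypothetical. Nothing in the sketch serializes two
concurrent updaters; in the kernel that would have to come from some
form of exclusive locking, which is what the question is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_STATES 8

struct demo_table {
	int nr_states;
	unsigned long capacity[DEMO_MAX_STATES];
};

struct demo_domain {
	_Atomic(struct demo_table *) table;
};

/* Set up the first table; the domain is not visible to readers yet. */
static int demo_domain_init(struct demo_domain *d, const unsigned long *caps,
			    int n)
{
	struct demo_table *t;

	assert(n > 0 && n <= DEMO_MAX_STATES);
	t = calloc(1, sizeof(*t));
	if (!t)
		return -1;
	t->nr_states = n;
	memcpy(t->capacity, caps, (size_t)n * sizeof(*caps));
	atomic_init(&d->table, t);
	return 0;
}

/* Reader side: acquire-load the currently published table. */
static struct demo_table *demo_get_table(struct demo_domain *d)
{
	return atomic_load_explicit(&d->table, memory_order_acquire);
}

/* Updater side: scale every capacity by num/den on a private copy. */
static int demo_rescale(struct demo_domain *d, unsigned long num,
			unsigned long den)
{
	struct demo_table *old_table, *new_table;
	int i;

	if (den == 0)
		return -1;

	old_table = demo_get_table(d);
	new_table = malloc(sizeof(*new_table));
	if (!new_table)
		return -1;

	/* Copy, then modify only the copy. */
	memcpy(new_table, old_table, sizeof(*new_table));
	for (i = 0; i < new_table->nr_states; i++)
		new_table->capacity[i] = new_table->capacity[i] * num / den;

	/* Publish: a reader sees the old or the new table, never a mix. */
	atomic_store_explicit(&d->table, new_table, memory_order_release);

	/* No concurrent readers in this demo, so no grace period is needed. */
	free(old_table);
	return 0;
}
```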
> +/**
> + * em_cpu_get() - Return the frequency domain for a CPU
> + * @cpu : CPU to find the frequency domain for
> + *
> + * Return: the frequency domain to which 'cpu' belongs, or NULL if it doesn't
> + * exist.
> + */
> +struct em_freq_domain *em_cpu_get(int cpu)
> +{
> +	struct em_freq_domain *fd;
> +	unsigned long flags;
> +
> +	read_lock_irqsave(&em_data_lock, flags);
> +	fd = per_cpu(em_data, cpu);
> +	read_unlock_irqrestore(&em_data_lock, flags);
> +
> +	return fd;
> +}
> +EXPORT_SYMBOL_GPL(em_cpu_get);
Mmm, this gets complicated pretty fast eh? :)
I had to go back and forth between patches to start understanding the
different data structures and how they are used, and I'm not sure yet
I've got the full picture. I guess some nice diagram (cover letter or
documentation patch) would help a lot.
Locking of such data structures is pretty involved as well, adding
comments/docs shouldn't harm. :)
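As a rough stand-in for such a diagram, this is how the pieces seem to
fit together, written as a userspace C sketch. The layout is inferred
only from the accessors used in this patch (per_cpu(em_data, cpu),
fd->cs_table, fd->cpus, state[i].frequency/power/capacity), so field
types and ordering are guesses, and every demo_* name is hypothetical.
The RCU head and the locking are left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* One <frequency, power, capacity> tuple. */
struct demo_cap_state {
	unsigned long frequency;
	unsigned long power;
	unsigned long capacity;
};

/* Table of capacity states; replaced as a whole on re-scaling. */
struct demo_cs_table {
	struct demo_cap_state *state;
	int nr_cap_states;
};

/* One frequency domain: the CPUs it spans and its current table. */
struct demo_freq_domain {
	struct demo_cs_table *cs_table;
	unsigned long cpus;	/* bitmask standing in for cpumask_t */
};

/* Stand-in for the per-CPU em_data pointer: CPU number -> domain. */
static struct demo_freq_domain *demo_em_data[DEMO_NR_CPUS];

/* Make every CPU in fd->cpus point at fd. */
static void demo_register(struct demo_freq_domain *fd)
{
	int cpu;

	assert(fd != NULL);
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (fd->cpus & (1UL << cpu))
			demo_em_data[cpu] = fd;
}

/* Return the domain of a CPU, or NULL if none is registered. */
static struct demo_freq_domain *demo_cpu_get(int cpu)
{
	if (cpu < 0 || cpu >= DEMO_NR_CPUS)
		return NULL;
	return demo_em_data[cpu];
}
```

So a lookup goes CPU -> frequency domain -> capacity state table ->
tuple, with all CPUs of a domain sharing one table.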
Best,
- Juri