From: David Dai <davidai@google.com>
To: Pavan Kondeti <quic_pkondeti@quicinc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Rob Herring <robh+dt@kernel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
Conor Dooley <conor+dt@kernel.org>,
Sudeep Holla <sudeep.holla@arm.com>,
Saravana Kannan <saravanak@google.com>,
Quentin Perret <qperret@google.com>,
Masami Hiramatsu <mhiramat@google.com>,
Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Gupta Pankaj <pankaj.gupta@amd.com>, Mel Gorman <mgorman@suse.de>,
kernel-team@android.com, linux-pm@vger.kernel.org,
devicetree@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/2] cpufreq: add virtual-cpufreq driver
Date: Fri, 4 Aug 2023 16:46:11 -0700 [thread overview]
Message-ID: <CABN1KCKUt3GN=LqF9AK3Dc+4x98Asj-wpW4UNYsfjRz4Di8N5Q@mail.gmail.com> (raw)
In-Reply-To: <80f47262-9354-472f-8122-5ae262c0a46d@quicinc.com>
Hi Pavan,
Thanks for reviewing!
On Wed, Aug 2, 2023 at 9:18 PM Pavan Kondeti <quic_pkondeti@quicinc.com> wrote:
>
> On Mon, Jul 31, 2023 at 10:46:09AM -0700, David Dai wrote:
> > Introduce a virtualized cpufreq driver for guest kernels to improve
> > performance and power of workloads within VMs.
> >
> > This driver does two main things:
> >
> > 1. Sends the frequency of vCPUs as a hint to the host. The host uses the
> > hint to schedule the vCPU threads and decide physical CPU frequency.
> >
> > 2. If a VM does not support a virtualized FIE(like AMUs), it queries the
> > host CPU frequency by reading a MMIO region of a virtual cpufreq device
> > to update the guest's frequency scaling factor periodically. This enables
> > accurate Per-Entity Load Tracking for tasks running in the guest.
> >
> > Co-developed-by: Saravana Kannan <saravanak@google.com>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > Signed-off-by: David Dai <davidai@google.com>
>
> [...]
>
> > +static void virt_scale_freq_tick(void)
> > +{
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(smp_processor_id());
> > + struct virt_cpufreq_drv_data *data = policy->driver_data;
> > + u32 max_freq = (u32)policy->cpuinfo.max_freq;
> > + u64 cur_freq;
> > + u64 scale;
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + cur_freq = (u64)data->ops->get_freq(policy);
> > + cur_freq <<= SCHED_CAPACITY_SHIFT;
> > + scale = div_u64(cur_freq, max_freq);
> > +
> > + this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > +}
> > +
>
> We expect the host to provide the frequency in kHz, can you please add a
> comment about it. It is not very obvious when you look at the
> REG_CUR_FREQ_OFFSET register name.
I’ll include a KHZ in the offset names.
>
> > +static struct scale_freq_data virt_sfd = {
> > + .source = SCALE_FREQ_SOURCE_VIRT,
> > + .set_freq_scale = virt_scale_freq_tick,
> > +};
> > +
> > +static unsigned int virt_cpufreq_set_perf(struct cpufreq_policy *policy)
> > +{
> > + struct virt_cpufreq_drv_data *data = policy->driver_data;
> > + /*
> > + * Use cached frequency to avoid rounding to freq table entries
> > + * and undo 25% frequency boost applied by schedutil.
> > + */
> > + u32 freq = mult_frac(policy->cached_target_freq, 80, 100);
> > +
> > + data->ops->set_freq(policy, freq);
> > + return 0;
> > +}
>
> Why do we undo the frequency boost? A governor may apply other boosts
> like RT (uclamp), iowait. It is not clear why we need to worry about
> governor policies here.
See Saravana’s response to Quentin for more details, but in short,
we’ll remove this particular snippet in the driver.
>
> > +
> > +static unsigned int virt_cpufreq_fast_switch(struct cpufreq_policy *policy,
> > + unsigned int target_freq)
> > +{
> > + virt_cpufreq_set_perf(policy);
> > + return target_freq;
> > +}
> > +
> > +static int virt_cpufreq_target_index(struct cpufreq_policy *policy,
> > + unsigned int index)
> > +{
> > + return virt_cpufreq_set_perf(policy);
> > +}
> > +
> > +static int virt_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > + struct virt_cpufreq_drv_data *drv_data = cpufreq_get_driver_data();
> > + struct cpufreq_frequency_table *table;
> > + struct device *cpu_dev;
> > + int ret;
> > +
> > + cpu_dev = get_cpu_device(policy->cpu);
> > + if (!cpu_dev)
> > + return -ENODEV;
> > +
> > + ret = dev_pm_opp_of_add_table(cpu_dev);
> > + if (ret)
> > + return ret;
> > +
> > + ret = dev_pm_opp_get_opp_count(cpu_dev);
> > + if (ret <= 0) {
> > + dev_err(cpu_dev, "OPP table can't be empty\n");
> > + return -ENODEV;
> > + }
> > +
> > + ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &table);
> > + if (ret) {
> > + dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + policy->freq_table = table;
> > + policy->dvfs_possible_from_any_cpu = false;
> > + policy->fast_switch_possible = true;
> > + policy->driver_data = drv_data;
> > +
> > + /*
> > + * Only takes effect if another FIE source such as AMUs
> > + * have not been registered.
> > + */
> > + topology_set_scale_freq_source(&virt_sfd, policy->cpus);
> > +
> > + return 0;
> > +
> > +}
> > +
>
> Do we need to register as FIE source even with the below commit? By
> registering as a source, we are not supplying any accurate metric. We
> still fallback on the same source that cpufreq implements it.
The arch_set_freq_scale() done at cpufreq driver’s frequency updates
at cpufreq_freq_transition_end() and cpufreq_driver_fast_switch() only
represent the guest’s frequency request. However, this does not
accurately represent the physical CPU’s frequency that the vCPU is
running on. E.g. There may be other processes sharing the same
physical CPU that results in a much higher CPU frequency than what’s
requested by the vCPU.
Thanks,
David
>
> 874f63531064 ("cpufreq: report whether cpufreq supports Frequency
> Invariance (FI)")
>
> Thanks,
> Pavan
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
next prev parent reply other threads:[~2023-08-04 23:46 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-31 17:46 [PATCH v3 0/2] Improve VM CPUfreq and task placement behavior David Dai
2023-07-31 17:46 ` [PATCH v3 1/2] dt-bindings: cpufreq: add bindings for virtual cpufreq David Dai
2023-07-31 18:12 ` Rob Herring
2023-08-05 19:38 ` Krzysztof Kozlowski
2023-08-08 23:31 ` Saravana Kannan
2023-08-09 6:28 ` Krzysztof Kozlowski
2023-07-31 17:46 ` [PATCH v3 2/2] cpufreq: add virtual-cpufreq driver David Dai
2023-07-31 22:02 ` Randy Dunlap
2023-07-31 23:46 ` David Dai
2023-08-01 9:36 ` Viresh Kumar
2023-08-02 22:16 ` Saravana Kannan
2023-08-03 5:51 ` Viresh Kumar
2023-08-04 22:24 ` Saravana Kannan
2023-08-03 16:50 ` David Dai
2023-08-04 4:42 ` Viresh Kumar
2023-08-01 9:45 ` Quentin Perret
2023-08-01 9:49 ` Quentin Perret
2023-08-04 22:23 ` Saravana Kannan
2023-08-03 4:18 ` Pavan Kondeti
2023-08-04 23:46 ` David Dai [this message]
2023-08-07 3:22 ` Pavan Kondeti
2023-08-24 23:55 ` David Dai
2023-08-12 2:55 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABN1KCKUt3GN=LqF9AK3Dc+4x98Asj-wpW4UNYsfjRz4Di8N5Q@mail.gmail.com' \
--to=davidai@google.com \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=kernel-team@android.com \
--cc=krzysztof.kozlowski+dt@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mgorman@suse.de \
--cc=mhiramat@google.com \
--cc=oliver.upton@linux.dev \
--cc=pankaj.gupta@amd.com \
--cc=peterz@infradead.org \
--cc=qperret@google.com \
--cc=quic_pkondeti@quicinc.com \
--cc=rafael@kernel.org \
--cc=robh+dt@kernel.org \
--cc=saravanak@google.com \
--cc=sudeep.holla@arm.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).