From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Julia Lawall <julia.lawall@inria.fr>,
Francisco Jerez <currojerez@riseup.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Linux PM <linux-pm@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range
Date: Thu, 06 Jan 2022 12:28:26 -0800
Message-ID: <1b2be990d5c31f62d9ce33aa2eb2530708d5607a.camel@linux.intel.com>
In-Reply-To: <alpine.DEB.2.22.394.2201062044340.3098@hadrien>
On Thu, 2022-01-06 at 20:49 +0100, Julia Lawall wrote:
>
> On Wed, 5 Jan 2022, Francisco Jerez wrote:
>
> > Julia Lawall <julia.lawall@inria.fr> writes:
> >
> > > On Tue, 4 Jan 2022, Rafael J. Wysocki wrote:
> > >
> > > > On Tue, Jan 4, 2022 at 4:49 PM Julia Lawall <
> > > > julia.lawall@inria.fr> wrote:
> > > > > I tried the whole experiment again on an Intel w2155 (one
> > > > > socket, 10
> > > > > physical cores, pstates 12, 33, and 45).
> > > > >
> > > > > > For the CPU there is a small jump between 32 and 33 - less
> > > > > than for the
> > > > > 6130.
> > > > >
> > > > > For the RAM, there is a big jump between 21 and 22.
> > > > >
> > > > > Combining them leaves a big jump between 21 and 22.
> > > >
> > > > These jumps are most likely related to voltage increases.
> > > >
> > > > > It seems that the definition of efficient is that there is no
> > > > > more cost
> > > > > for the computation than the cost of simply having the
> > > > > machine doing any
> > > > > computation at all. It doesn't take into account the time
> > > > > and energy
> > > > > required to do some actual amount of work.
> > > >
> > > > Well, that's not what I wanted to say.
> > >
> > > I was referring to Francisco's comment that the lowest indicated
> > > frequency
> > > should be the most efficient one. Turbostat also reports the
> > > lowest
> > > frequency as the most efficient one. In my graph, there are the
> > > pstates 7
> > > and 10, which give exactly the same energy consumption as 12. 7
> > > and 10
> > > are certainly less efficient, because the energy consumption is
> > > the same,
> > > but the execution speed is lower.
> > >
> > > > Of course, the configuration that requires less energy to be
> > > > spent to
> > > > do a given amount of work is more energy-efficient. To measure
> > > > this,
> > > > the system needs to be given exactly the same amount of work
> > > > for each
> > > > run and the energy spent by it during each run needs to be
> > > > compared.
> >
> > I disagree that the system needs to be given the exact same amount
> > of
> > work in order to measure differences in energy efficiency. The
> > average
> > energy efficiency of Julia's 10s workloads can be calculated easily
> > in
> > both cases (e.g. as the W/E ratio below, W will just be a different
> > value for each run), and the result will likely approximate the
> > instantaneous energy efficiency of the fixed P-states we're
> > comparing,
> > since her workload seems to be fairly close to a steady state.
> >
> > > This is basically my point of view, but there is a question about
> > > it. If
> > > over 10 seconds you consume 10J and by running twice as fast you
> > > would
> > > consume only 6J, then how do you account for the next 5
> > > seconds? If the
> > > machine is then idle for the next 5 seconds, maybe you would end
> > > up
> > > consuming 8J in total over the 10 seconds. But if you take
> > > advantage of
> > > the free 5 seconds to pack in another job, then you end up
> > > consuming 12J.
> > >
> >
> > Geometrically, such an oscillatory workload with periods of idling
> > and
> > periods of activity would give an average power consumption along
> > the
> > line that passes through the points corresponding to both states on
> > the
> > CPU's power curve -- IOW your average power consumption will just
> > be the
> > weighted average of the power consumption of each state (with the
> > duty
> > cycle t_i/t_total of each state being its weight):
> >
> > P_avg = t_0/t_total * P_0 + t_1/t_total * P_1
> >
> > Your energy usage would just be 10s times that P_avg, since you're
> > assuming that the total runtime of the workload is fixed at 10s
> > independent of how long the CPU actually takes to complete the
> > computation. In cases where the P-state during the period of
> > activity
> > t_1 is equal to or lower than the maximum efficiency P-state, that line
> > segment is guaranteed to lie below the power curve, indicating that
> > such
> > oscillation is more efficient than running the workload fixed to
> > its
> > average P-state.
> >
> > That said, this scenario doesn't really seem very relevant to your
> > case,
> > since the last workload you've provided turbostat traces for seems
> > to
> > show almost no oscillation. If there was such an oscillation, your
> > total energy usage would still be greater for oscillations between
> > idle
> > and some P-state different from the most efficient one. Such an
> > oscillation doesn't explain the anomaly we're seeing on your
> > traces,
> > which show more energy-efficient instantaneous behavior for a P-
> > state 2x
> > the one reported by your processor as the most energy-efficient.
>
> All the turbostat output and graphs I have sent recently were just
> for
> continuous spinning:
>
> for(;;);
>
> Now I am trying to run for the fraction of the time corresponding to
> 10 / P for pstate P (i.e. 0.5 of the time for pstate 20) and then
> sleep, to see whether one can just add the sleeping power consumption
> of the machine to compute the efficiency, as Rafael suggested.
>
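The spin-then-sleep experiment described above could be sketched roughly
as follows. This is only an illustration in Python, not the program that
was actually used: the 100 ms period, the clamping of the busy fraction
to 1.0 for pstates below 10, and the function names are all assumptions.

```python
import time


def duty_cycle_for_pstate(pstate, reference_pstate=10):
    """Busy fraction 10 / P for pstate P (0.5 for pstate 20).

    Clamping to 1.0 for P < 10 is an assumption of this sketch.
    """
    return min(1.0, reference_pstate / pstate)


def spin_then_sleep(pstate, period_s=0.1, total_s=10.0):
    """Alternate busy spinning and sleeping for total_s seconds.

    period_s is a hypothetical oscillation period, not a value taken
    from the thread.
    """
    duty = duty_cycle_for_pstate(pstate)
    end = time.monotonic() + total_s
    while time.monotonic() < end:
        busy_until = time.monotonic() + duty * period_s
        while time.monotonic() < busy_until:
            pass  # busy spin, standing in for the for(;;); loop
        time.sleep((1.0 - duty) * period_s)


if __name__ == "__main__":
    print(duty_cycle_for_pstate(20))
    spin_then_sleep(20, total_s=1.0)
```

The pstate itself still has to be pinned separately; the sketch only
shapes the load.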
Before doing the comparison, try freezing the uncore frequency:
wrmsr -a 0x620 0x0808
This freezes the uncore at 800 MHz. Any other fixed value is fine.
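For reference, here is a small sketch of how the 0x0808 value breaks
down. It assumes the usual layout of MSR 0x620 (MSR_UNCORE_RATIO_LIMIT)
on recent Intel server parts: maximum ratio in bits 6:0, minimum ratio
in bits 14:8, each in units of 100 MHz. Please check this against the
documentation for your CPU model; the helper names are made up for
illustration.

```python
# Assumed layout of MSR 0x620 (MSR_UNCORE_RATIO_LIMIT):
#   bits 6:0   maximum uncore ratio
#   bits 14:8  minimum uncore ratio
# Ratios are assumed to be in units of 100 MHz.
UNCORE_RATIO_UNIT_MHZ = 100


def decode_uncore_ratio_limit(value):
    """Return (min_mhz, max_mhz) encoded in an MSR 0x620 value."""
    max_ratio = value & 0x7F
    min_ratio = (value >> 8) & 0x7F
    return (min_ratio * UNCORE_RATIO_UNIT_MHZ,
            max_ratio * UNCORE_RATIO_UNIT_MHZ)


def encode_uncore_ratio_limit(min_mhz, max_mhz):
    """Build an MSR 0x620 value from min/max frequencies in MHz."""
    min_ratio = min_mhz // UNCORE_RATIO_UNIT_MHZ
    max_ratio = max_mhz // UNCORE_RATIO_UNIT_MHZ
    return ((min_ratio & 0x7F) << 8) | (max_ratio & 0x7F)


if __name__ == "__main__":
    print(decode_uncore_ratio_limit(0x0808))
    print(hex(encode_uncore_ratio_limit(1200, 1200)))
```

Setting the minimum and maximum to the same ratio is what pins the
uncore at one frequency.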
Thanks,
Srinivas
> julia
>
> > > > However, I think that you are interested in answering a
> > > > different
> > > > question: Given a specific amount of time (say T) to run the
> > > > workload,
> > > > what frequency to run the CPUs doing the work at in order to
> > > > get the
> > > > maximum amount of work done per unit of energy spent by the
> > > > system (as
> > > > a whole)? Or, given 2 different frequency levels, which of
> > > > them to
> > > > run the CPUs at to get more work done per energy unit?
> > >
> > > This is the approach where you assume that the machine will be
> > > idle in any
> > > leftover time. And it accounts for the energy consumed in that
> > > idle time.
> > >
> > > > The work / energy ratio can be estimated as
> > > >
> > > > W / E = C * f / P(f)
> > > >
> > > > where C is a constant and P(f) is the power drawn by the whole
> > > > system
> > > > while the CPUs doing the work are running at frequency f, and
> > > > of
> > > > course for the system discussed previously it is greater in the
> > > > 2 GHz
> > > > case.
> > > >
> > > > However P(f) can be divided into two parts, P_1(f) that really
> > > > depends
> > > > on the frequency and P_0 that does not depend on it. If P_0 is
> > > > large
> > > > enough to dominate P(f), which is the case in the 10-20 range
> > > > of
> > > > P-states on the system in question, it is better to run the
> > > > CPUs doing
> > > > the work faster (as long as there is always enough work to do
> > > > for
> > > > them; see below). This doesn't mean that P(f) is not a convex
> > > > function of f, though.
> > > >
> > > > Moreover, this assumes that there will always be enough work
> > > > for the
> > > > system to do when running the busy CPUs at 2 GHz, or that it
> > > > can go
> > > > completely idle when it doesn't do any work, but let's see what
> > > > happens if the amount of work to do is W_1 = C * 1 GHz * T and
> > > > the
> > > > system cannot go completely idle when the work is done.
> > > >
> > > > Then, nothing changes for the busy CPUs running at 1 GHz, but
> > > > in the 2
> > > > GHz case we get W = W_1 and E = P(2 GHz) * T/2 + P_0 * T/2,
> > > > because
> > > > the busy CPUs are only busy 1/2 of the time, but power P_0 is
> > > > drawn by
> > > > the system regardless. Hence, in the 2 GHz case (assuming P(2
> > > > GHz) =
> > > > 120 W and P_0 = 90 W), we get
> > > >
> > > > W / E = 2 * C * 1 GHz / (P(2 GHz) + P_0) = 0.0095 * C * 1 GHz
> > > >
> > > > which is slightly less than the W / E ratio at 1 GHz
> > > > approximately
> > > > equal to 0.01 * C * 1 GHz (assuming P(1 GHz) = 100 W), so in
> > > > these
> > > > conditions it would be better to run the busy CPUs at 1 GHz.
> > >
> > > OK, I'll try to measure this.
> > >
> > > thanks,
> > > julia
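
The arithmetic in the quoted W / E estimate can be checked with a few
lines of Python. This is a sketch using the figures given in the quoted
text (P(1 GHz) = 100 W, P(2 GHz) = 120 W, P_0 = 90 W); the function
names are made up, and the results are in units of C * GHz / W. The
average power is the duty-cycle weighted average from the P_avg formula
quoted earlier.

```python
def average_power(p_busy_w, p_idle_w, duty_cycle):
    """Time-weighted average power of a busy/idle oscillation."""
    return duty_cycle * p_busy_w + (1.0 - duty_cycle) * p_idle_w


def work_per_energy(freq_ghz, p_busy_w, p_idle_w, duty_cycle):
    """W / E over a fixed window T, in units of C * GHz / W.

    Work is C * f * duty_cycle * T and energy is average power * T,
    so T cancels out.
    """
    return freq_ghz * duty_cycle / average_power(p_busy_w, p_idle_w,
                                                 duty_cycle)


P_IDLE_W = 90.0  # P_0 from the quoted example

# Busy all the time at 1 GHz, drawing 100 W.
at_1ghz = work_per_energy(1.0, 100.0, P_IDLE_W, 1.0)
# Busy half the time at 2 GHz (120 W), drawing P_0 otherwise.
at_2ghz = work_per_energy(2.0, 120.0, P_IDLE_W, 0.5)

print(f"1 GHz: {at_1ghz:.4f}  2 GHz: {at_2ghz:.4f}")
```

With these numbers the 2 GHz case comes out at 2 / 210, slightly below
the 1 / 100 of the 1 GHz case, which matches the quoted conclusion.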