public inbox for linux-kernel@vger.kernel.org
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Julia Lawall <julia.lawall@inria.fr>,
	Francisco Jerez <currojerez@riseup.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range
Date: Thu, 06 Jan 2022 12:28:26 -0800
Message-ID: <1b2be990d5c31f62d9ce33aa2eb2530708d5607a.camel@linux.intel.com>
In-Reply-To: <alpine.DEB.2.22.394.2201062044340.3098@hadrien>

On Thu, 2022-01-06 at 20:49 +0100, Julia Lawall wrote:
> 
> On Wed, 5 Jan 2022, Francisco Jerez wrote:
> 
> > Julia Lawall <julia.lawall@inria.fr> writes:
> > 
> > > On Tue, 4 Jan 2022, Rafael J. Wysocki wrote:
> > > 
> > > > On Tue, Jan 4, 2022 at 4:49 PM Julia Lawall <
> > > > julia.lawall@inria.fr> wrote:
> > > > > I tried the whole experiment again on an Intel w2155 (one
> > > > > socket, 10
> > > > > physical cores, pstates 12, 33, and 45).
> > > > > 
> > > > > For the CPU there is a small jump between 32 and 33 - less
> > > > > than for the
> > > > > 6130.
> > > > > 
> > > > > For the RAM, there is a big jump between 21 and 22.
> > > > > 
> > > > > Combining them leaves a big jump between 21 and 22.
> > > > 
> > > > These jumps are most likely related to voltage increases.
> > > > 
> > > > > It seems that the definition of efficient is that there is no
> > > > > more cost
> > > > > for the computation than the cost of simply having the
> > > > > machine doing any
> > > > > computation at all.  It doesn't take into account the time
> > > > > and energy
> > > > > required to do some actual amount of work.
> > > > 
> > > > Well, that's not what I wanted to say.
> > > 
> > > I was referring to Francisco's comment that the lowest indicated
> > > frequency
> > > should be the most efficient one.  Turbostat also reports the
> > > lowest
> > > frequency as the most efficient one.  In my graph, there are the
> > > pstates 7
> > > and 10, which give exactly the same energy consumption as 12.  7
> > > and 10
> > > are certainly less efficient, because the energy consumption is
> > > the same,
> > > but the execution speed is lower.
> > > 
> > > > Of course, the configuration that requires less energy to be
> > > > spent to
> > > > do a given amount of work is more energy-efficient.  To measure
> > > > this,
> > > > the system needs to be given exactly the same amount of work
> > > > for each
> > > > run and the energy spent by it during each run needs to be
> > > > compared.
> > 
> > I disagree that the system needs to be given the exact same amount
> > of
> > work in order to measure differences in energy efficiency.  The
> > average
> > energy efficiency of Julia's 10s workloads can be calculated easily
> > in
> > both cases (e.g. as the W/E ratio below, W will just be a different
> > value for each run), and the result will likely approximate the
> > instantaneous energy efficiency of the fixed P-states we're
> > comparing,
> > since her workload seems to be fairly close to a steady state.
> > 
> > > This is basically my point of view, but there is a question about
> > > it.  If
> > > over 10 seconds you consume 10J and by running twice as fast you
> > > would
> > > consume only 6J, then how do you account for the next 5
> > > seconds?  If the
> > > machine is then idle for the next 5 seconds, maybe you would end
> > > up
> > > consuming 8J in total over the 10 seconds.  But if you take
> > > advantage of
> > > the free 5 seconds to pack in another job, then you end up
> > > consuming 12J.
> > > 
> > 
> > Geometrically, such an oscillatory workload with periods of idling
> > and
> > periods of activity would give an average power consumption along
> > the
> > line that passes through the points corresponding to both states on
> > the
> > CPU's power curve -- IOW your average power consumption will just
> > be the
> > weighted average of the power consumption of each state (with the
> > duty
> > cycle t_i/t_total of each state being its weight):
> > 
> > P_avg = t_0/t_total * P_0 + t_1/t_total * P_1
> > 
> > Your energy usage would just be 10s times that P_avg, since you're
> > assuming that the total runtime of the workload is fixed at 10s
> > independent of how long the CPU actually takes to complete the
> > computation.  In cases where the P-state during the period of
> > activity
> > t_1 is equal to or lower than the maximum efficiency P-state, that line
> > segment is guaranteed to lie below the power curve, indicating that
> > such
> > oscillation is more efficient than running the workload fixed to
> > its
> > average P-state.
> > 
> > That said, this scenario doesn't really seem very relevant to your
> > case,
> > since the last workload you've provided turbostat traces for seems
> > to
> > show almost no oscillation.  If there was such an oscillation, your
> > total energy usage would still be greater for oscillations between
> > idle
> > and some P-state different from the most efficient one.  Such an
> > oscillation doesn't explain the anomaly we're seeing on your
> > traces,
> > which show more energy-efficient instantaneous behavior for a
> > P-state 2x
> > the one reported by your processor as the most energy-efficient.
> 
> All the turbostat output and graphs I have sent recently were just
> for
> continuous spinning:
> 
> for(;;);
> 
> Now I am trying to run for the percentage of the time corresponding
> to
> 10 / P for pstate P (ie 0.5 of the time for pstate 20), and then
> sleeping,
> to see whether one can just add the sleeping power consumption of the
> machine to compute the efficiency as Rafael suggested.
> 
Before doing the comparison, try freezing the uncore frequency:

wrmsr -a 0x620 0x0808

This freezes the uncore at 800 MHz. Any other fixed value is fine.
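
As a rough sketch of how the 0x0808 value above maps to 800 MHz: this
assumes the usual layout of the uncore ratio limit MSR (0x620), with the
maximum ratio in bits 6:0, the minimum ratio in bits 14:8, and each ratio
step worth 100 MHz. The helper names are hypothetical and the layout should
be checked against the documentation for the processor in question.

```python
# Assumed layout of MSR 0x620 (uncore ratio limit); verify for your CPU.
MSR_UNCORE_RATIO_LIMIT = 0x620
RATIO_STEP_MHZ = 100   # assumption: one ratio unit = 100 MHz
RATIO_MASK = 0x7F      # assumption: 7-bit ratio fields


def encode_uncore_limits(min_mhz, max_mhz):
    """Pack min/max uncore frequencies (MHz) into an MSR value."""
    min_ratio = min_mhz // RATIO_STEP_MHZ
    max_ratio = max_mhz // RATIO_STEP_MHZ
    if not 0 <= min_ratio <= max_ratio <= RATIO_MASK:
        raise ValueError("ratios must satisfy 0 <= min <= max <= 0x7f")
    # Minimum ratio goes in bits 14:8, maximum ratio in bits 6:0.
    return (min_ratio << 8) | max_ratio


def decode_uncore_limits(value):
    """Unpack an MSR value into (min_mhz, max_mhz)."""
    max_ratio = value & RATIO_MASK
    min_ratio = (value >> 8) & RATIO_MASK
    return min_ratio * RATIO_STEP_MHZ, max_ratio * RATIO_STEP_MHZ


if __name__ == "__main__":
    value = encode_uncore_limits(800, 800)
    # Print the equivalent wrmsr command line without executing it.
    print(f"wrmsr -a {MSR_UNCORE_RATIO_LIMIT:#x} {value:#06x}")
```

Setting the minimum and maximum to the same ratio is what pins the uncore
to a single frequency under these assumptions.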

Thanks,
Srinivas

> julia
> 
> > > > However, I think that you are interested in answering a
> > > > different
> > > > question: Given a specific amount of time (say T) to run the
> > > > workload,
> > > > what frequency to run the CPUs doing the work at in order to
> > > > get the
> > > > maximum amount of work done per unit of energy spent by the
> > > > system (as
> > > > a whole)?  Or, given 2 different frequency levels, which of
> > > > them to
> > > > run the CPUs at to get more work done per energy unit?
> > > 
> > > This is the approach where you assume that the machine will be
> > > idle in any
> > > leftover time.  And it accounts for the energy consumed in that
> > > idle time.
> > > 
> > > > The work / energy ratio can be estimated as
> > > > 
> > > > W / E = C * f / P(f)
> > > > 
> > > > where C is a constant and P(f) is the power drawn by the whole
> > > > system
> > > > while the CPUs doing the work are running at frequency f, and
> > > > of
> > > > course for the system discussed previously it is greater in the
> > > > 2 GHz
> > > > case.
> > > > 
> > > > However P(f) can be divided into two parts, P_1(f) that really
> > > > depends
> > > > on the frequency and P_0 that does not depend on it.  If P_0 is
> > > > large
> > > > enough to dominate P(f), which is the case in the 10-20 range
> > > > of
> > > > P-states on the system in question, it is better to run the
> > > > CPUs doing
> > > > the work faster (as long as there is always enough work to do
> > > > for
> > > > them; see below).  This doesn't mean that P(f) is not a convex
> > > > function of f, though.
> > > > 
> > > > Moreover, this assumes that there will always be enough work
> > > > for the
> > > > system to do when running the busy CPUs at 2 GHz, or that it
> > > > can go
> > > > completely idle when it doesn't do any work, but let's see what
> > > > happens if the amount of work to do is W_1 = C * 1 GHz * T and
> > > > the
> > > > system cannot go completely idle when the work is done.
> > > > 
> > > > Then, nothing changes for the busy CPUs running at 1 GHz, but
> > > > in the 2
> > > > GHz case we get W = W_1 and E = P(2 GHz) * T/2 + P_0 * T/2,
> > > > because
> > > > the busy CPUs are only busy 1/2 of the time, but power P_0 is
> > > > drawn by
> > > > the system regardless.  Hence, in the 2 GHz case (assuming P(2
> > > > GHz) =
> > > > 120 W and P_0 = 90 W), we get
> > > > 
> > > > W / E = 2 * C * 1 GHz / (P(2 GHz) + P_0) = 0.0095 * C * 1 GHz
> > > > 
> > > > which is slightly less than the W / E ratio at 1 GHz
> > > > approximately
> > > > equal to 0.01 * C * 1 GHz (assuming P(1 GHz) = 100 W), so in
> > > > these
> > > > conditions it would be better to run the busy CPUs at 1 GHz.
> > > 
> > > OK, I'll try to measure this.
> > > 
> > > thanks,
> > > julia
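
The W / E comparison quoted above can be checked numerically. The sketch
below uses the power figures from that message (P(1 GHz) = 100 W,
P(2 GHz) = 120 W, P_0 = 90 W) and expresses work in units of C * GHz * s.
The function name and the period T = 10 s are assumptions for illustration
only; the ratio does not depend on T.

```python
def work_per_energy(freq_ghz, p_busy_w, p_idle_w, work, period_s):
    """Return W / E (in units of C * GHz / W) over one period.

    work is W / C, i.e. expressed in GHz * s.  The CPUs are busy for
    work / freq_ghz seconds and the system draws p_idle_w for the rest
    of the period, since it cannot go completely idle.
    """
    busy_s = work / freq_ghz
    if busy_s > period_s:
        raise ValueError("work does not fit in the period at this frequency")
    energy_j = p_busy_w * busy_s + p_idle_w * (period_s - busy_s)
    return work / energy_j


T = 10.0        # assumed period in seconds (illustrative)
W1 = 1.0 * T    # W_1 / C = 1 GHz * T

at_1ghz = work_per_energy(1.0, 100.0, 90.0, W1, T)  # busy the whole period
at_2ghz = work_per_energy(2.0, 120.0, 90.0, W1, T)  # busy half the period

if __name__ == "__main__":
    print(f"1 GHz: {at_1ghz:.4f}  2 GHz: {at_2ghz:.4f}")
```

With these numbers the 2 GHz case evaluates to 2 / (120 + 90), which is
slightly below the 1 / 100 obtained at 1 GHz, matching the conclusion in
the quoted text.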
