Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A

devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Sowjanya Komatineni <skomatineni@nvidia.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>,
	sudeep.holla@arm.com, souvik.chakravarty@arm.com,
	thierry.reding@gmail.com, mark.rutland@arm.com,
	lorenzo.pieralisi@arm.com, daniel.lezcano@linaro.org,
	robh+dt@kernel.org, jonathanh@nvidia.com, ksitaraman@nvidia.com,
	sanjayc@nvidia.com, linux-arm-kernel@lists.infradead.org,
	linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org, devicetree@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
Date: Mon, 26 Apr 2021 15:11:50 +0200	[thread overview]
Message-ID: <20210426131150.GA36549@e123083-lin> (raw)
In-Reply-To: <486856be-1e66-fd77-e306-949b91bcdb1d@nvidia.com>

Hi,

On Fri, Apr 23, 2021 at 03:24:51PM -0700, Sowjanya Komatineni wrote:
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> > 
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> > > Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> > > which is
> > > in charge of deciding on state transition based on target state,
> > > state idle
> > > time, and some other Tegra CPU core cluster states information.
> > > 
> > > Current PSCI specification don't have function defined for passing
> > > runtime
> > > state idle time predicted by governor (based on next events and
> > > state target
> > > residency) to ARM trusted firmware.
> > 
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> > 
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> > 
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be at
> > least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from
> TF-A and uses it along with state idle time to decide on C6 state entry
> based on its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to predict
> if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can use
> that to stay in C6 state without early exit for better power savings.
> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of
> > linear scale:
> > 4-bits encoding idle state and max predicted residency 10msec,
> > that means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> > 
> > Open question is if these idle states must be all represented
> > in DT, or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and does
> not require MCE sequences for entry/exit.

I think Lukasz's point is that you can encode the predicted idle time by
having multiple idle_state entries with different min_residency mapping
to the same actual idle-state. So you would several variants of C6 with
different min_residencies and if the OS picks one with longer
min_residency firmware would have a better estimate of the predicted
idle residency.

I'm not convinced it is the right way to work around passing this
information on to firmware. I would rather see an example of how well
this works (best with numbers) and have a proper solution.

Morten

     prev parent reply	other threads:[~2021-04-26 13:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 20:30 [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 1/4] firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state Sowjanya Komatineni
2021-04-23 12:22   ` Rafael J. Wysocki
2021-04-23 18:33     ` Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 3/4] cpuidle: psci: pass state idle time before state enter callback Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 4/4] arm64: dts: tegra194: Add CPU idle states Sowjanya Komatineni
2021-04-23  1:03 ` [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A Sowjanya Komatineni
2021-04-23 12:27 ` Rafael J. Wysocki
2021-04-23 18:32   ` Sowjanya Komatineni
2021-04-23 20:16 ` Lukasz Luba
2021-04-23 22:24   ` Sowjanya Komatineni
2021-04-26 10:10     ` Souvik Chakravarty
2021-04-26 13:11     ` Morten Rasmussen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426131150.GA36549@e123083-lin \
    --to=morten.rasmussen@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=devicetree@vger.kernel.org \
    --cc=jonathanh@nvidia.com \
    --cc=ksitaraman@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=lukasz.luba@arm.com \
    --cc=mark.rutland@arm.com \
    --cc=robh+dt@kernel.org \
    --cc=sanjayc@nvidia.com \
    --cc=skomatineni@nvidia.com \
    --cc=souvik.chakravarty@arm.com \
    --cc=sudeep.holla@arm.com \
    --cc=thierry.reding@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).