From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: deepthi@linux.vnet.ibm.com, shangw@linux.vnet.ibm.com,
	arnd@arndb.de, linux-pm@vger.kernel.org, geoff@infradead.org,
	Frederic Weisbecker <fweisbec@gmail.com>,
	linux-kernel@vger.kernel.org, rostedt@goodmis.org, rjw@sisk.pl,
	paul.gortmaker@windriver.com, paulus@samba.org,
	srivatsa.bhat@linux.vnet.ibm.com, schwidefsky@de.ibm.com,
	john.stultz@linaro.org, tglx@linutronix.de,
	paulmck@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
	chenhui.zhao@freescale.com
Subject: Re: [RFC PATCH 4/5] cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints
Date: Sat, 27 Jul 2013 16:30:05 +1000	[thread overview]
Message-ID: <1374906605.3795.11.camel@pasglop> (raw)
In-Reply-To: <51F1E15B.3050106@linux.vnet.ibm.com>

On Fri, 2013-07-26 at 08:09 +0530, Preeti U Murthy wrote:
> *The lapic of a broadcast CPU is always active*. Say CPUX wants the
> broadcast CPU to wake it up at timeX. Since we cannot program the lapic
> of a remote CPU, CPUX will need to send an IPI to the broadcast CPU,
> asking it to program its lapic to fire at timeX so as to wake up CPUX.
> *With multiple CPUs, the overhead of sending IPIs could result in
> performance bottlenecks and may not scale well.*
> 
> Hence the workaround is that the broadcast CPU, on each of its timer
> interrupts, checks whether the next timer event of any CPU in a deep idle
> state has expired, which can readily be found from dev->next_event of
> that CPU. For example, the timeX mentioned above has expired. If so,
> the broadcast handler is called to send an IPI to the idling CPU to
> wake it up.
> 
> *If the broadcast CPU is in tickless idle, its timer interrupt could be
> many ticks away. It could miss waking up a CPU in deep idle* if that
> CPU's wakeup is well before this timer interrupt of the broadcast CPU.
> But without tickless idle, we are assured of a timer interrupt at least
> every period, at which point broadcast handling is done as stated in the
> previous paragraph and we will not miss waking up CPUs in deep idle states.

But that means a great loss of power saving on the broadcast CPU when the machine
is basically completely idle. We might be able to come up with something better.
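
(Just so we're talking about the same thing, my reading of that per-tick check
on the broadcast CPU is roughly the sketch below. The names deep_idle_mask,
get_next_event() and broadcast_wakeup_ipi() are made up for illustration, they
are not the actual patch's interfaces:)

#include <linux/cpumask.h>
#include <linux/ktime.h>

/*
 * Sketch of the per-tick broadcast check described above.
 * deep_idle_mask, get_next_event() and broadcast_wakeup_ipi() are
 * invented names, not real interfaces.
 */
static void broadcast_check_expired(void)
{
	ktime_t now = ktime_get();
	int cpu;

	/* Scan every CPU currently parked in a deep idle state */
	for_each_cpu(cpu, deep_idle_mask) {
		/* dev->next_event of that CPU's clock event device */
		ktime_t next = get_next_event(cpu);

		/* Its local timer would have fired by now: wake it up */
		if (ktime_compare(next, now) <= 0)
			broadcast_wakeup_ipi(cpu);
	}
}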

(Note: I do not know the timer offload code, if it exists already; I'm describing
how things could happen "out of the blue", without any knowledge of a pre-existing
framework here.)

We can know when the broadcast CPU expects to wake up next. When a CPU goes into
a deep sleep state, it can then:

 - Indicate to the broadcast CPU when it intends to be woken up, by queuing
itself into an ordered queue (ordered by target wakeup time); see the rough
sketch after this list. (OPTIMISATION: play with the locality of that: have one
queue (and one "broadcast CPU") per chip or per node instead of a global one,
to limit cache bouncing.)

 - Check whether that happens before the broadcast CPU's intended wake time (we
need statistics to see how often that happens), and in that case send an IPI
to wake it up now. When the broadcast CPU goes to sleep, it limits its sleep
time to the min of its intended sleep time and the new sleeper's time.
(OPTIMISATION: dynamically choose a broadcast CPU based on closest expiry?)

 - We can probably limit spurious wakeups a *LOT* by aligning that target time
to a global jiffy boundary, meaning that several CPUs going idle are likely
to choose the same one. Or maybe better, an adaptive alignment that essentially
gets more coarse-grained as we go further into the future.

 - When the "broadcast" CPU goes to sleep, it can play the same game of alignment.
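
Very roughly, the queue plus the alignment could look something like the sketch
below. Everything in it is invented for the sake of the example (the struct,
align_wakeup(), broadcast_next_event(), kick_broadcast_cpu()); locking and the
per-node split are left out, and it certainly isn't the existing framework:

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/list.h>

/*
 * Invented example only: a global list of sleepers sorted by wakeup
 * time, plus an alignment helper that gets coarser the further the
 * wakeup is in the future. None of these names exist today.
 */
struct sleeper {
	struct list_head node;
	int cpu;
	ktime_t wakeup;			/* aligned target wakeup time */
};

static LIST_HEAD(sleeper_queue);	/* sorted by ->wakeup */

/* Round the target up, with a step that doubles as it gets further away */
static ktime_t align_wakeup(ktime_t now, ktime_t target)
{
	s64 delta = ktime_to_ns(ktime_sub(target, now));
	s64 step = TICK_NSEC;		/* start at one jiffy */

	while (delta > 8 * step)
		step *= 2;

	return ns_to_ktime(roundup(ktime_to_ns(target), step));
}

/* Called by a CPU about to enter a deep idle state */
static void sleeper_enqueue(struct sleeper *s, ktime_t now)
{
	struct sleeper *pos;

	s->wakeup = align_wakeup(now, s->wakeup);

	/* Sorted insert, earliest wakeup first */
	list_for_each_entry(pos, &sleeper_queue, node)
		if (ktime_compare(s->wakeup, pos->wakeup) < 0)
			break;
	list_add_tail(&s->node, &pos->node);

	/*
	 * If this sleeper needs waking before the broadcast CPU's own
	 * next event, kick it so it re-arms its decrementer to
	 * min(its own wakeup, head of the queue).
	 */
	if (ktime_compare(s->wakeup, broadcast_next_event()) < 0)
		kick_broadcast_cpu();
}

The net effect is just that the broadcast CPU never sleeps past the head of the
queue, and the alignment makes several sleepers tend to share the same expiry.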

I don't like the concept of a dedicated broadcast CPU, however. I'd rather have a
general queue (or one per node) of sleepers needing a wakeup, and more or less
dynamically pick a waker to be the last man standing, but that does make things a
bit more tricky with the tickless scheduler (non-idle).

Still, I wonder if we could just have some algorithm to pick wakers more
dynamically, based on whoever has the closest "next wakeup" planned, that sort
of thing. A fixed broadcaster will create a power/thermal imbalance within the
chip, in addition to needing to be moved around on hotplug etc...
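
Purely as an illustration of that selection, and ignoring races with CPUs
entering and leaving idle (next_event_of() and deep_idle_mask are again invented
names): pick the CPU whose own planned wakeup is closest to the earliest
sleeper's deadline, and have it shorten its sleep to the deadline if needed.

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/ktime.h>

/* Invented sketch: choose the waker with the closest planned next wakeup */
static int pick_waker(ktime_t deadline)
{
	int cpu, best = -1;
	s64 best_slack = LLONG_MAX;

	for_each_online_cpu(cpu) {
		s64 slack;

		/* CPUs in deep idle can't rely on their decrementer */
		if (cpumask_test_cpu(cpu, deep_idle_mask))
			continue;

		/* Distance between its planned wakeup and the deadline */
		slack = ktime_to_ns(ktime_sub(deadline, next_event_of(cpu)));
		if (slack < 0)
			slack = -slack;	/* it would shorten its sleep */

		if (slack < best_slack) {
			best_slack = slack;
			best = cpu;
		}
	}
	return best;
}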

Cheers,
Ben.
