From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: deepthi@linux.vnet.ibm.com, shangw@linux.vnet.ibm.com,
arnd@arndb.de, linux-pm@vger.kernel.org, geoff@infradead.org,
linux-kernel@vger.kernel.org, rostedt@goodmis.org, rjw@sisk.pl,
paul.gortmaker@windriver.com, paulus@samba.org,
srivatsa.bhat@linux.vnet.ibm.com, schwidefsky@de.ibm.com,
john.stultz@linaro.org, tglx@linutronix.de,
paulmck@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
chenhui.zhao@freescale.com
Subject: Re: [RFC PATCH 4/5] cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints
Date: Fri, 26 Jul 2013 08:09:23 +0530
Message-ID: <51F1E15B.3050106@linux.vnet.ibm.com>
In-Reply-To: <20130725133044.GA7400@somewhere>
Hi Frederic,
On 07/25/2013 07:00 PM, Frederic Weisbecker wrote:
> On Thu, Jul 25, 2013 at 02:33:02PM +0530, Preeti U Murthy wrote:
>> In the current design of timer offload framework, the broadcast cpu should
>> *not* go into tickless idle so as to avoid missed wakeups on CPUs in deep idle states.
>>
>> Since we prevent the CPUs entering deep idle states from programming the lapic of the
>> broadcast cpu for their respective next local events, for reasons mentioned in
>> PATCH[3/5], the broadcast CPU checks on each of its timer interrupts, which are
>> programmed for its own local events, whether any CPUs need to be woken up.
>>
>> With tickless idle, the broadcast CPU might not get a timer interrupt until
>> many ticks later, which can result in missed wakeups on CPUs in deep idle states. By
>> disabling tickless idle, in the worst case the tick_sched hrtimer will trigger a
>> timer interrupt every period to check for broadcast.
>>
>> However, the current setup of tickless idle does not let us choose tickless
>> idle on individual cpus. NOHZ_MODE_INACTIVE, which disables tickless idle,
>> is a system-wide setting. Hence resort to an arch-specific call to check if a cpu
>> can go into tickless idle.
>
> Hi Preeti,
>
> I'm not exactly sure why the broadcast CPU can't enter dynticks idle mode.
> I read in the previous patch that it's because in dynticks idle mode the broadcast
> CPU deactivates its lapic so it doesn't receive the IPI. But maybe I misunderstood.
> Anyway that's not good for powersaving.
Let me elaborate. The CPUs in deep idle states have their lapics
deactivated. This means the next timer event, which a lapic would
normally handle by firing at the appropriate moment, is not handled
while the CPU is in a deep idle state.
Hence such CPUs offload their next timer event to the broadcast CPU,
which should *not* enter deep idle states. The broadcast CPU has the
responsibility of waking the CPUs in deep idle states.
*The lapic of a broadcast CPU is always active*. Say CPUX wants the
broadcast CPU to wake it up at timeX. Since we cannot program the lapic
of a remote CPU, CPUX would need to send an IPI to the broadcast CPU,
asking it to program its lapic to fire at timeX so as to wake up CPUX.
*With multiple CPUs, the overhead of sending these IPIs could result in
performance bottlenecks and may not scale well.*
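To put rough numbers on the scaling concern, here is a minimal counting sketch in Python (not kernel code). It assumes that under the request-by-IPI scheme each deep-idle entry costs exactly one request IPI to the broadcast CPU; the function name and the figures are hypothetical and chosen only for illustration.

```python
def request_ipis_at_broadcast_cpu(num_idle_cpus: int, sleeps_per_cpu: int,
                                  request_via_ipi: bool) -> int:
    """Count the request IPIs the single broadcast CPU must absorb.

    Assumption: one request IPI per deep-idle entry when request_via_ipi
    is True, and none when the broadcast CPU polls next_event instead.
    """
    if not request_via_ipi:
        return 0
    return num_idle_cpus * sleeps_per_cpu


# Hypothetical load: 63 CPUs, each entering deep idle 100 times.
print(request_ipis_at_broadcast_cpu(63, 100, request_via_ipi=True))
print(request_ipis_at_broadcast_cpu(63, 100, request_via_ipi=False))
```

The count grows linearly with the number of idling CPUs and all of it lands on one CPU. The wakeup IPI sent from the broadcast CPU to the sleeping CPU is needed in either scheme, so it is left out of the comparison.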
Hence the workaround is that the broadcast CPU, on each of its timer
interrupts, checks whether the next timer event of any CPU in deep idle
state has expired, which can be found from dev->next_event of
that CPU (for example, whether the timeX mentioned above has passed).
If so, the broadcast handler is called to send an IPI to the
idling CPU to wake it up.
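The check described above can be modelled roughly as follows. This is a Python sketch of the idea only, not the code from the patch series: `IdleCpu`, `broadcast_check` and the `in_deep_idle` flag are hypothetical names, `next_event` stands in for dev->next_event, and clearing the flag stands in for sending the wakeup IPI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IdleCpu:
    cpu_id: int
    next_event: int            # absolute expiry time, stand-in for dev->next_event
    in_deep_idle: bool = True  # hypothetical flag: CPU has offloaded its timer


def broadcast_check(now: int, cpus: List[IdleCpu]) -> List[int]:
    """Run from the broadcast CPU's timer interrupt at time `now`.

    Returns the ids of the CPUs whose offloaded timer has expired and
    marks them as woken (the real handler would send them an IPI).
    """
    woken = []
    for cpu in cpus:
        if cpu.in_deep_idle and cpu.next_event <= now:
            cpu.in_deep_idle = False  # stands in for the wakeup IPI
            woken.append(cpu.cpu_id)
    return woken
```

Calling `broadcast_check(100, cpus)` with one CPU due at time 100 and another due at time 250 would wake only the first; the second is picked up by a later interrupt at or after time 250.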
*If the broadcast CPU is in tickless idle, its timer interrupt could be
many ticks away. It could miss waking up a CPU in deep idle*, if that CPU's
wakeup is due well before this timer interrupt of the broadcast CPU. But
without tickless idle, we are assured of a timer interrupt at least once
every period, at which time broadcast handling is done as stated in the
previous paragraph, so we will not miss the wakeup of CPUs in deep idle states.
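A small worked example of the timing argument, as a Python sketch. The tick period, the interrupt schedules and the wakeup time are made-up numbers in arbitrary time units, and `wakeup_delay` is a hypothetical helper, not a kernel function.

```python
def wakeup_delay(wakeup_at, interrupt_times):
    """Delay between a requested wakeup and the first broadcast-CPU
    timer interrupt at or after it (when the sleeping CPU gets its IPI).

    Raises ValueError if no interrupt occurs at or after `wakeup_at`.
    """
    serviced_at = min(t for t in interrupt_times if t >= wakeup_at)
    return serviced_at - wakeup_at


TICK = 10                        # hypothetical tick period
periodic = range(0, 1000, TICK)  # tick retained: interrupts at 0, 10, ..., 990
tickless = [0, 500]              # tickless: next interrupt is many ticks away

wakeup_at = 42                   # timeX requested by the CPU in deep idle
print(wakeup_delay(wakeup_at, periodic))  # 8
print(wakeup_delay(wakeup_at, tickless))  # 458
```

With the periodic tick the delay always stays below one tick period, whereas in the tickless case it depends entirely on when the broadcast CPU's own next event happens to fall.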
Yeah it is true that not allowing the broadcast CPU to enter tickless
idle is bad for power savings, but for the use case that we are aiming
at in this patch series, the current approach seems to be the best, with
minimal trade-offs in performance, power savings, scalability and no
change in the broadcast framework that exists today in the kernel.
>
> Also when an arch wants to prevent a CPU from entering dynticks idle mode, it typically
> uses arch_needs_cpu(). Maybe that could fit for you as well?
Oh ok thanks :) I will look into this and get back to you on whether we can use it.
Regards
Preeti U Murthy