From: Paul Clarke <pc@us.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: paulmck@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: powerpc: mitigate impact of decrementer reset
Date: Thu, 13 Nov 2014 13:33:58 -0600 [thread overview]
Message-ID: <546507A6.9040800@us.ibm.com> (raw)
In-Reply-To: <1415846346.28703.1.camel@concordia>
On 11/12/2014 08:39 PM, Michael Ellerman wrote:
> On Wed, 2014-11-05 at 11:06 -0600, Paul Clarke wrote:
>> On 10/07/2014 09:52 PM, Michael Ellerman wrote:
>>> On Tue, 2014-07-10 at 19:13:24 UTC, Paul Clarke wrote:
>>>> This patch short-circuits the reset of the decrementer, exiting after
>>>> the decrementer reset, but before the housekeeping tasks if the only
>>>> need for the interrupt is simply to reset it. After this patch,
>>>> the latency spike was measured at about 150 nanoseconds.
>>
>>> Thanks for the excellent changelog. But this patch makes me a bit nervous :)
>>>
>>> Do you know where the latency is coming from? Is it primarily the irq work?
>>
>> Yes, it is all under irq_enter (measured at ~10us) and irq_exit (~12us).
>
> Hmm, OK. I actually meant irq_work_run().
>
> AIUI irq_enter/exit() are just state tracking, they shouldn't be actually
> running work.
>
> How are you measuring it?
ftrace function_graph tracer:
--
127.425212 | | .irq_enter() {
127.425213 | | .rcu_irq_enter() {
127.425213 | + 12.206 us | .rcu_eqs_exit_common.isra.41();
127.425226 | + 12.750 us | }
... RCU is a big hitter
127.425226 | | .vtime_common_account_irq_enter() {
127.425226 | | .vtime_account_user() {
127.425226 | 0.032 us | ._raw_spin_lock();
127.425227 | 0.034 us | .get_vtime_delta();
127.425227 | | .account_user_time() {
127.425228 | 0.030 us | .cpuacct_account_field();
127.425228 | | .acct_account_cputime() {
127.425228 | 0.082 us | .__acct_update_integrals();
127.425229 | 0.562 us | }
127.425229 | 1.500 us | }
127.425229 | 2.954 us | }
127.425230 | 3.434 us | }
... but even accounting is not insignificant
127.425230 | + 17.218 us | }
127.425230 | | /* timer_interrupt_entry: [...] */
... nothing to see here, because there's nothing to do except reset the
decrementer
127.425230 | | /* timer_interrupt_exit: [...] */
... (less than 1 us spent doing the "required" work)
127.425231 | | .irq_exit() {
127.425231 | | .vtime_gen_account_irq_exit() {
127.425231 | 0.036 us | ._raw_spin_lock();
127.425232 | | .__vtime_account_system() {
127.425232 | 0.030 us | .get_vtime_delta();
127.425232 | | .account_system_time() {
127.425233 | 0.030 us | .cpuacct_account_field();
127.425233 | | .acct_account_cputime() {
127.425233 | 0.072 us | .__acct_update_integrals();
127.425234 | 0.564 us | }
127.425234 | 1.546 us | }
127.425234 | 2.528 us | }
127.425235 | 3.700 us | }
... significant accounting time
127.425235 | 0.032 us | .idle_cpu();
127.425235 | | .tick_nohz_irq_exit() {
127.425236 | | .can_stop_full_tick() {
127.425236 | 0.022 us | .sched_can_stop_tick();
127.425236 | 0.020 us | .posix_cpu_timers_can_stop_tick()
127.425237 | 0.970 us | }
127.425237 | 0.082 us | .ktime_get();
127.425238 | | .tick_nohz_stop_sched_tick() {
127.425238 | 0.032 us | .timekeeping_max_deferment();
127.425238 | | .get_next_timer_interrupt() {
127.425239 | 0.038 us | ._raw_spin_lock();
127.425239 | | .hrtimer_get_next_event() {
127.425239 | 0.030 us | ._raw_spin_lock_irqsave();
127.425240 | 0.028 us | ._raw_spin_unlock_irqrestore
127.425240 | 0.984 us | }
127.425241 | 1.936 us | }
127.425241 | 0.032 us | .scheduler_tick_max_deferment();
127.425241 | 3.438 us | }
127.425242 | 5.880 us | }
127.425242 | | .rcu_irq_exit() {
127.425242 | 0.102 us | .rcu_eqs_enter_common.isra.40();
127.425243 | 0.576 us | }
127.425243 | + 12.156 us | }
This one was almost 30 us total (17.218 + 12.156 = 29.374 us), just to
reset the decrementer.
>>> If so I'd prefer if we could move the short circuit into __timer_interrupt()
>>> itself. That way we'd still have the trace points usable, and it would
>>> hopefully result in less duplicated logic.
>>
>> But irq_enter and irq_exit are called in timer_interrupt, before
>> __timer_interrupt is called. I don't see how that helps. The time
>> spent in __timer_interrupt is minuscule by comparison.
>
> Right, it won't help if it's irq_enter() that is causing the delay. But I was
> assuming it was irq_work_run().
>
>> Are you suggesting that irq_enter/exit be moved into __timer_interrupt
>> as well? (I'm not sure how that would impact the existing call to
>> __timer_interrupt from tick_broadcast_ipi_handler? And if there is no
>> impact, what's the point of separating timer_interrupt and
>> __timer_interrupt?)
>
> The point is __timer_interrupt() is called from tick_broadcast_ipi_handler(),
> which is called from smp_ipi_demux(), from icp_hv_ipi_action(), from
> __do_irq(), which has already done irq_enter() (and will do irq_exit()).
If that's the only impact, maybe an "IRQ entered" flag would suffice to
either prevent a 2nd call to irq_enter(), or allow irq_enter to be
"reentrant" (where it just returns if it was called and the last call
was not paired with an irq_exit yet? Alternatively, a new parameter to
__timer_interrupt() which indicates the same.
PC
next prev parent reply other threads:[~2014-11-13 19:34 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1412708517-84726-1-git-send-email-pc@us.ibm.com>
2014-10-07 19:13 ` [PATCH] powerpc: mitigate impact of decrementer reset Paul Clarke
2014-10-08 2:52 ` Michael Ellerman
2014-10-08 10:27 ` Preeti U Murthy
2014-11-05 17:06 ` Paul Clarke
2014-11-13 2:39 ` Michael Ellerman
2014-11-13 19:33 ` Paul Clarke [this message]
2014-10-08 5:37 ` [PATCH] " Heinz Wrobel
2014-10-08 12:27 ` Paul Clarke
2014-11-10 10:08 ` Benjamin Herrenschmidt
2014-11-10 20:58 ` Paul Clarke
2014-11-13 2:42 ` Michael Ellerman
2014-11-17 19:18 ` Paul E. McKenney
2014-11-18 1:46 ` Michael Ellerman
2014-11-18 3:08 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=546507A6.9040800@us.ibm.com \
--to=pc@us.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=paulmck@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).