linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Paul Clarke <pc@us.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: paulmck@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: powerpc: mitigate impact of decrementer reset
Date: Thu, 13 Nov 2014 13:33:58 -0600	[thread overview]
Message-ID: <546507A6.9040800@us.ibm.com> (raw)
In-Reply-To: <1415846346.28703.1.camel@concordia>

On 11/12/2014 08:39 PM, Michael Ellerman wrote:
> On Wed, 2014-11-05 at 11:06 -0600, Paul Clarke wrote:
>> On 10/07/2014 09:52 PM, Michael Ellerman wrote:
>>> On Tue, 2014-07-10 at 19:13:24 UTC, Paul Clarke wrote:
>>>> This patch short-circuits the reset of the decrementer, exiting after
>>>> the decrementer reset, but before the housekeeping tasks if the only
>>>> need for the interrupt is simply to reset it.  After this patch,
>>>> the latency spike was measured at about 150 nanoseconds.
>>
>>> Thanks for the excellent changelog. But this patch makes me a bit nervous :)
>>>
>>> Do you know where the latency is coming from? Is it primarily the irq work?
>>
>> Yes, it is all under irq_enter (measured at ~10us) and irq_exit (~12us).
>
> Hmm, OK. I actually meant irq_work_run().
>
> AIUI irq_enter/exit() are just state tracking, they shouldn't be actually
> running work.
>
> How are you measuring it?

ftrace function_graph tracer:
--
   127.425212 |                |  .irq_enter() {
   127.425213 |                |    .rcu_irq_enter() {
   127.425213 |  + 12.206 us   |      .rcu_eqs_exit_common.isra.41();
   127.425226 |  + 12.750 us   |    }
... RCU is a big hitter
   127.425226 |                |    .vtime_common_account_irq_enter() {
   127.425226 |                |      .vtime_account_user() {
   127.425226 |    0.032 us    |        ._raw_spin_lock();
   127.425227 |    0.034 us    |        .get_vtime_delta();
   127.425227 |                |        .account_user_time() {
   127.425228 |    0.030 us    |          .cpuacct_account_field();
   127.425228 |                |          .acct_account_cputime() {
   127.425228 |    0.082 us    |            .__acct_update_integrals();
   127.425229 |    0.562 us    |          }
   127.425229 |    1.500 us    |        }
   127.425229 |    2.954 us    |      }
   127.425230 |    3.434 us    |    }
... but even accounting is not insignificant
   127.425230 |  + 17.218 us   |  }
   127.425230 |                |  /* timer_interrupt_entry: [...] */
... nothing to see here, because there's nothing to do except reset the 
decrementer
   127.425230 |                |  /* timer_interrupt_exit: [...] */
... (less than 1 us spent doing the "required" work)
   127.425231 |                |  .irq_exit() {
   127.425231 |                |    .vtime_gen_account_irq_exit() {
   127.425231 |    0.036 us    |      ._raw_spin_lock();
   127.425232 |                |      .__vtime_account_system() {
   127.425232 |    0.030 us    |        .get_vtime_delta();
   127.425232 |                |        .account_system_time() {
   127.425233 |    0.030 us    |          .cpuacct_account_field();
   127.425233 |                |          .acct_account_cputime() {
   127.425233 |    0.072 us    |            .__acct_update_integrals();
   127.425234 |    0.564 us    |          }
   127.425234 |    1.546 us    |        }
   127.425234 |    2.528 us    |      }
   127.425235 |    3.700 us    |    }
... significant accounting time
   127.425235 |    0.032 us    |    .idle_cpu();
   127.425235 |                |    .tick_nohz_irq_exit() {
   127.425236 |                |      .can_stop_full_tick() {
   127.425236 |    0.022 us    |        .sched_can_stop_tick();
   127.425236 |    0.020 us    |        .posix_cpu_timers_can_stop_tick()
   127.425237 |    0.970 us    |      }
   127.425237 |    0.082 us    |      .ktime_get();
   127.425238 |                |      .tick_nohz_stop_sched_tick() {
   127.425238 |    0.032 us    |        .timekeeping_max_deferment();
   127.425238 |                |        .get_next_timer_interrupt() {
   127.425239 |    0.038 us    |          ._raw_spin_lock();
   127.425239 |                |          .hrtimer_get_next_event() {
   127.425239 |    0.030 us    |            ._raw_spin_lock_irqsave();
   127.425240 |    0.028 us    |            ._raw_spin_unlock_irqrestore
   127.425240 |    0.984 us    |          }
   127.425241 |    1.936 us    |        }
   127.425241 |    0.032 us    |        .scheduler_tick_max_deferment();
   127.425241 |    3.438 us    |      }
   127.425242 |    5.880 us    |    }
   127.425242 |                |    .rcu_irq_exit() {
   127.425242 |    0.102 us    |      .rcu_eqs_enter_common.isra.40();
   127.425243 |    0.576 us    |    }
   127.425243 |  + 12.156 us   |  }

This one was almost 30 us total (17.218 + 12.156 = 29.374 us), just to 
reset the decrementer.

>>> If so I'd prefer if we could move the short circuit into __timer_interrupt()
>>> itself. That way we'd still have the trace points usable, and it would
>>> hopefully result in less duplicated logic.
>>
>> But irq_enter and irq_exit are called in timer_interrupt, before
>> __timer_interrupt is called.  I don't see how that helps.  The time
>> spent in __timer_interrupt is minuscule by comparison.
>
> Right, it won't help if it's irq_enter() that is causing the delay. But I was
> assuming it was irq_work_run().
>
>> Are you suggesting that irq_enter/exit be moved into __timer_interrupt
>> as well?  (I'm not sure how that would impact the existing call to
>> __timer_interrupt from tick_broadcast_ipi_handler?  And if there is no
>> impact, what's the point of separating timer_interrupt and
>> __timer_interrupt?)
>
> The point is __timer_interrupt() is called from tick_broadcast_ipi_handler(),
> which is called from smp_ipi_demux(), from icp_hv_ipi_action(), from
> __do_irq(), which has already done irq_enter() (and will do irq_exit()).

If that's the only impact, maybe an "IRQ entered" flag would suffice to 
either prevent a 2nd call to irq_enter(), or allow irq_enter to be 
"reentrant" (where it just returns if it was called and the last call 
was not paired with an irq_exit yet?  Alternatively, a new parameter to 
__timer_interrupt() which indicates the same.

PC

  reply	other threads:[~2014-11-13 19:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1412708517-84726-1-git-send-email-pc@us.ibm.com>
2014-10-07 19:13 ` [PATCH] powerpc: mitigate impact of decrementer reset Paul Clarke
2014-10-08  2:52   ` Michael Ellerman
2014-10-08 10:27     ` Preeti U Murthy
2014-11-05 17:06     ` Paul Clarke
2014-11-13  2:39       ` Michael Ellerman
2014-11-13 19:33         ` Paul Clarke [this message]
2014-10-08  5:37   ` [PATCH] " Heinz Wrobel
2014-10-08 12:27     ` Paul Clarke
2014-11-10 10:08   ` Benjamin Herrenschmidt
2014-11-10 20:58     ` Paul Clarke
2014-11-13  2:42       ` Michael Ellerman
2014-11-17 19:18         ` Paul E. McKenney
2014-11-18  1:46           ` Michael Ellerman
2014-11-18  3:08             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=546507A6.9040800@us.ibm.com \
    --to=pc@us.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).