From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>,
Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)
Date: Fri, 10 Apr 2015 15:09:07 -0300 [thread overview]
Message-ID: <20150410180907.GA13199@amt.cnet> (raw)
In-Reply-To: <20150407221244.GB6143@lerouge>
On Wed, Apr 08, 2015 at 12:12:45AM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 07, 2015 at 11:10:49PM +0200, Thomas Gleixner wrote:
> > On Mon, 6 Apr 2015, Marcelo Tosatti wrote:
> > > It is only necessary to raise timer softirq
> > > in case there are active timers.
> >
> > Depends. See below.
> >
> > > Limit the ksoftirqd wakeup to that case.
> > >
> > > Fixes a latency spike with isolated CPUs and
> > > nohz full mode.
> >
> > This lacks a proper explanation of the observed issue.
> >
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > unsigned long rcu_delta_jiffies;
> > > struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > > u64 time_delta;
> > > + bool raise_softirq = false;
> >
> > This shadows the function name raise_softirq(). Not pretty.
> >
> > > time_delta = timekeeping_max_deferment();
> > >
> > > @@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > delta_jiffies = 1;
> > > } else {
> > > /* Get the next timer wheel timer */
> > > - next_jiffies = get_next_timer_interrupt(last_jiffies);
> > > + next_jiffies = get_next_timer_interrupt(last_jiffies,
> > > + &raise_softirq);
> > > delta_jiffies = next_jiffies - last_jiffies;
> > > if (rcu_delta_jiffies < delta_jiffies) {
> > > next_jiffies = last_jiffies + rcu_delta_jiffies;
> > > @@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > */
> > > tick_do_update_jiffies64(ktime_get());
> > > }
> > > - raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > + if (raise_softirq)
> > > + raise_softirq_irqoff(TIMER_SOFTIRQ);
> >
> > This breaks when high resolution timers are disabled (compile or
> > runtime) because then the hrtimer queues are run from the timer
> > softirq.
> >
> > Now assume the following situation:
> >
> > Tick is stopped completely with no timers and no hrtimers pending.
> >
> > Interrupt happens and schedules a hrtimer.
> >
> > nohz_stop_sched_tick()
> > get_next_timer_interrupt(..., &raise_softirq);
> >
> > ---> base->active_timers = 0, so raise_softirq is false
> >
> > tick_program_event(expires)
> > clockevents_program_event(expires)
> >
> > ---> Assume expires is already in the past
> >
> > if (expires <= ktime_get())
> > return -ETIME;
> >
> > if (raise_softirq)
> > raise_softirq_irqoff(TIMER_SOFTIRQ);
> >
> > So because the tick device was not armed you wont get a tick
> > interrupt up to the point where tick_nohz_stop_sched_tick() is called
> > again which might be far off.
> >
> > I can see that the unconditional raise_softirq_irqoff() is suboptimal,
> > but it was a rather simple solution to get stuff rolling again because
> > it forces the cpu out of the inner idle loop which in turn restarts
> > the tick.
>
> Doh, that's the kind of side effect I was worried about, thanks for the
> explanation. The necessary exit out of the idle loop implied by this
> softirq when the timer fails to be programmed really deserves a comment.
>
> And note how it relies on the magic !in_interrupt() in this piece of
> hardirq code, otherwise that would be softirq from hardirq without
> reschedule() and thus no exit from idle loop, and thus no tick
> reprogramming.
>
> Let's see if I can come up with some solution to clean this up, if
> Marcelo doesn't beat me at it.
The problem is the following from -RT:
#ifdef CONFIG_PREEMPT_RT_BASE
if (!hrtimer_rt_defer(timer))
return -ETIME;
#endif
It seems a valid solution for this interrupt is to program
sched_timer to the nearest future possible.
if (expires < now)
expires = now + safe_margin;
program_timer(expires);
(perhaps a for loop increasing safe_margin if program_timer fails...)
Is that what you mean by clean up, Frederic?
next prev parent reply other threads:[~2015-04-10 18:09 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-06 23:15 kernel/timer: avoid spurious ksoftirqd wakeups (v2) Marcelo Tosatti
2015-04-06 23:20 ` Rik van Riel
2015-04-07 21:10 ` Thomas Gleixner
2015-04-07 22:12 ` Frederic Weisbecker
2015-04-10 18:09 ` Marcelo Tosatti [this message]
2015-04-11 1:30 ` Luiz Capitulino
2015-04-11 9:25 ` Thomas Gleixner
2015-04-13 15:06 ` Luiz Capitulino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150410180907.GA13199@amt.cnet \
--to=mtosatti@redhat.com \
--cc=fweisbec@gmail.com \
--cc=lcapitulino@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.