From: Frederic Weisbecker <fweisbec@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>, Tejun Heo <tj@kernel.org>,
John Stultz <johnstul@us.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>, Mike Frysinger <vapier@gentoo.org>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
Christoph Lameter <cl@linux.com>,
Chris Metcalf <cmetcalf@tilera.com>,
Hakan Akkan <hakanakkan@gmail.com>,
Max Krasnyansky <maxk@qualcomm.com>,
linux-mm@kvack.org
Subject: Re: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event
Date: Fri, 4 May 2012 14:04:58 +0200 [thread overview]
Message-ID: <20120504120455.GB4413@somewhere.redhat.com> (raw)
In-Reply-To: <1336056962-10465-2-git-send-email-gilad@benyossef.com>
On Thu, May 03, 2012 at 05:55:57PM +0300, Gilad Ben-Yossef wrote:
> Current timer code fails to correctly return a value meaning
> that there is no future timer event, with the result that
> the timer keeps getting re-armed in HZ one shot mode even
> when we could turn it off, generating unneeded interrupts.
> This patch attempts to fix this problem.
>
> What is happening is that when __next_timer_interrupt() wishes
> to return a value that signifies "there is no future timer
> event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
>
> However, the code in tick_nohz_stop_sched_tick(), which called
> __next_timer_interrupt() via get_next_timer_interrupt(),
> compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
> to see if the timer needs to be re-armed.
>
> base->timer_jiffies != last_jiffies and so
> tick_nohz_stop_sched_tick() interperts the return value as
> indication that there is a distant future event 12 days
> from now and programs the timer to fire next after KIME_MAX
> nsecs instead of avoiding to arm it. This ends up causesing
> a needless interrupt once every KTIME_MAX nsecs.
Good catch! So if I understand correctly, base->timer_jiffies can
be backward compared to last_jiffies. If we return
base->timer_jiffies + NEXT_TIMER_MAX_DELTA, the next_jiffies - last_jiffies
diff gives a delta that is a bit before NEXT_TIMER_MAX_DELTA.
And this can indeed happen if we haven't got any timer list executed since
we updated jiffies last, timer_jiffies can be a backward compared to last_jiffies.
This is harmless but causes needless timers.
I just have small comment below:
>
> I've noticed a similar but slightly different fix to the
> same problem in the Tilera kernel tree from Chris M. (I've
> wrote this before seeing that one), so some variation of this
> fix is in use on real hardware for some time now.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: John Stultz <johnstul@us.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Mike Frysinger <vapier@gentoo.org>
> CC: David Rientjes <rientjes@google.com>
> CC: Hugh Dickins <hughd@google.com>
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
> CC: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Hakan Akkan <hakanakkan@gmail.com>
> CC: Max Krasnyansky <maxk@qualcomm.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
> ---
> kernel/timer.c | 31 +++++++++++++++++++++----------
> 1 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..32ba64a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base)
> * is used on S/390 to stop all activity when a CPU is idle.
> * This function needs to be called with interrupts disabled.
> */
> -static unsigned long __next_timer_interrupt(struct tvec_base *base)
> +static bool __next_timer_interrupt(struct tvec_base *base,
> + unsigned long *next_timer)
> {
> unsigned long timer_jiffies = base->timer_jiffies;
> unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA;
> - int index, slot, array, found = 0;
> + int index, slot, array;
> + bool found = false;
> struct timer_list *nte;
> struct tvec *varray[4];
>
> @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base)
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> expires = nte->expires;
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> goto cascade;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVR_MASK;
> } while (slot != index);
> @@ -1233,7 +1235,7 @@ cascade:
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> if (time_before(nte->expires, expires))
> expires = nte->expires;
> }
> @@ -1245,7 +1247,7 @@ cascade:
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> break;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVN_MASK;
> } while (slot != index);
> @@ -1254,7 +1256,10 @@ cascade:
> timer_jiffies += TVN_SIZE - index;
> timer_jiffies >>= TVN_BITS;
> }
> - return expires;
> +out:
> + if (found)
> + *next_timer = expires;
> + return found;
> }
>
> /*
> @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now)
> if (cpu_is_offline(smp_processor_id()))
> return now + NEXT_TIMER_MAX_DELTA;
> spin_lock(&base->lock);
> - if (time_before_eq(base->next_timer, base->timer_jiffies))
> - base->next_timer = __next_timer_interrupt(base);
> - expires = base->next_timer;
> + if (time_before_eq(base->next_timer, base->timer_jiffies)) {
> +
> + if (__next_timer_interrupt(base, &expires))
> + base->next_timer = expires;
> + else
> + expires = now + NEXT_TIMER_MAX_DELTA;
I believe you can update base->next_timer to now + NEXT_TIMER_MAX_DELTA,
so on any further idle interrupt exit that call tick_nohz_stop_sched_tick(),
we won't get again the overhead of __next_timer_interrupt().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-05-04 12:05 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 14:55 [PATCH v1 0/6] reduce workqueue and timer noise Gilad Ben-Yossef
2012-05-03 14:55 ` [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event Gilad Ben-Yossef
2012-05-04 12:04 ` Frederic Weisbecker [this message]
2012-05-04 12:20 ` Frederic Weisbecker
2012-05-25 20:48 ` Thomas Gleixner
2012-05-25 20:56 ` Chris Metcalf
2012-05-25 21:04 ` Thomas Gleixner
2012-05-03 14:55 ` [PATCH v1 2/6] workqueue: introduce schedule_on_each_cpu_mask Gilad Ben-Yossef
2012-05-04 4:44 ` Srivatsa S. Bhat
2012-05-03 14:55 ` [PATCH v1 3/6] workqueue: introduce schedule_on_each_cpu_cond Gilad Ben-Yossef
2012-05-03 15:39 ` Tejun Heo
2012-05-06 13:15 ` Gilad Ben-Yossef
2012-05-07 17:17 ` Tejun Heo
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-04 4:51 ` Srivatsa S. Bhat
2012-05-06 13:16 ` Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 4/6] mm: make lru_drain selective where it schedules work Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 5/6] mm: make vmstat_update periodic run conditional Gilad Ben-Yossef
2012-05-07 15:29 ` Christoph Lameter
2012-05-07 19:33 ` KOSAKI Motohiro
2012-05-07 19:40 ` Christoph Lameter
2012-05-08 15:25 ` Gilad Ben-Yossef
2012-05-08 15:34 ` Christoph Lameter
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-08 15:22 ` Gilad Ben-Yossef
2012-05-08 15:18 ` Gilad Ben-Yossef
2012-05-08 15:24 ` Christoph Lameter
2012-05-03 14:56 ` [PATCH v1 6/6] x86: make clocksource watchdog configurable (not for mainline) Gilad Ben-Yossef
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120504120455.GB4413@somewhere.redhat.com \
--to=fweisbec@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=cmetcalf@tilera.com \
--cc=gilad@benyossef.com \
--cc=hakanakkan@gmail.com \
--cc=hughd@google.com \
--cc=johnstul@us.ibm.com \
--cc=khlebnikov@openvz.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=maxk@qualcomm.com \
--cc=mel@csn.ul.ie \
--cc=minchan.kim@gmail.com \
--cc=rientjes@google.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vapier@gentoo.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).