From: Frederic Weisbecker <fweisbec@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>, Tejun Heo <tj@kernel.org>,
John Stultz <johnstul@us.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>, Mike Frysinger <vapier@gentoo.org>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
Christoph Lameter <cl@linux.com>,
Chris Metcalf <cmetcalf@tilera.com>,
Hakan Akkan <hakanakkan@gmail.com>,
Max Krasnyansky <maxk@qualcomm.com>,
linux-mm@kvack.org
Subject: Re: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event
Date: Fri, 4 May 2012 14:04:58 +0200 [thread overview]
Message-ID: <20120504120455.GB4413@somewhere.redhat.com> (raw)
In-Reply-To: <1336056962-10465-2-git-send-email-gilad@benyossef.com>
On Thu, May 03, 2012 at 05:55:57PM +0300, Gilad Ben-Yossef wrote:
> Current timer code fails to correctly return a value meaning
> that there is no future timer event, with the result that
> the timer keeps getting re-armed in HZ one shot mode even
> when we could turn it off, generating unneeded interrupts.
> This patch attempts to fix this problem.
>
> What is happening is that when __next_timer_interrupt() wishes
> to return a value that signifies "there is no future timer
> event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
>
> However, the code in tick_nohz_stop_sched_tick(), which called
> __next_timer_interrupt() via get_next_timer_interrupt(),
> compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
> to see if the timer needs to be re-armed.
>
> base->timer_jiffies != last_jiffies and so
> tick_nohz_stop_sched_tick() interperts the return value as
> indication that there is a distant future event 12 days
> from now and programs the timer to fire next after KIME_MAX
> nsecs instead of avoiding to arm it. This ends up causesing
> a needless interrupt once every KTIME_MAX nsecs.
Good catch! So if I understand correctly, base->timer_jiffies can
be backward compared to last_jiffies. If we return
base->timer_jiffies + NEXT_TIMER_MAX_DELTA, the next_jiffies - last_jiffies
diff gives a delta that is a bit before NEXT_TIMER_MAX_DELTA.
And this can indeed happen if we haven't got any timer list executed since
we updated jiffies last, timer_jiffies can be a backward compared to last_jiffies.
This is harmless but causes needless timers.
I just have small comment below:
>
> I've noticed a similar but slightly different fix to the
> same problem in the Tilera kernel tree from Chris M. (I've
> wrote this before seeing that one), so some variation of this
> fix is in use on real hardware for some time now.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: John Stultz <johnstul@us.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Mike Frysinger <vapier@gentoo.org>
> CC: David Rientjes <rientjes@google.com>
> CC: Hugh Dickins <hughd@google.com>
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
> CC: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Hakan Akkan <hakanakkan@gmail.com>
> CC: Max Krasnyansky <maxk@qualcomm.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
> ---
> kernel/timer.c | 31 +++++++++++++++++++++----------
> 1 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..32ba64a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base)
> * is used on S/390 to stop all activity when a CPU is idle.
> * This function needs to be called with interrupts disabled.
> */
> -static unsigned long __next_timer_interrupt(struct tvec_base *base)
> +static bool __next_timer_interrupt(struct tvec_base *base,
> + unsigned long *next_timer)
> {
> unsigned long timer_jiffies = base->timer_jiffies;
> unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA;
> - int index, slot, array, found = 0;
> + int index, slot, array;
> + bool found = false;
> struct timer_list *nte;
> struct tvec *varray[4];
>
> @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base)
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> expires = nte->expires;
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> goto cascade;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVR_MASK;
> } while (slot != index);
> @@ -1233,7 +1235,7 @@ cascade:
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> if (time_before(nte->expires, expires))
> expires = nte->expires;
> }
> @@ -1245,7 +1247,7 @@ cascade:
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> break;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVN_MASK;
> } while (slot != index);
> @@ -1254,7 +1256,10 @@ cascade:
> timer_jiffies += TVN_SIZE - index;
> timer_jiffies >>= TVN_BITS;
> }
> - return expires;
> +out:
> + if (found)
> + *next_timer = expires;
> + return found;
> }
>
> /*
> @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now)
> if (cpu_is_offline(smp_processor_id()))
> return now + NEXT_TIMER_MAX_DELTA;
> spin_lock(&base->lock);
> - if (time_before_eq(base->next_timer, base->timer_jiffies))
> - base->next_timer = __next_timer_interrupt(base);
> - expires = base->next_timer;
> + if (time_before_eq(base->next_timer, base->timer_jiffies)) {
> +
> + if (__next_timer_interrupt(base, &expires))
> + base->next_timer = expires;
> + else
> + expires = now + NEXT_TIMER_MAX_DELTA;
I believe you can update base->next_timer to now + NEXT_TIMER_MAX_DELTA,
so on any further idle interrupt exit that call tick_nohz_stop_sched_tick(),
we won't get again the overhead of __next_timer_interrupt().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>, Tejun Heo <tj@kernel.org>,
John Stultz <johnstul@us.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>, Mike Frysinger <vapier@gentoo.org>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
Christoph Lameter <cl@linux.com>,
Chris Metcalf <cmetcalf@tilera.com>,
Hakan Akkan <hakanakkan@gmail.com>,
Max Krasnyansky <maxk@qualcomm.com>,
linux-mm@kvack.org
Subject: Re: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event
Date: Fri, 4 May 2012 14:04:58 +0200 [thread overview]
Message-ID: <20120504120455.GB4413@somewhere.redhat.com> (raw)
In-Reply-To: <1336056962-10465-2-git-send-email-gilad@benyossef.com>
On Thu, May 03, 2012 at 05:55:57PM +0300, Gilad Ben-Yossef wrote:
> Current timer code fails to correctly return a value meaning
> that there is no future timer event, with the result that
> the timer keeps getting re-armed in HZ one shot mode even
> when we could turn it off, generating unneeded interrupts.
> This patch attempts to fix this problem.
>
> What is happening is that when __next_timer_interrupt() wishes
> to return a value that signifies "there is no future timer
> event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
>
> However, the code in tick_nohz_stop_sched_tick(), which called
> __next_timer_interrupt() via get_next_timer_interrupt(),
> compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
> to see if the timer needs to be re-armed.
>
> base->timer_jiffies != last_jiffies and so
> tick_nohz_stop_sched_tick() interperts the return value as
> indication that there is a distant future event 12 days
> from now and programs the timer to fire next after KIME_MAX
> nsecs instead of avoiding to arm it. This ends up causesing
> a needless interrupt once every KTIME_MAX nsecs.
Good catch! So if I understand correctly, base->timer_jiffies can
be backward compared to last_jiffies. If we return
base->timer_jiffies + NEXT_TIMER_MAX_DELTA, the next_jiffies - last_jiffies
diff gives a delta that is a bit before NEXT_TIMER_MAX_DELTA.
And this can indeed happen if we haven't got any timer list executed since
we updated jiffies last, timer_jiffies can be a backward compared to last_jiffies.
This is harmless but causes needless timers.
I just have small comment below:
>
> I've noticed a similar but slightly different fix to the
> same problem in the Tilera kernel tree from Chris M. (I've
> wrote this before seeing that one), so some variation of this
> fix is in use on real hardware for some time now.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: John Stultz <johnstul@us.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Mike Frysinger <vapier@gentoo.org>
> CC: David Rientjes <rientjes@google.com>
> CC: Hugh Dickins <hughd@google.com>
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
> CC: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Hakan Akkan <hakanakkan@gmail.com>
> CC: Max Krasnyansky <maxk@qualcomm.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
> ---
> kernel/timer.c | 31 +++++++++++++++++++++----------
> 1 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..32ba64a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base)
> * is used on S/390 to stop all activity when a CPU is idle.
> * This function needs to be called with interrupts disabled.
> */
> -static unsigned long __next_timer_interrupt(struct tvec_base *base)
> +static bool __next_timer_interrupt(struct tvec_base *base,
> + unsigned long *next_timer)
> {
> unsigned long timer_jiffies = base->timer_jiffies;
> unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA;
> - int index, slot, array, found = 0;
> + int index, slot, array;
> + bool found = false;
> struct timer_list *nte;
> struct tvec *varray[4];
>
> @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base)
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> expires = nte->expires;
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> goto cascade;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVR_MASK;
> } while (slot != index);
> @@ -1233,7 +1235,7 @@ cascade:
> if (tbase_get_deferrable(nte->base))
> continue;
>
> - found = 1;
> + found = true;
> if (time_before(nte->expires, expires))
> expires = nte->expires;
> }
> @@ -1245,7 +1247,7 @@ cascade:
> /* Look at the cascade bucket(s)? */
> if (!index || slot < index)
> break;
> - return expires;
> + goto out;
> }
> slot = (slot + 1) & TVN_MASK;
> } while (slot != index);
> @@ -1254,7 +1256,10 @@ cascade:
> timer_jiffies += TVN_SIZE - index;
> timer_jiffies >>= TVN_BITS;
> }
> - return expires;
> +out:
> + if (found)
> + *next_timer = expires;
> + return found;
> }
>
> /*
> @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now)
> if (cpu_is_offline(smp_processor_id()))
> return now + NEXT_TIMER_MAX_DELTA;
> spin_lock(&base->lock);
> - if (time_before_eq(base->next_timer, base->timer_jiffies))
> - base->next_timer = __next_timer_interrupt(base);
> - expires = base->next_timer;
> + if (time_before_eq(base->next_timer, base->timer_jiffies)) {
> +
> + if (__next_timer_interrupt(base, &expires))
> + base->next_timer = expires;
> + else
> + expires = now + NEXT_TIMER_MAX_DELTA;
I believe you can update base->next_timer to now + NEXT_TIMER_MAX_DELTA,
so on any further idle interrupt exit that call tick_nohz_stop_sched_tick(),
we won't get again the overhead of __next_timer_interrupt().
next prev parent reply other threads:[~2012-05-04 12:05 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 14:55 [PATCH v1 0/6] reduce workqueue and timer noise Gilad Ben-Yossef
2012-05-03 14:55 ` Gilad Ben-Yossef
2012-05-03 14:55 ` [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event Gilad Ben-Yossef
2012-05-03 14:55 ` Gilad Ben-Yossef
2012-05-04 12:04 ` Frederic Weisbecker [this message]
2012-05-04 12:04 ` Frederic Weisbecker
2012-05-04 12:20 ` Frederic Weisbecker
2012-05-04 12:20 ` Frederic Weisbecker
2012-05-25 20:48 ` Thomas Gleixner
2012-05-25 20:48 ` Thomas Gleixner
2012-05-25 20:56 ` Chris Metcalf
2012-05-25 20:56 ` Chris Metcalf
2012-05-25 21:04 ` Thomas Gleixner
2012-05-25 21:04 ` Thomas Gleixner
2012-05-03 14:55 ` [PATCH v1 2/6] workqueue: introduce schedule_on_each_cpu_mask Gilad Ben-Yossef
2012-05-03 14:55 ` Gilad Ben-Yossef
2012-05-04 4:44 ` Srivatsa S. Bhat
2012-05-04 4:44 ` Srivatsa S. Bhat
2012-05-03 14:55 ` [PATCH v1 3/6] workqueue: introduce schedule_on_each_cpu_cond Gilad Ben-Yossef
2012-05-03 14:55 ` Gilad Ben-Yossef
2012-05-03 15:39 ` Tejun Heo
2012-05-03 15:39 ` Tejun Heo
2012-05-06 13:15 ` Gilad Ben-Yossef
2012-05-06 13:15 ` Gilad Ben-Yossef
2012-05-07 17:17 ` Tejun Heo
2012-05-07 17:17 ` Tejun Heo
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-04 4:51 ` Srivatsa S. Bhat
2012-05-04 4:51 ` Srivatsa S. Bhat
2012-05-06 13:16 ` Gilad Ben-Yossef
2012-05-06 13:16 ` Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 4/6] mm: make lru_drain selective where it schedules work Gilad Ben-Yossef
2012-05-03 14:56 ` Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 5/6] mm: make vmstat_update periodic run conditional Gilad Ben-Yossef
2012-05-03 14:56 ` Gilad Ben-Yossef
2012-05-07 15:29 ` Christoph Lameter
2012-05-07 15:29 ` Christoph Lameter
2012-05-07 19:33 ` KOSAKI Motohiro
2012-05-07 19:33 ` KOSAKI Motohiro
2012-05-07 19:40 ` Christoph Lameter
2012-05-07 19:40 ` Christoph Lameter
2012-05-08 15:25 ` Gilad Ben-Yossef
2012-05-08 15:25 ` Gilad Ben-Yossef
2012-05-08 15:34 ` Christoph Lameter
2012-05-08 15:34 ` Christoph Lameter
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-09 14:26 ` Gilad Ben-Yossef
2012-05-08 15:22 ` Gilad Ben-Yossef
2012-05-08 15:22 ` Gilad Ben-Yossef
2012-05-08 15:18 ` Gilad Ben-Yossef
2012-05-08 15:18 ` Gilad Ben-Yossef
2012-05-08 15:24 ` Christoph Lameter
2012-05-08 15:24 ` Christoph Lameter
2012-05-03 14:56 ` [PATCH v1 6/6] x86: make clocksource watchdog configurable (not for mainline) Gilad Ben-Yossef
2012-05-03 14:56 ` Gilad Ben-Yossef
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120504120455.GB4413@somewhere.redhat.com \
--to=fweisbec@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=cmetcalf@tilera.com \
--cc=gilad@benyossef.com \
--cc=hakanakkan@gmail.com \
--cc=hughd@google.com \
--cc=johnstul@us.ibm.com \
--cc=khlebnikov@openvz.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=maxk@qualcomm.com \
--cc=mel@csn.ul.ie \
--cc=minchan.kim@gmail.com \
--cc=rientjes@google.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vapier@gentoo.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.