linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>, Tejun Heo <tj@kernel.org>,
	John Stultz <johnstul@us.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mel@csn.ul.ie>, Mike Frysinger <vapier@gentoo.org>,
	David Rientjes <rientjes@google.com>,
	Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	Christoph Lameter <cl@linux.com>,
	Chris Metcalf <cmetcalf@tilera.com>,
	Hakan Akkan <hakanakkan@gmail.com>,
	Max Krasnyansky <maxk@qualcomm.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event
Date: Fri, 4 May 2012 14:04:58 +0200	[thread overview]
Message-ID: <20120504120455.GB4413@somewhere.redhat.com> (raw)
In-Reply-To: <1336056962-10465-2-git-send-email-gilad@benyossef.com>

On Thu, May 03, 2012 at 05:55:57PM +0300, Gilad Ben-Yossef wrote:
> Current timer code fails to correctly return a value meaning
> that there is no future timer event, with the result that
> the timer keeps getting re-armed in HZ one shot mode even
> when we could turn it off, generating unneeded interrupts.
> This patch attempts to fix this problem.
> 
> What is happening is that when __next_timer_interrupt() wishes
> to return a value that signifies "there is no future timer
> event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
> 
> However, the code in tick_nohz_stop_sched_tick(), which called
> __next_timer_interrupt() via get_next_timer_interrupt(),
> compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
> to see if the timer needs to be re-armed.
> 
> base->timer_jiffies != last_jiffies and so
> tick_nohz_stop_sched_tick() interperts the return value as
> indication that there is a distant future event 12 days
> from now and programs the timer to fire next after KIME_MAX
> nsecs instead of avoiding to arm it. This ends up causesing
> a needless interrupt once every KTIME_MAX nsecs.

Good catch! So if I understand correctly, base->timer_jiffies can
be backward compared to last_jiffies. If we return
base->timer_jiffies + NEXT_TIMER_MAX_DELTA, the next_jiffies - last_jiffies
diff gives a delta that is a bit before NEXT_TIMER_MAX_DELTA.

And this can indeed happen if we haven't got any timer list executed since
we updated jiffies last, timer_jiffies can be a backward compared to last_jiffies.

This is harmless but causes needless timers.

I just have small comment below:

> 
> I've noticed a similar but slightly different fix to the
> same problem in the Tilera kernel tree from Chris M. (I've
> wrote this before seeing that one), so some variation of this
> fix is in use on real hardware for some time now.
> 
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: John Stultz <johnstul@us.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Mike Frysinger <vapier@gentoo.org>
> CC: David Rientjes <rientjes@google.com>
> CC: Hugh Dickins <hughd@google.com>
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
> CC: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Hakan Akkan <hakanakkan@gmail.com>
> CC: Max Krasnyansky <maxk@qualcomm.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
> ---
>  kernel/timer.c |   31 +++++++++++++++++++++----------
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..32ba64a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base)
>   * is used on S/390 to stop all activity when a CPU is idle.
>   * This function needs to be called with interrupts disabled.
>   */
> -static unsigned long __next_timer_interrupt(struct tvec_base *base)
> +static bool __next_timer_interrupt(struct tvec_base *base,
> +					unsigned long *next_timer)
>  {
>  	unsigned long timer_jiffies = base->timer_jiffies;
>  	unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA;
> -	int index, slot, array, found = 0;
> +	int index, slot, array;
> +	bool found = false;
>  	struct timer_list *nte;
>  	struct tvec *varray[4];
>  
> @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base)
>  			if (tbase_get_deferrable(nte->base))
>  				continue;
>  
> -			found = 1;
> +			found = true;
>  			expires = nte->expires;
>  			/* Look at the cascade bucket(s)? */
>  			if (!index || slot < index)
>  				goto cascade;
> -			return expires;
> +			goto out;
>  		}
>  		slot = (slot + 1) & TVR_MASK;
>  	} while (slot != index);
> @@ -1233,7 +1235,7 @@ cascade:
>  				if (tbase_get_deferrable(nte->base))
>  					continue;
>  
> -				found = 1;
> +				found = true;
>  				if (time_before(nte->expires, expires))
>  					expires = nte->expires;
>  			}
> @@ -1245,7 +1247,7 @@ cascade:
>  				/* Look at the cascade bucket(s)? */
>  				if (!index || slot < index)
>  					break;
> -				return expires;
> +				goto out;
>  			}
>  			slot = (slot + 1) & TVN_MASK;
>  		} while (slot != index);
> @@ -1254,7 +1256,10 @@ cascade:
>  			timer_jiffies += TVN_SIZE - index;
>  		timer_jiffies >>= TVN_BITS;
>  	}
> -	return expires;
> +out:
> +	if (found)
> +		*next_timer = expires;
> +	return found;
>  }
>  
>  /*
> @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now)
>  	if (cpu_is_offline(smp_processor_id()))
>  		return now + NEXT_TIMER_MAX_DELTA;
>  	spin_lock(&base->lock);
> -	if (time_before_eq(base->next_timer, base->timer_jiffies))
> -		base->next_timer = __next_timer_interrupt(base);
> -	expires = base->next_timer;
> +	if (time_before_eq(base->next_timer, base->timer_jiffies)) {
> +
> +		if (__next_timer_interrupt(base, &expires))
> +			base->next_timer = expires;
> +		else
> +			expires = now + NEXT_TIMER_MAX_DELTA;

I believe you can update base->next_timer to now + NEXT_TIMER_MAX_DELTA,
so on any further idle interrupt exit that call tick_nohz_stop_sched_tick(),
we won't get again the overhead of __next_timer_interrupt().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-05-04 12:05 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-03 14:55 [PATCH v1 0/6] reduce workqueue and timer noise Gilad Ben-Yossef
2012-05-03 14:55 ` [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event Gilad Ben-Yossef
2012-05-04 12:04   ` Frederic Weisbecker [this message]
2012-05-04 12:20     ` Frederic Weisbecker
2012-05-25 20:48   ` Thomas Gleixner
2012-05-25 20:56     ` Chris Metcalf
2012-05-25 21:04       ` Thomas Gleixner
2012-05-03 14:55 ` [PATCH v1 2/6] workqueue: introduce schedule_on_each_cpu_mask Gilad Ben-Yossef
2012-05-04  4:44   ` Srivatsa S. Bhat
2012-05-03 14:55 ` [PATCH v1 3/6] workqueue: introduce schedule_on_each_cpu_cond Gilad Ben-Yossef
2012-05-03 15:39   ` Tejun Heo
2012-05-06 13:15     ` Gilad Ben-Yossef
2012-05-07 17:17       ` Tejun Heo
2012-05-09 14:26         ` Gilad Ben-Yossef
2012-05-04  4:51   ` Srivatsa S. Bhat
2012-05-06 13:16     ` Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 4/6] mm: make lru_drain selective where it schedules work Gilad Ben-Yossef
2012-05-03 14:56 ` [PATCH v1 5/6] mm: make vmstat_update periodic run conditional Gilad Ben-Yossef
2012-05-07 15:29   ` Christoph Lameter
2012-05-07 19:33     ` KOSAKI Motohiro
2012-05-07 19:40       ` Christoph Lameter
2012-05-08 15:25         ` Gilad Ben-Yossef
2012-05-08 15:34           ` Christoph Lameter
2012-05-09 14:26             ` Gilad Ben-Yossef
2012-05-08 15:22       ` Gilad Ben-Yossef
2012-05-08 15:18     ` Gilad Ben-Yossef
2012-05-08 15:24       ` Christoph Lameter
2012-05-03 14:56 ` [PATCH v1 6/6] x86: make clocksource watchdog configurable (not for mainline) Gilad Ben-Yossef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120504120455.GB4413@somewhere.redhat.com \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=cmetcalf@tilera.com \
    --cc=gilad@benyossef.com \
    --cc=hakanakkan@gmail.com \
    --cc=hughd@google.com \
    --cc=johnstul@us.ibm.com \
    --cc=khlebnikov@openvz.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maxk@qualcomm.com \
    --cc=mel@csn.ul.ie \
    --cc=minchan.kim@gmail.com \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vapier@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).