Date: Thu, 7 May 2009 13:18:10 -0500
From: Dimitri Sivanich
To: Thomas Gleixner
Cc: Andrew Morton, LKML, Ingo Molnar, Peter Zijlstra
Subject: Re: [patch 2/3] timer: move calc_load to softirq
Message-ID: <20090507181810.GB6549@sgi.com>
References: <20090502190500.513452861@linutronix.de> <20090502190545.489264416@linutronix.de> <20090502122459.ccc870fc.akpm@linux-foundation.org>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 02, 2009 at 09:54:49PM +0200, Thomas Gleixner wrote:
> On Sat, 2 May 2009, Andrew Morton wrote:
> > > +	spin_lock(&avenrun_lock);
> > > +	ticks = atomic_read(&avenrun_ticks);
> > > +	if (ticks >= LOAD_FREQ) {
> > > +		atomic_sub(LOAD_FREQ, &avenrun_ticks);
> > > +		calc_global_load();
> > > 	}
> > > +	spin_unlock(&avenrun_lock);
> > > +	*calc = 0;
> > > +}
> > 
> > I wonder if we really really need avenrun_lock.  Various bits of code
> > (eg net/sched/em_meta.c) cheerily read avenrun[] without locking.
> 
> I don't care about the reader side anyway, the lock is just there to
> protect the calc_load update from two cpus, but that's probably
> paranoia.
> 
> Though, there is a theoretical race between 2 cpus which might want to
> update avenrun_ticks in the NOHZ case, but thinking more about it we
> can just prevent this by clever usage of the atomic ops on
> avenrun_ticks.
> 
> Thanks,
> 
> 	tglx
> 
> ----------->
> Subject: timer: move calc_load to softirq
> From: Thomas Gleixner
> Date: Sat, 2 May 2009 19:43:41 +0200
> 
> xtime_lock is held write locked across calc_load() which iterates over
> all online CPUs.  That can cause long latencies for xtime_lock readers
> on large SMP systems.  The load average calculation is a rough
> estimate anyway, so there is no real need to protect the readers
> against the update.  It's not a problem when the avenrun array is
> updated while a reader copies the values.
> 
> Move the calculation to the softirq and reduce the xtime_lock write
> locked section.  This also reduces the interrupts off section.
> 
> Inspired by an initial patch from Dimitri Sivanich.
> 
> Signed-off-by: Thomas Gleixner

Acked-by: Dimitri Sivanich

> ---
>  kernel/time/timekeeping.c |    2 -
>  kernel/timer.c            |   57 ++++++++++++++++++++++++++++++++----------------
>  2 files changed, 41 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/kernel/time/timekeeping.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/timekeeping.c
> +++ linux-2.6/kernel/time/timekeeping.c
> @@ -22,7 +22,7 @@
>  
>  /*
>   * This read-write spinlock protects us from races in SMP while
> - * playing with xtime and avenrun.
> + * playing with xtime.
>   */
>  __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock);
>  
> Index: linux-2.6/kernel/timer.c
> ===================================================================
> --- linux-2.6.orig/kernel/timer.c
> +++ linux-2.6/kernel/timer.c
> @@ -1127,12 +1127,13 @@ void update_process_times(int user_tick)
>   * imply that avenrun[] is the standard name for this kind of thing.
>   * Nothing else seems to be standardized: the fractional size etc
>   * all seem to differ on different machines.
> - *
> - * Requires xtime_lock to access.
>   */
>  unsigned long avenrun[3];
>  EXPORT_SYMBOL(avenrun);
>  
> +static atomic_t avenrun_ticks;
> +static DEFINE_PER_CPU(int, avenrun_calculate);
> +
>  static unsigned long
>  calc_load(unsigned long load, unsigned long exp, unsigned long active)
>  {
> @@ -1143,23 +1144,44 @@ calc_load(unsigned long load, unsigned l
>  
>  /*
>   * calc_load - given tick count, update the avenrun load estimates.
> - * This is called while holding a write_lock on xtime_lock.
>   */
> -static void calc_global_load(unsigned long ticks)
> +static void calc_global_load(void)
>  {
> -	unsigned long active_tasks; /* fixed-point */
> -	static int count = LOAD_FREQ;
> +	unsigned long active_tasks = nr_active() * FIXED_1;
>  
> -	count -= ticks;
> -	if (unlikely(count < 0)) {
> -		active_tasks = nr_active() * FIXED_1;
> -		do {
> -			avenrun[0] = calc_load(avenrun[0], EXP_1, active_tasks);
> -			avenrun[1] = calc_load(avenrun[1], EXP_5, active_tasks);
> -			avenrun[2] = calc_load(avenrun[2], EXP_15, active_tasks);
> -			count += LOAD_FREQ;
> -		} while (count < 0);
> -	}
> +	avenrun[0] = calc_load(avenrun[0], EXP_1, active_tasks);
> +	avenrun[1] = calc_load(avenrun[1], EXP_5, active_tasks);
> +	avenrun[2] = calc_load(avenrun[2], EXP_15, active_tasks);
> +}
> +
> +/*
> + * Check whether do_timer has set avenrun_calculate. The variable is
> + * cpu local so we avoid cache line bouncing of avenrun_ticks.
> + */
> +static void check_calc_load(void)
> +{
> +	int ticks, *calc = &__get_cpu_var(avenrun_calculate);
> +
> +	if (!*calc)
> +		return;
> +
> +	ticks = atomic_sub_return(LOAD_FREQ, &avenrun_ticks);
> +	if (ticks >= 0)
> +		calc_global_load();
> +	else
> +		atomic_add(LOAD_FREQ, &avenrun_ticks);
> +	*calc = 0;
> +}
> +
> +/*
> + * Update avenrun_ticks and trigger the load calculation when the
> + * result is >= LOAD_FREQ.
> + */
> +static void calc_load_update(unsigned long ticks)
> +{
> +	ticks = atomic_add_return(ticks, &avenrun_ticks);
> +	if (ticks >= LOAD_FREQ)
> +		__get_cpu_var(avenrun_calculate) = 1;
>  }
>  
>  /*
> @@ -1169,6 +1191,7 @@ static void run_timer_softirq(struct sof
>  {
>  	struct tvec_base *base = __get_cpu_var(tvec_bases);
>  
> +	check_calc_load();
>  	hrtimer_run_pending();
>  
>  	if (time_after_eq(jiffies, base->timer_jiffies))
> @@ -1192,7 +1215,7 @@ void run_local_timers(void)
>  static inline void update_times(unsigned long ticks)
>  {
>  	update_wall_time();
> -	calc_global_load(ticks);
> +	calc_load_update(ticks);
>  }
>  
>  /*
> 