From: Jack Steiner <steiner@sgi.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: yinghai@kernel.org, mingo@elte.hu, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: Problem: scaling of /proc/stat on large systems
Date: Mon, 4 Oct 2010 09:34:15 -0500
Message-ID: <20101004143414.GA4261@sgi.com>
In-Reply-To: <20100930140901.037f9dc7.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, Sep 30, 2010 at 02:09:01PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Sep 2010 07:22:06 -0500
> Jack Steiner <steiner@sgi.com> wrote:


I was able to run on the 4096p system over the weekend. The patch is a 
definite improvement & partially fixes the problem:

A "cat /proc/stat >/dev/null" improved:

        OLD:    real    12.627s
        NEW:    real     2.459s


A large part of the remaining overhead is in the second summation 
 of irq information:


    static int show_stat(struct seq_file *p, void *v)
        ...
        /* sum again ? it could be updated? */
        for_each_irq_nr(j) {
                per_irq_sum = 0;
                for_each_possible_cpu(i)
                        per_irq_sum += kstat_irqs_cpu(j, i);

                seq_printf(p, " %u", per_irq_sum);
        }

Can this be fixed using the same approach as in the current patch?


--- jack

> 
> > I'm looking for suggestions on how to fix a scaling problem with access to
> > /proc/stat.
> > 
> > On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
> > /proc/stat takes too long -  more than 12 sec:
> > 
> > 	# time cat /proc/stat >/dev/null
> > 	real	12.630s
> > 	user	 0.000s
> > 	sys	12.629s
> > 
> > This affects top, ps (some variants), w, glibc (sysconf) and much more.
> > 
> > 
> > One of the items reported in /proc/stat is a total count of interrupts that
> > have been received. This calculation requires summation of the interrupts
> > received on each cpu (kstat_irqs_cpu()).
> > 
> > The data is kept in per-cpu arrays linked to each irq_desc. On a
> > 4096p/5530IRQ system summing this data requires accessing ~90MB.
> > 
> Wow.
> 
> > 
> > Deleting the summation of the kstat_irqs_cpu data eliminates the high
> > access time but is an API breakage that I assume is unacceptable.
> > 
> > Another possibility would be using delayed work (similar to vmstat_update)
> > that periodically sums the data into a single array. The disadvantage in
> > this approach is that there would be a delay between receipt of an
> > interrupt & its count appearing in /proc/stat. Is this an issue for anyone?
> > Another disadvantage is that it adds to the overall "noise" introduced by
> > kernel threads.
> > 
> > Is there a better approach to take?
> > 
> 
> Hmm, this ? 
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> /proc/stat shows the total number of all interrupts to each cpu. But when
> the number of IRQs is very large, this takes a very long time, and
> 'cat /proc/stat' takes more than 10 secs. This is because the sum of all irq
> events is computed when /proc/stat is read. This patch adds a per-cpu
> "sum of all irqs" counter and reduces the read cost.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  fs/proc/stat.c              |    4 +---
>  include/linux/kernel_stat.h |   14 ++++++++++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> Index: mmotm-0922/fs/proc/stat.c
> ===================================================================
> --- mmotm-0922.orig/fs/proc/stat.c
> +++ mmotm-0922/fs/proc/stat.c
> @@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
>  		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
>  		guest_nice = cputime64_add(guest_nice,
>  			kstat_cpu(i).cpustat.guest_nice);
> -		for_each_irq_nr(j) {
> -			sum += kstat_irqs_cpu(j, i);
> -		}
> +		sum += kstat_cpu_irqs_sum(i);
>  		sum += arch_irq_stat_cpu(i);
>  
>  		for (j = 0; j < NR_SOFTIRQS; j++) {
> Index: mmotm-0922/include/linux/kernel_stat.h
> ===================================================================
> --- mmotm-0922.orig/include/linux/kernel_stat.h
> +++ mmotm-0922/include/linux/kernel_stat.h
> @@ -33,6 +33,7 @@ struct kernel_stat {
>  #ifndef CONFIG_GENERIC_HARDIRQS
>         unsigned int irqs[NR_IRQS];
>  #endif
> +	unsigned long irqs_sum;
>  	unsigned int softirqs[NR_SOFTIRQS];
>  };
>  
> @@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
>  					    struct irq_desc *desc)
>  {
>  	kstat_this_cpu.irqs[irq]++;
> +	kstat_this_cpu.irqs_sum++;
>  }
>  
>  static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
> @@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
>  extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
>  #define kstat_irqs_this_cpu(DESC) \
>  	((DESC)->kstat_irqs[smp_processor_id()])
> -#define kstat_incr_irqs_this_cpu(irqno, DESC) \
> -	((DESC)->kstat_irqs[smp_processor_id()]++)
> +#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
> +	((DESC)->kstat_irqs[smp_processor_id()]++);\
> +	kstat_this_cpu.irqs_sum++;} while (0)
>  
>  #endif
>  
> @@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
>  	return sum;
>  }
>  
> +/*
> + * Number of interrupts per cpu, since bootup
> + */
> +static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
> +{
> +	return kstat_cpu(cpu).irqs_sum;
> +}
>  
>  /*
>   * Lock/unlock the current runqueue - to extract task statistics:
> 

Thread overview: 13+ messages
2010-09-29 12:22 Problem: scaling of /proc/stat on large systems Jack Steiner
2010-09-30  5:09 ` KAMEZAWA Hiroyuki
2010-10-04 14:34   ` Jack Steiner [this message]
2010-10-05  1:36     ` KAMEZAWA Hiroyuki
2010-10-05  8:19       ` KAMEZAWA Hiroyuki
2010-10-08 16:35         ` Jack Steiner
2010-10-12  0:09           ` KAMEZAWA Hiroyuki
2010-10-12  0:22             ` Andrew Morton
2010-10-12  1:02               ` KAMEZAWA Hiroyuki
2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
2010-10-12  2:39                 ` [PATCH 2/2] improve footprint of kstat_irqs() for large system's /proc/stat KAMEZAWA Hiroyuki
2010-10-12  3:05                 ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter Yinghai Lu
2010-10-12  3:11                   ` KAMEZAWA Hiroyuki
