From: Jack Steiner <steiner@sgi.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: yinghai@kernel.org, mingo@elte.hu, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: Problem: scaling of /proc/stat on large systems
Date: Mon, 4 Oct 2010 09:34:15 -0500
Message-ID: <20101004143414.GA4261@sgi.com>
In-Reply-To: <20100930140901.037f9dc7.kamezawa.hiroyu@jp.fujitsu.com>
On Thu, Sep 30, 2010 at 02:09:01PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Sep 2010 07:22:06 -0500
> Jack Steiner <steiner@sgi.com> wrote:
I was able to run on the 4096p system over the weekend. The patch is a
definite improvement and partially fixes the problem.

Time for "cat /proc/stat >/dev/null":

	OLD: real 12.627s
	NEW: real  2.459s

A large part of the remaining overhead is in the second summation
of IRQ information:

	static int show_stat(struct seq_file *p, void *v)
	...
		/* sum again ? it could be updated? */
		for_each_irq_nr(j) {
			per_irq_sum = 0;
			for_each_possible_cpu(i)
				per_irq_sum += kstat_irqs_cpu(j, i);

			seq_printf(p, " %u", per_irq_sum);
		}

Can this be fixed using the same approach as in the current patch?
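
To make the question concrete, here is a minimal userspace sketch of what
"the same approach" could look like for the per-IRQ loop: a running total
per IRQ that is bumped together with the per-CPU counter. All names and
sizes (model_*, MODEL_NR_CPUS, MODEL_NR_IRQS) are hypothetical and do not
exist in the kernel. The model is single-threaded, so it ignores the fact
that a shared per-IRQ total would be written from many CPUs in the kernel
and would need some form of synchronization.

```c
#define MODEL_NR_CPUS 4		/* hypothetical sizes, tiny for illustration */
#define MODEL_NR_IRQS 3

/* Per-(irq, cpu) counters, standing in for the per-CPU kstat_irqs arrays. */
unsigned int model_kstat_irqs[MODEL_NR_IRQS][MODEL_NR_CPUS];

/* Hypothetical running total per IRQ, bumped with the per-CPU counter. */
unsigned int model_irq_total[MODEL_NR_IRQS];

/* Account one interrupt: update the per-CPU counter and the running total. */
void model_incr_irq(unsigned int irq, unsigned int cpu)
{
	model_kstat_irqs[irq][cpu]++;
	model_irq_total[irq]++;
}

/* What the second loop does per IRQ today: one read per possible CPU. */
unsigned int model_irq_sum_slow(unsigned int irq)
{
	unsigned int cpu, sum = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += model_kstat_irqs[irq][cpu];
	return sum;
}

/* With the running total: a single read per IRQ. */
unsigned int model_irq_sum_fast(unsigned int irq)
{
	return model_irq_total[irq];
}
```

In this model both functions return the same value for every IRQ, but the
fast variant reads one counter instead of one per CPU.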
--- jack
>
> > I'm looking for suggestions on how to fix a scaling problem with access to
> > /proc/stat.
> >
> > On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
> > /proc/stat takes too long - more than 12 sec:
> >
> > # time cat /proc/stat >/dev/null
> > real 12.630s
> > user 0.000s
> > sys 12.629s
> >
> > This affects top, ps (some variants), w, glibc (sysconf) and much more.
> >
> >
> > One of the items reported in /proc/stat is a total count of interrupts that
> > have been received. This calculation requires summation of the interrupts
> > received on each cpu (kstat_irqs_cpu()).
> >
> > The data is kept in per-cpu arrays linked to each irq_desc. On a
> > 4096p/5530IRQ system summing this data requires accessing ~90MB.
> >
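
As a rough check of the ~90MB figure, the arithmetic can be sketched as
below. The helper name irq_counter_bytes() is hypothetical, and the sketch
assumes one 4-byte "unsigned int" counter per (cpu, irq) pair with no
padding or cache-line rounding.

```c
#include <stddef.h>

/*
 * Bytes touched when every per-CPU counter of every IRQ is read once.
 * Assumption: exactly one counter per (cpu, irq) pair, no padding.
 */
unsigned long long irq_counter_bytes(unsigned long long cpus,
				     unsigned long long irqs,
				     size_t counter_size)
{
	return cpus * irqs * (unsigned long long)counter_size;
}
```

For 4096 CPUs, 5530 IRQs and 4-byte counters this gives
4096 * 5530 * 4 = 90,603,520 bytes, which is consistent with ~90MB.
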
> Wow.
>
> >
> > Deleting the summation of the kstat_irqs_cpu data eliminates the high
> > access time but is an API breakage that I assume is unacceptable.
> >
> > Another possibility would be using delayed work (similar to vmstat_update)
> > that periodically sums the data into a single array. The disadvantage in
> > this approach is that there would be a delay between receipt of an
> > interrupt & its count appearing in /proc/stat. Is this an issue for anyone?
> > Another disadvantage is that it adds to the overall "noise" introduced by
> > kernel threads.
> >
> > Is there a better approach to take?
> >
>
> Hmm, this ?
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> /proc/stat shows the total number of interrupts delivered to each cpu. When
> the number of IRQs is very large, computing this takes a long time and
> 'cat /proc/stat' takes more than 10 secs, because the counts of all irq
> events are summed every time /proc/stat is read. This patch adds a per-cpu
> "sum of all irqs" counter to reduce the read cost.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> fs/proc/stat.c | 4 +---
> include/linux/kernel_stat.h | 14 ++++++++++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> Index: mmotm-0922/fs/proc/stat.c
> ===================================================================
> --- mmotm-0922.orig/fs/proc/stat.c
> +++ mmotm-0922/fs/proc/stat.c
> @@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
> guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
> guest_nice = cputime64_add(guest_nice,
> kstat_cpu(i).cpustat.guest_nice);
> - for_each_irq_nr(j) {
> - sum += kstat_irqs_cpu(j, i);
> - }
> + sum += kstat_cpu_irqs_sum(i);
> sum += arch_irq_stat_cpu(i);
>
> for (j = 0; j < NR_SOFTIRQS; j++) {
> Index: mmotm-0922/include/linux/kernel_stat.h
> ===================================================================
> --- mmotm-0922.orig/include/linux/kernel_stat.h
> +++ mmotm-0922/include/linux/kernel_stat.h
> @@ -33,6 +33,7 @@ struct kernel_stat {
> #ifndef CONFIG_GENERIC_HARDIRQS
> unsigned int irqs[NR_IRQS];
> #endif
> + unsigned long irqs_sum;
> unsigned int softirqs[NR_SOFTIRQS];
> };
>
> @@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
> struct irq_desc *desc)
> {
> kstat_this_cpu.irqs[irq]++;
> + kstat_this_cpu.irqs_sum++;
> }
>
> static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
> @@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
> extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
> #define kstat_irqs_this_cpu(DESC) \
> ((DESC)->kstat_irqs[smp_processor_id()])
> -#define kstat_incr_irqs_this_cpu(irqno, DESC) \
> - ((DESC)->kstat_irqs[smp_processor_id()]++)
> +#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
> + ((DESC)->kstat_irqs[smp_processor_id()]++);\
> + kstat_this_cpu.irqs_sum++;} while (0)
>
> #endif
>
> @@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
> return sum;
> }
>
> +/*
> + * Number of interrupts per cpu, since bootup
> + */
> +static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
> +{
> + return kstat_cpu(cpu).irqs_sum;
> +}
>
> /*
> * Lock/unlock the current runqueue - to extract task statistics:
>
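
For reference, the read side of the patch above can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions: the pcpu_* names and PCPU_NR_* sizes are hypothetical, the
struct merely mirrors the irqs_sum field added by the patch, and the model
is single-threaded.

```c
#define PCPU_NR_CPUS 4		/* hypothetical sizes, tiny for illustration */
#define PCPU_NR_IRQS 3

struct pcpu_kernel_stat {
	unsigned int irqs[PCPU_NR_IRQS];	/* per-IRQ counts on this CPU */
	unsigned long irqs_sum;			/* mirrors the new field */
};

struct pcpu_kernel_stat pcpu_kstat[PCPU_NR_CPUS];

/* Account one interrupt on a CPU: per-IRQ count plus the per-CPU sum. */
void pcpu_incr(unsigned int cpu, unsigned int irq)
{
	pcpu_kstat[cpu].irqs[irq]++;
	pcpu_kstat[cpu].irqs_sum++;
}

/* Old read path: nested loop over every CPU and every IRQ. */
unsigned long pcpu_total_old(void)
{
	unsigned long sum = 0;
	unsigned int cpu, irq;

	for (cpu = 0; cpu < PCPU_NR_CPUS; cpu++)
		for (irq = 0; irq < PCPU_NR_IRQS; irq++)
			sum += pcpu_kstat[cpu].irqs[irq];
	return sum;
}

/* New read path: one read per CPU, accumulated across CPUs. */
unsigned long pcpu_total_new(void)
{
	unsigned long sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < PCPU_NR_CPUS; cpu++)
		sum += pcpu_kstat[cpu].irqs_sum;
	return sum;
}
```

Both read paths give the same total in this model. The new one does a
number of reads proportional to the CPU count rather than to
CPUs * IRQs, at the cost of one extra increment per interrupt.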