public inbox for linux-ia64@vger.kernel.org
* Reading /proc/stat is slooow
@ 2005-12-02 17:42 Andreas Schwab
  2005-12-05 22:21 ` Luck, Tony
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Andreas Schwab @ 2005-12-02 17:42 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

At some point after 2.6.13, reading /proc/stat has become too slow.  This
is most visible with xosview, which spends up to 80% of its time in kernel
space.  I have attached a simple program that you can run with "strace -r"
to see that the second read of /proc/stat, which eventually returns EOF,
takes too much time.  If you remove the loop around read, the same amount
of time is instead spent in close.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

[-- Attachment #2: stat.c --]
[-- Type: text/plain, Size: 300 bytes --]

#include <unistd.h>
#include <fcntl.h>

static char buf[8192];

int
main (int argc, char **argv)
{
  const char *file = argc > 1 ? argv[1] : "/proc/stat";
  while (1)
    {
      int fd = open (file, O_RDONLY);
      while (read (fd, buf, 8192) > 0);
      close (fd);
      usleep (300000);
    }
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
@ 2005-12-05 22:21 ` Luck, Tony
  2005-12-05 23:36 ` Andreas Schwab
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2005-12-05 22:21 UTC (permalink / raw)
  To: linux-ia64

On Fri, Dec 02, 2005 at 06:42:49PM +0100, Andreas Schwab wrote:
> At some point after 2.6.13, reading /proc/stat has become too slow.  This
> is most visible with xosview, which spends up to 80% of its time in kernel
> space.  I have attached a simple program that you can run with "strace -r"
> to see that the second read of /proc/stat, which eventually returns EOF,
> takes too much time.  If you remove the loop around read, the same amount
> of time is instead spent in close.


How slow is sloow?  Or did this get fixed already?  Here's some
strace -r output from 2.6.15-rc5 (on 2cpu HP rx2620, 1.3GHz):

     0.000163 open("/proc/stat", O_RDONLY) = 3
     0.000166 read(3, "cpu  1695 1 5317 1124699 4124 6 "..., 8192) = 723
     0.000221 read(3, "", 8192)         = 0
     0.000135 close(3)                  = 0
     0.000141 nanosleep({0, 300000000}, NULL) = 0
     0.303585 open("/proc/stat", O_RDONLY) = 3
     0.000135 read(3, "cpu  1695 1 5317 1124851 4124 6 "..., 8192) = 723
     0.000214 read(3, "", 8192)         = 0
     0.000132 close(3)                  = 0
     0.000130 nanosleep({0, 300000000}, NULL) = 0

So yes, the read that returns the EOF takes way longer than the read
that returns the actual data, but is that the amount of sloowness that
you see?  The whole "open, read, read, close" sequence is < 1ms.

-Tony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
  2005-12-05 22:21 ` Luck, Tony
@ 2005-12-05 23:36 ` Andreas Schwab
  2005-12-05 23:46 ` Andreas Schwab
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andreas Schwab @ 2005-12-05 23:36 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" <tony.luck@intel.com> writes:

> How slow is sloow?

About 20 milliseconds, ten times what I see with 2.6.13.  It appears that
the generation of /proc/stat no longer scales well with the number of
configured CPUs (CONFIG_NR_CPUS is 512 here).  Processes that monitor
/proc/stat add a significant amount of load to the system.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
  2005-12-05 22:21 ` Luck, Tony
  2005-12-05 23:36 ` Andreas Schwab
@ 2005-12-05 23:46 ` Andreas Schwab
  2005-12-06  0:19 ` Luck, Tony
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andreas Schwab @ 2005-12-05 23:46 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" <tony.luck@intel.com> writes:

> On Fri, Dec 02, 2005 at 06:42:49PM +0100, Andreas Schwab wrote:
>> At some point after 2.6.13, reading /proc/stat has become too slow.  This
>> is most visible with xosview, which spends up to 80% of its time in kernel space.  I
>> have attached a simple program that you can run with "strace -r" to see

Actually I meant to use "strace -T", which displays the time spent _in_
the syscall.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
                   ` (2 preceding siblings ...)
  2005-12-05 23:46 ` Andreas Schwab
@ 2005-12-06  0:19 ` Luck, Tony
  2005-12-06  1:21 ` Luck, Tony
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2005-12-06  0:19 UTC (permalink / raw)
  To: linux-ia64

On Tue, Dec 06, 2005 at 12:36:20AM +0100, Andreas Schwab wrote:
> About 20 miliseconds, ten times of what I see with 2.6.13.  It appears to
> be that the generation of /proc/stat doesn't scale well any more with the
> number of configured CPUs (CONFIG_NR_CPUS is 512 here).  Processes that
> monitor /proc/stat add a significant amount of load to the system.

I just tried CONFIG_NR_CPUS=1024 ... and I don't see anything
like a 20ms delay on 2.6.15-rc5.  How many actual cpus on your
machine (I only have two on the machine where I tested)?

-Tony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
                   ` (3 preceding siblings ...)
  2005-12-06  0:19 ` Luck, Tony
@ 2005-12-06  1:21 ` Luck, Tony
  2005-12-06  6:15 ` Luck, Tony
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2005-12-06  1:21 UTC (permalink / raw)
  To: linux-ia64

>I just tried CONFIG_NR_CPUS=1024 ... and I don't see anything
>like a 20ms delay on 2.6.15-rc5.  How many actual cpus on your
>machine (I only have two on the machine where I tested)?

Ah ... but I didn't have CONFIG_HOTPLUG_CPU set ... so my
cpu_possible_map will have only had the two bits set for the
two cpus that exist in my system.  So I was only allocating
a small buffer in stat_open() [which is my best guess for why
you see more time ... with NR_CPUS=512 you are kmalloc'ing
a big hunk of memory].

With cpu hotplug and NR_CPUS=1024 I now see 53ms for the first
of the reads from /proc/stat (as reported by strace -T).

-Tony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
                   ` (4 preceding siblings ...)
  2005-12-06  1:21 ` Luck, Tony
@ 2005-12-06  6:15 ` Luck, Tony
  2005-12-06  9:24 ` Zou, Nanhai
  2005-12-06  9:57 ` Reading /proc/stat is slooow Andreas Schwab
  7 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2005-12-06  6:15 UTC (permalink / raw)
  To: linux-ia64

The huge bulk of the time is consumed in this loop:

#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA)
        for (i = 0; i < NR_IRQS; i++)
                seq_printf(p, " %u", kstat_irqs(i));
#endif

But why is this worse now than in 2.6.13?  At the top of
show_stat() we already looked at all of the kstat_cpu(i).irqs[j]
elements [but with the loops reversed: outer loop is over all cpus,
inner loop counts 0..NR_IRQS].

-Tony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
                   ` (5 preceding siblings ...)
  2005-12-06  6:15 ` Luck, Tony
@ 2005-12-06  9:24 ` Zou, Nanhai
  2005-12-06 16:58   ` Luck, Tony
  2005-12-06  9:57 ` Reading /proc/stat is slooow Andreas Schwab
  7 siblings, 1 reply; 12+ messages in thread
From: Zou, Nanhai @ 2005-12-06  9:24 UTC (permalink / raw)
  To: linux-ia64


> -----Original Message-----
> From: linux-ia64-owner@vger.kernel.org
> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Luck, Tony
> Sent: December 6, 2005 14:16
> To: Andreas Schwab
> Cc: linux-ia64@vger.kernel.org
> Subject: RE: Reading /proc/stat is slooow
> 
> The huge bulk of the time is consumed in this loop:
> 
> #if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA)
>         for (i = 0; i < NR_IRQS; i++)
>                 seq_printf(p, " %u", kstat_irqs(i));
> #endif
> 
> But why is this worse now than in 2.6.13?  At the top of
> show_stat() we already looked at all of the kstat_cpu(i).irqs[j]
> elements [but with the loops reversed: outer loop is over all cpus,
> inner loop counts 0..NR_IRQS].

  I think the reason the second loop is so much slower than the first one is page coloring.
  For the first loop, at least the cache line is hot in the inner loop.
For the second loop, things will be much worse because the percpu data offset is 64k....
 But I have no idea why 2.6.13 is much faster here.

Zou Nan hai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
                   ` (6 preceding siblings ...)
  2005-12-06  9:24 ` Zou, Nanhai
@ 2005-12-06  9:57 ` Andreas Schwab
  7 siblings, 0 replies; 12+ messages in thread
From: Andreas Schwab @ 2005-12-06  9:57 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" <tony.luck@intel.com> writes:

>>I just tried CONFIG_NR_CPUS=1024 ... and I don't see anything
>>like a 20ms delay on 2.6.15-rc5.  How many actual cpus on your
>>machine (I only have two on the machine where I tested)?
>
> Ah ... but I didn't have CONFIG_HOTPLUG_CPU set

Ah, that's the difference.  I didn't have CONFIG_HOTPLUG_CPU set in 2.6.13
either.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-06  9:24 ` Zou, Nanhai
@ 2005-12-06 16:58   ` Luck, Tony
  2005-12-06 17:10     ` Andreas Schwab
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2005-12-06 16:58 UTC (permalink / raw)
  To: Zou, Nanhai; +Cc: Andreas Schwab, linux-ia64, linux-kernel

On Tue, Dec 06, 2005 at 05:24:16PM +0800, Zou, Nanhai wrote:
>   I think the reason the second loop is so much slower than the first one is page coloring.
>   For the first loop, at least the cache line is hot in the inner loop.
> For the second loop, things will be much worse because the percpu data offset is 64k....

Two possible fixes:
1) Compute the per-irq sums during the first (cache-friendly) loop.  This has
the downside that we need to allocate NR_IRQS*sizeof(int) to save the sums
until we need them.  This might not be popular with other architectures that
don't have a big problem here, since they don't allow such large values of NR_CPUS.

2) The problem loop is already #ifdef'd out for PPC64 and ALPHA.  We could add
IA64 to that exclusive club and just not include the per irq sums.  Since kstat_irqs()
computes the sums in an "int", they will wrap frequently on a large system
(512 cpus * default 250Hz = 128000 ... which wraps a 32-bit unsigned in 9 hours
and 19 minutes) ... so their usefulness is questionable.  Does xosview use
the per-irq values?

>  But I have no idea of why 2.6.13 is much faster here.
Andreas didn't have CONFIG_HOTPLUG_CPU turned on in his 2.6.13 build, which means that
the "for_each_cpu" loop inside kstat_cpus() only touched the percpu area for the
cpus on his system (with CONFIG_HOTPLUG_CPU=n, cpu_possible_map is equal to cpu_present_map).

proof of concept patch for option #1 (drops time from 52ms to 3ms with NR_CPUS\x1024):

--- a/fs/proc/proc_misc.c	2005-12-05 17:09:25.000000000 -0800
+++ b/fs/proc/proc_misc.c	2005-12-06 08:47:43.000000000 -0800
@@ -343,6 +343,7 @@
 	unsigned long jif;
 	cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 sum = 0;
+	int *perirqsums;
 
 	user = nice = system = idle = iowait =
 		irq = softirq = steal = cputime64_zero;
@@ -350,6 +351,7 @@
 	if (wall_to_monotonic.tv_nsec)
 		--jif;
 
+	perirqsums = kcalloc(NR_IRQS, sizeof (*perirqsums), GFP_KERNEL);
 	for_each_cpu(i) {
 		int j;
 
@@ -361,8 +363,10 @@
 		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
 		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
 		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
-		for (j = 0 ; j < NR_IRQS ; j++)
+		for (j = 0 ; j < NR_IRQS ; j++) {
 			sum += kstat_cpu(i).irqs[j];
+			perirqsums[j] += kstat_cpu(i).irqs[j];
+		}
 	}
 
 	seq_printf(p, "cpu  %llu %llu %llu %llu %llu %llu %llu %llu\n",
@@ -400,8 +404,9 @@
 
 #if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA)
 	for (i = 0; i < NR_IRQS; i++)
-		seq_printf(p, " %u", kstat_irqs(i));
+		seq_printf(p, " %u", perirqsums[i]);
 #endif
+	kfree(perirqsums);
 
 	seq_printf(p,
 		"\nctxt %llu\n"

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reading /proc/stat is slooow
  2005-12-06 16:58   ` Luck, Tony
@ 2005-12-06 17:10     ` Andreas Schwab
  2005-12-07 19:54       ` [PATCH] Drop per-irq counters from /proc/stat (Was: Reading /proc/stat is slooow) Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Schwab @ 2005-12-06 17:10 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Zou, Nanhai, linux-ia64, linux-kernel

"Luck, Tony" <tony.luck@intel.com> writes:

> 2) The problem loop is already #ifdef'd out for PPC64 and ALPHA.  We could add
> IA64 to that exclusive club and just not include the per irq sums.  Since kstat_irqs()
> computes the sums in an "int", they will wrap frequently on a large system
> (512 cpus * default 250Hz = 128000 ... which wraps a 32-bit unsigned in 9 hours
> and 19 minutes) ... so their usefulness is questionable.  Does xosview use
> the per-irq values?

It doesn't use them; it uses /proc/interrupts instead.  So IMHO this would
be the preferred solution.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] Drop per-irq counters from /proc/stat (Was: Reading /proc/stat is slooow)
  2005-12-06 17:10     ` Andreas Schwab
@ 2005-12-07 19:54       ` Luck, Tony
  0 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2005-12-07 19:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ia64, Andreas Schwab

On Tue, Dec 06, 2005 at 06:10:40PM +0100, Andreas Schwab wrote:
> "Luck, Tony" <tony.luck@intel.com> writes:
> 
> > 2) The problem loop is already #ifdef'd out for PPC64 and ALPHA.  We could add
> > IA64 to that exclusive club and just not include the per irq sums.  Since kstat_irqs()
> > computes the sums in an "int", they will wrap frequently on a large system
> > (512 cpus * default 250Hz = 128000 ... which wraps a 32-bit unsigned in 9 hours
> > and 19 minutes) ... so their usefulness is questionable.  Does xosview use
> > the per-irq values?
> 
> It doesn't use them, it uses /proc/interrupts instead.  So IMHO this would
> be the preferred solution.

Anyone know of any applications that *DO* depend on the per-irq counters
in /proc/stat?

I could just add IA64 to the ugly #ifdef:

-#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA)
+#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA) && !defined(CONFIG_IA64)

But perhaps it might be simpler to skip straight to this:

[PATCH] Drop per-irq counters from /proc/stat

It is very expensive to produce the per-irq counter values in /proc/stat
(the code does a very cache unfriendly walk across all percpu structures
for each irq).  Some architectures (PPC64, ALPHA) don't provide this
information in this file.  Applications which do want this can derive it
by adding the percpu values in /proc/interrupts.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---

diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index 5b6b0b6..64d96bf 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -396,15 +396,10 @@ static int show_stat(struct seq_file *p,
 			(unsigned long long)cputime64_to_clock_t(softirq),
 			(unsigned long long)cputime64_to_clock_t(steal));
 	}
-	seq_printf(p, "intr %llu", (unsigned long long)sum);
-
-#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA)
-	for (i = 0; i < NR_IRQS; i++)
-		seq_printf(p, " %u", kstat_irqs(i));
-#endif
+	seq_printf(p, "intr %llu\n", (unsigned long long)sum);
 
 	seq_printf(p,
-		"\nctxt %llu\n"
+		"ctxt %llu\n"
 		"btime %lu\n"
 		"processes %lu\n"
 		"procs_running %lu\n"


-Tony

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-12-07 19:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-02 17:42 Reading /proc/stat is slooow Andreas Schwab
2005-12-05 22:21 ` Luck, Tony
2005-12-05 23:36 ` Andreas Schwab
2005-12-05 23:46 ` Andreas Schwab
2005-12-06  0:19 ` Luck, Tony
2005-12-06  1:21 ` Luck, Tony
2005-12-06  6:15 ` Luck, Tony
2005-12-06  9:24 ` Zou, Nanhai
2005-12-06 16:58   ` Luck, Tony
2005-12-06 17:10     ` Andreas Schwab
2005-12-07 19:54       ` [PATCH] Drop per-irq counters from /proc/stat (Was: Reading /proc/stat is slooow) Luck, Tony
2005-12-06  9:57 ` Reading /proc/stat is slooow Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox