public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Correct nr_processes() when CPUs have been unplugged
@ 2009-11-03 10:11 Ian Campbell
  2009-11-03 15:51 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ian Campbell @ 2009-11-03 10:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Rusty Russell, linux-kernel

nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.

Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.

The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
        stat->nlink = proc_root.nlink + nr_processes();

You don't have to be all that unlucky for the nr_processes() to return a
negative value leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens then you can get
failures where things like "ls /proc" start to fail because they got an
-EOVERFLOW from some stat() call.

Example with some debugging inserted to show what goes on:
        # ps haux|wc -l
        nr_processes: CPU0:     90
        nr_processes: CPU1:     1030
        nr_processes: CPU2:     -900
        nr_processes: CPU3:     -136
        nr_processes: TOTAL:    84
        proc_root_getattr. nlink 12 + nr_processes() 84 = 96
        84
        # echo 0 >/sys/devices/system/cpu/cpu1/online
        # ps haux|wc -l
        nr_processes: CPU0:     85
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -953
        proc_root_getattr. nlink 12 + nr_processes() -953 = -941
        75
        # stat /proc/
        nr_processes: CPU0:     84
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -954
        proc_root_getattr. nlink 12 + nr_processes() -954 = -942
          File: `/proc/'
          Size: 0               Blocks: 0          IO Block: 1024   directory
        Device: 3h/3d   Inode: 1           Links: 4294966354
        Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
        Access: 2009-11-03 09:06:55.000000000 +0000
        Modify: 2009-11-03 09:06:55.000000000 +0000
        Change: 2009-11-03 09:06:55.000000000 +0000

I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().

This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff --git a/kernel/fork.c b/kernel/fork.c
index 4b36858..7af7217 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -89,7 +89,7 @@ int nr_processes(void)
 	int cpu;
 	int total = 0;
 
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		total += per_cpu(process_counts, cpu);
 
 	return total;



^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-11-05  0:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-03 10:11 [PATCH] Correct nr_processes() when CPUs have been unplugged Ian Campbell
2009-11-03 15:51 ` Linus Torvalds
2009-11-03 16:07 ` Ingo Molnar
2009-11-03 18:34   ` Christoph Lameter
2009-11-04  6:09     ` Paul E. McKenney
2009-11-04 19:37       ` Christoph Lameter
2009-11-04 19:44         ` Paul E. McKenney
2009-11-04 10:42     ` Ingo Molnar
2009-11-05  0:43     ` Rusty Russell
2009-11-04 11:10   ` Ian Campbell
2009-11-04  8:34 ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox