* [PATCH] Correct nr_processes() when CPUs have been unplugged
@ 2009-11-03 10:11 Ian Campbell
From: Ian Campbell @ 2009-11-03 10:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Rusty Russell, linux-kernel
nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.
Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.
The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
stat->nlink = proc_root.nlink + nr_processes();
You don't have to be all that unlucky for nr_processes() to return a
negative value, leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens then you can get
failures where things like "ls /proc" start to fail because they get
-EOVERFLOW from some stat() call.
Example with some debugging inserted to show what goes on:
# ps haux|wc -l
nr_processes: CPU0: 90
nr_processes: CPU1: 1030
nr_processes: CPU2: -900
nr_processes: CPU3: -136
nr_processes: TOTAL: 84
proc_root_getattr. nlink 12 + nr_processes() 84 = 96
84
# echo 0 >/sys/devices/system/cpu/cpu1/online
# ps haux|wc -l
nr_processes: CPU0: 85
nr_processes: CPU2: -901
nr_processes: CPU3: -137
nr_processes: TOTAL: -953
proc_root_getattr. nlink 12 + nr_processes() -953 = -941
75
# stat /proc/
nr_processes: CPU0: 84
nr_processes: CPU2: -901
nr_processes: CPU3: -137
nr_processes: TOTAL: -954
proc_root_getattr. nlink 12 + nr_processes() -954 = -942
File: `/proc/'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 3h/3d Inode: 1 Links: 4294966354
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2009-11-03 09:06:55.000000000 +0000
Modify: 2009-11-03 09:06:55.000000000 +0000
Change: 2009-11-03 09:06:55.000000000 +0000
I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().
This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff --git a/kernel/fork.c b/kernel/fork.c
index 4b36858..7af7217 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -89,7 +89,7 @@ int nr_processes(void)
 	int cpu;
 	int total = 0;
 
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		total += per_cpu(process_counts, cpu);
 
 	return total;

* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Linus Torvalds @ 2009-11-03 15:51 UTC (permalink / raw)
To: Ian Campbell; +Cc: Andrew Morton, Rusty Russell, linux-kernel

On Tue, 3 Nov 2009, Ian Campbell wrote:
>
> Therefore calculating the sum of process_counts over only the online
> CPUs omits the processes which were started or stopped on any CPU which
> has since been unplugged. Only the sum of process_counts across all
> possible CPUs has meaning.

I think your patch is fine, but I also suspect that we could/should
instead fix it by doing something like

	per_cpu(process_counts, this_cpu) += per_cpu(process_counts, down_cpu);
	per_cpu(process_counts, down_cpu) = 0;

in the CPU off-lining case after having quieted 'down_cpu'. That way we
could always use online CPU's rather than possible CPU's. I think that
just sounds nicer.

So I'll apply the patch, but I get the feeling it could be done better.

		Linus
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Ingo Molnar @ 2009-11-03 16:07 UTC (permalink / raw)
To: Ian Campbell, Tejun Heo, Paul E. McKenney
Cc: Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

* Ian Campbell <Ian.Campbell@citrix.com> wrote:

> I'm not 100% convinced that the per_cpu regions remain valid for
> offline CPUs, although my testing suggests that they do. If not then I
> think the correct solution would be to aggregate the process_count for
> a given CPU into a global base value in cpu_down().

Sidenote: percpu areas currently are kept allocated on x86. That might
change in the future though, especially with virtual systems where the
possible range of CPUs can be very high - without us necessarily wanting
to pay the percpu area allocation price for it. I.e. dynamic
deallocation of percpu areas is something that could happen in the
future.

> This bug appears to pre-date the transition to git and it looks like
> it may even have been present in linux-2.6.0-test7-bk3 since it looks
> like the code Rusty patched in http://lwn.net/Articles/64773/ was
> already wrong.

Nice one. I'm wondering why it was not discovered for such a long time.
That count can go out of sync easily, and we frequently offline cpus
during suspend/resume, and /proc lookup failures will be noticed in
general. How come nobody ran into this?

And i'm wondering how you have run into this - running cpu hotplug
stress-tests with Xen guests - or via pure code review?

	Ingo
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Christoph Lameter @ 2009-11-03 18:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ian Campbell, Tejun Heo, Paul E. McKenney, Linus Torvalds,
    Andrew Morton, Rusty Russell, linux-kernel

On Tue, 3 Nov 2009, Ingo Molnar wrote:

> Sidenote: percpu areas currently are kept allocated on x86.

They must be kept allocated for all possible cpus. Arch code cannot
decide to not allocate per cpu areas.

Search for "for_each_possible_cpu" in the source tree if you want more
detail.

> That might change in the future though, especially with virtual systems
> where the possible range of CPUs can be very high - without us
> necessarily wanting to pay the percpu area allocation price for it. I.e.
> dynamic deallocation of percpu areas is something that could happen in
> the future.

Could be good but would not be as easy as you may think since core code
assumes that possible cpus have per cpu areas configured. There will be
the need for additional notifiers and more complex locking if we want to
have percpu areas for online cpus only. Per cpu areas are permanent at
this point which is a good existence guarantee that avoids all sorts of
complex scenarios.

> Nice one. I'm wondering why it was not discovered for such a long time.

Cpu hotplug is rarely used (what you listed are rare and unusual cases)
and therefore online cpus == possible cpus == present cpus.
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Paul E. McKenney @ 2009-11-04 6:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ingo Molnar, Ian Campbell, Tejun Heo, Linus Torvalds,
    Andrew Morton, Rusty Russell, linux-kernel

On Tue, Nov 03, 2009 at 01:34:32PM -0500, Christoph Lameter wrote:
> On Tue, 3 Nov 2009, Ingo Molnar wrote:
>
> > Sidenote: percpu areas currently are kept allocated on x86.
>
> They must be kept allocated for all possible cpus. Arch code cannot decide
> to not allocate per cpu areas.
>
> Search for "for_each_possible_cpu" in the source tree if you want more
> detail.

Here are a few in my area:

kernel/rcutorture.c     srcu_torture_stats        523  for_each_possible_cpu(cpu) {
kernel/rcutorture.c     rcu_torture_printk        800  for_each_possible_cpu(cpu) {
kernel/rcutorture.c     rcu_torture_init         1127  for_each_possible_cpu(cpu) {
kernel/rcutree.c        RCU_DATA_PTR_INIT        1518  for_each_possible_cpu(i) { \
kernel/rcutree_trace.c  PRINT_RCU_DATA             73  for_each_possible_cpu(_p_r_d_i) \
kernel/rcutree_trace.c  print_rcu_pendings        237  for_each_possible_cpu(cpu) {
kernel/srcu.c           srcu_readers_active_idx    64  for_each_possible_cpu(cpu)

> > That might change in the future though, especially with virtual systems
> > where the possible range of CPUs can be very high - without us
> > necessarily wanting to pay the percpu area allocation price for it. I.e.
> > dynamic deallocation of percpu areas is something that could happen in
> > the future.
>
> Could be good but would not be as easy as you may think since core code
> assumes that possible cpus have per cpu areas configured. There will be
> the need for additional notifiers and more complex locking if we want to
> have percpu areas for online cpus only. Per cpu areas are permanent at
> this point which is a good existence guarantee that avoids all sorts of
> complex scenarios.

Indeed. I will pick on the last one. I would need to track all the
srcu_struct structures. Each such structure would need an additional
counter. In the CPU_DYING notifier, SRCU would need to traverse all
srcu_struct structures, zeroing the dying CPU's count and adding the old
value to the additional counter. This is safe because CPU_DYING happens
in stop_machine_run() context. Then srcu_readers_active_idx() would need
to initialize "sum" to the additional counter rather than to zero.

Not all -that- bad, and similar strategies could likely be put in place
for the other six offenders in RCU.

Another class of problems would be from code that did not actually
access an offline CPU's per-CPU variables, but instead implicitly
expected the values to remain across an offline-online event pair. The
various rcu_data per-CPU structures would need some fixups when the CPU
came back online.

One way to approach this would be have two types of per-CPU variable,
one type with current semantics, and another type that can go away when
the corresponding CPU goes offline. This latter type probably needs to
be set back to the initial values when the corresponding CPU comes back
online. Of course, given an "easy way out", one might expect most people
to opt for the old-style per-CPU variables. On the other hand, how much
work do we want to do to save (say) four bytes?

> > Nice one. I'm wondering why it was not discovered for such a long time.
>
> Cpu hotplug is rarely used (what you listed are rare and unusual cases)
> and therefore online cpus == possible cpus == present cpus.

Though it is not unusual for "possible cpus" to be quite a bit larger
than "online cpus"...

							Thanx, Paul
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Christoph Lameter @ 2009-11-04 19:37 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Ian Campbell, Tejun Heo, Linus Torvalds,
    Andrew Morton, Rusty Russell, linux-kernel

On Tue, 3 Nov 2009, Paul E. McKenney wrote:

> > Cpu hotplug is rarely used (what you listed are rare and unusual cases)
> > and therefore online cpus == possible cpus == present cpus.
>
> Though it is not unusual for "possible cpus" to be quite a bit larger
> than "online cpus"...

The only reason for possible cpus to be larger than online cpus is if
you can actually plug in more processors while the system is running.
Typically a larger possible cpus count seems to be some sort of vendor
BIOS screwup where the vendor advertises how many processors the maximum
configuration of a certain type of hardware would support if one were to
buy that config.
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Paul E. McKenney @ 2009-11-04 19:44 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ingo Molnar, Ian Campbell, Tejun Heo, Linus Torvalds,
    Andrew Morton, Rusty Russell, linux-kernel

On Wed, Nov 04, 2009 at 02:37:06PM -0500, Christoph Lameter wrote:
> The only reason for possible cpus to be larger than online cpus is if you
> can actually plugin more processors while the system is running. Typically
> the larger possible cpus seems to be some sort of Vendor BIOS screwup
> where the vendor advertises how many processors the maximum configuration
> of a certain type of hardware would suppport if one would buy that config.

Or if you are running as a guest OS on top of some sort of hypervisor.

							Thanx, Paul
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Ingo Molnar @ 2009-11-04 10:42 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ian Campbell, Tejun Heo, Paul E. McKenney, Linus Torvalds,
    Andrew Morton, Rusty Russell, linux-kernel

* Christoph Lameter <cl@linux-foundation.org> wrote:

> On Tue, 3 Nov 2009, Ingo Molnar wrote:
>
> > Sidenote: percpu areas currently are kept allocated on x86.
>
> They must be kept allocated for all possible cpus. Arch code cannot
> decide to not allocate per cpu areas.
>
> Search for "for_each_possible_cpu" in the source tree if you want more
> detail.

This has come up in the past. On-demand deallocation of percpu areas is
a future possibility that might be useful - see previous percpu
discussions on lkml for details.

All in one, the conclusion is that we dont want it yet (on-demand
allocation of them is perfectly fine for now) - i just wanted to mention
that this is a _current_ situation and an implementational choice, which
judgement could change in the future.

The more 'virtual' our abstraction of CPUs becomes, the less the notion
of permanent allocation will be acceptable as time goes on. Dynamic
allocation concepts are natural and the new, generic percpu allocator
makes it more feasible to eventually deallocate percpu areas. (Of course
various for_each_possible_cpu() assumptions will have to be fixed in
that case.)

	Ingo
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Rusty Russell @ 2009-11-05 0:43 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ingo Molnar, Ian Campbell, Tejun Heo, Paul E. McKenney,
    Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 4 Nov 2009 05:04:32 am Christoph Lameter wrote:
> On Tue, 3 Nov 2009, Ingo Molnar wrote:
>
> > Sidenote: percpu areas currently are kept allocated on x86.
>
> They must be kept allocated for all possible cpus. Arch code cannot decide
> to not allocate per cpu areas.
>
> Search for "for_each_possible_cpu" in the source tree if you want more
> detail.

Yeah, handling onlining/offlining of cpus is a hassle for most code.

But I can see us wanting abstractions for counters which handle being
per-cpu or per-node and doing the folding etc. automagically. It's best
that this be done by looking at all the existing users to see if there's
a nice API which would cover 90%.

Cheers,
Rusty.
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Ian Campbell @ 2009-11-04 11:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Tejun Heo, Paul E. McKenney, Linus Torvalds, Andrew Morton,
    Rusty Russell, linux-kernel

On Tue, 2009-11-03 at 16:07 +0000, Ingo Molnar wrote:
>
> > This bug appears to pre-date the transition to git and it looks
> > like it may even have been present in linux-2.6.0-test7-bk3 since
> > it looks like the code Rusty patched in
> > http://lwn.net/Articles/64773/ was already wrong.
>
> Nice one. I'm wondering why it was not discovered for such a long
> time. That count can go out of sync easily, and we frequently offline
> cpus during suspend/resume, and /proc lookup failures will be noticed
> in general.

I think most people probably don't run for all that long with CPUs
unplugged. In the suspend/resume case the unplugs are fairly transient,
and apart from the suspend/resume itself the system is fairly idle while
the CPUs are not plugged. Note that once you plug all the CPUs back in
the problem goes away again.

I can't imagine very many things pay any attention to st_nlink for
/proc anyway, so as long as the stat itself succeeds things will
trundle on.

> How come nobody ran into this? And i'm wondering how you have run
> into this - running cpu hotplug stress-tests with Xen guests - or via
> pure code review?

We run our Xen domain 0 with a single VCPU by unplugging the others on
boot. We only actually noticed the issue when we switched our installer
to do the same for consistency. The installer uses uclibc, and IIRC
(the original discovery was a little while ago) it was using an older
variant of stat(2) which doesn't have an st_nlink field wide enough to
represent the bogus value and so returned -EOVERFLOW.

My guess is that most systems these days have a libc which uses a newer
variant of stat(2) which is able to represent the large (but invalid)
value, so stat() succeeds, and since nothing ever actually looks at the
st_nlink field (at least for /proc) things keep working.

Ian.
* Re: [PATCH] Correct nr_processes() when CPUs have been unplugged
From: Rusty Russell @ 2009-11-04 8:34 UTC (permalink / raw)
To: Ian Campbell; +Cc: Linus Torvalds, Andrew Morton, linux-kernel

On Tue, 3 Nov 2009 08:41:14 pm Ian Campbell wrote:
> nr_processes() returns the sum of the per cpu counter process_counts for
> all online CPUs. This counter is incremented for the current CPU on
> fork() and decremented for the current CPU on exit(). Since a process
> does not necessarily fork and exit on the same CPU the process_count for
> an individual CPU can be either positive or negative and effectively has
> no meaning in isolation.
>
> Therefore calculating the sum of process_counts over only the online
> CPUs omits the processes which were started or stopped on any CPU which
> has since been unplugged. Only the sum of process_counts across all
> possible CPUs has meaning.
>
[...]
>
> I'm not 100% convinced that the per_cpu regions remain valid for offline
> CPUs, although my testing suggests that they do.

Yep. And so code should usually start with for_each_possible_cpu() then:

> If not then I think the
> correct solution would be to aggregate the process_count for a given CPU
> into a global base value in cpu_down().

If it proves to be an issue.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Thanks!
Rusty.