* meminfo Committed_AS underflows
@ 2009-04-14 19:33 Dave Hansen
2009-04-15 2:04 ` KOSAKI Motohiro
2009-04-15 8:33 ` Alan Cox
0 siblings, 2 replies; 12+ messages in thread
From: Dave Hansen @ 2009-04-14 19:33 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman,
Christoph Lameter
I have a set of ppc64 machines that seem to spontaneously get underflows
in /proc/meminfo's Committed_AS field:
# while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
1 Committed_AS: 18446744073709323392 kB
11 Committed_AS: 18446744073709455488 kB
6 Committed_AS: 35136 kB
5 Committed_AS: 18446744073709454400 kB
7 Committed_AS: 35904 kB
3 Committed_AS: 18446744073709453248 kB
2 Committed_AS: 34752 kB
9 Committed_AS: 18446744073709453248 kB
8 Committed_AS: 34752 kB
3 Committed_AS: 18446744073709320960 kB
7 Committed_AS: 18446744073709454080 kB
3 Committed_AS: 18446744073709320960 kB
5 Committed_AS: 18446744073709454080 kB
6 Committed_AS: 18446744073709320960 kB
As you can see, it bounces in and out of it. I think the problem is
here:
#define ACCT_THRESHOLD max(16, NR_CPUS * 2)
...
void vm_acct_memory(long pages)
{
long *local;
preempt_disable();
local = &__get_cpu_var(committed_space);
*local += pages;
if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
atomic_long_add(*local, &vm_committed_space);
*local = 0;
}
preempt_enable();
}
Plus, some joker set CONFIG_NR_CPUS=1024.
nr_cpus (1024) * 2 * page_size (64k) = 128MB. That means each cpu can
skew the counter by 128MB. With 1024 CPUs that means that we can have
~128GB of outstanding percpu accounting that meminfo doesn't see. Let's
say we do vm_acct_memory(128MB-1) on 1023 of the CPUs, then on the other
CPU, we do vm_acct_memory(-128GB).
The 1023 cpus won't ever hit the ACCT_THRESHOLD. The 1 CPU that did
will decrement the global 'vm_committed_space' by ~128 GB. Underflow.
Yay. This happens on a much smaller scale now.
Should we be protecting meminfo so that it spits slightly more sane
numbers out to the user?
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: meminfo Committed_AS underflows 2009-04-14 19:33 meminfo Committed_AS underflows Dave Hansen @ 2009-04-15 2:04 ` KOSAKI Motohiro 2009-04-15 3:34 ` Balbir Singh 2009-04-15 4:12 ` KOSAKI Motohiro 2009-04-15 8:33 ` Alan Cox 1 sibling, 2 replies; 12+ messages in thread From: KOSAKI Motohiro @ 2009-04-15 2:04 UTC (permalink / raw) To: Dave Hansen Cc: kosaki.motohiro, linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter > I have a set of ppc64 machines that seem to spontaneously get underflows > in /proc/meminfo's Committed_AS field: > > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c > 1 Committed_AS: 18446744073709323392 kB > 11 Committed_AS: 18446744073709455488 kB > 6 Committed_AS: 35136 kB > 5 Committed_AS: 18446744073709454400 kB > 7 Committed_AS: 35904 kB > 3 Committed_AS: 18446744073709453248 kB > 2 Committed_AS: 34752 kB > 9 Committed_AS: 18446744073709453248 kB > 8 Committed_AS: 34752 kB > 3 Committed_AS: 18446744073709320960 kB > 7 Committed_AS: 18446744073709454080 kB > 3 Committed_AS: 18446744073709320960 kB > 5 Committed_AS: 18446744073709454080 kB > 6 Committed_AS: 18446744073709320960 kB > > As you can see, it bounces in and out of it. I think the problem is > here: > > #define ACCT_THRESHOLD max(16, NR_CPUS * 2) > ... > void vm_acct_memory(long pages) > { > long *local; > > preempt_disable(); > local = &__get_cpu_var(committed_space); > *local += pages; > if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) { > atomic_long_add(*local, &vm_committed_space); > *local = 0; > } > preempt_enable(); > } > > Plus, some joker set CONFIG_NR_CPUS=1024. > > nr_cpus (1024) * 2 * page_size (64k) = 128MB. That means each cpu can > skew the counter by 128MB. With 1024 CPUs that means that we can have > ~128GB of outstanding percpu accounting that meminfo doesn't see. Let's > say we do vm_acct_memory(128MB-1) on 1023 of the CPUs, then on the other > CPU, we do vm_acct_memory(-128GB). > > The 1023 cpus won't ever hit the ACCT_THRESHOLD. The 1 CPU that did > will decrement the global 'vm_committed_space' by ~128 GB. Underflow. > Yay. This happens on a much smaller scale now. > > Should we be protecting meminfo so that it spits slightly more sane > numbers out to the user? Can you try to this patch? (Oh well, I can't reproduce this underflow on my small machine) =============== Dave Hansen reported committed_AS field can underfolow. > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c > 1 Committed_AS: 18446744073709323392 kB > 11 Committed_AS: 18446744073709455488 kB > 6 Committed_AS: 35136 kB > 5 Committed_AS: 18446744073709454400 kB > 7 Committed_AS: 35904 kB > 3 Committed_AS: 18446744073709453248 kB > 2 Committed_AS: 34752 kB > 9 Committed_AS: 18446744073709453248 kB > 8 Committed_AS: 34752 kB > 3 Committed_AS: 18446744073709320960 kB > 7 Committed_AS: 18446744073709454080 kB > 3 Committed_AS: 18446744073709320960 kB > 5 Committed_AS: 18446744073709454080 kB > 6 Committed_AS: 18446744073709320960 kB Because NR_CPU can be greater than 1000. and meminfo_proc_show() doesn't have underflow check. this patch have two change. 1. Change NR_CPU to nr_online_cpus() vm_acct_memory() isn't fast-path. then cpumask_weight() calculation isn't so expensive and the parameter for scalability issue should consider number of _physical_ cpus. not theoretical maximum number. 2. Add under-flow check to meminfo_proc_show(). Almost field in /proc/meminfo have underflow check. but Committed_AS is significant exeption. it should do. Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- fs/proc/meminfo.c | 2 ++ mm/swap.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) Index: b/fs/proc/meminfo.c =================================================================== --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -36,6 +36,8 @@ static int meminfo_proc_show(struct seq_ si_meminfo(&i); si_swapinfo(&i); committed = atomic_long_read(&vm_committed_space); + if (committed < 0) + committed = 0; allowed = ((totalram_pages - hugetlb_total_pages()) * sysctl_overcommit_ratio / 100) + total_swap_pages; Index: b/mm/swap.c =================================================================== --- a/mm/swap.c +++ b/mm/swap.c @@ -519,7 +519,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag); * We tolerate a little inaccuracy to avoid ping-ponging the counter between * CPUs */ -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) +#define ACCT_THRESHOLD max_t(long, 16, num_online_cpus() * 2) static DEFINE_PER_CPU(long, committed_space); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-15 2:04 ` KOSAKI Motohiro @ 2009-04-15 3:34 ` Balbir Singh 2009-04-15 4:10 ` KOSAKI Motohiro 2009-04-15 4:12 ` KOSAKI Motohiro 1 sibling, 1 reply; 12+ messages in thread From: Balbir Singh @ 2009-04-15 3:34 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Dave Hansen, linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > committed = atomic_long_read(&vm_committed_space); > + if (committed < 0) > + committed = 0; Isn't this like pushing the problem under the rug? > allowed = ((totalram_pages - hugetlb_total_pages()) > * sysctl_overcommit_ratio / 100) + total_swap_pages; > > Index: b/mm/swap.c > =================================================================== > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -519,7 +519,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag); > * We tolerate a little inaccuracy to avoid ping-ponging the counter between > * CPUs > */ > -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) > +#define ACCT_THRESHOLD max_t(long, 16, num_online_cpus() * 2) > Hmm.. this is a one time expansion, free of CPU hotplug. Should we use nr_cpu_ids or num_possible_cpus()? -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-15 3:34 ` Balbir Singh @ 2009-04-15 4:10 ` KOSAKI Motohiro 2009-04-15 8:47 ` Balbir Singh 0 siblings, 1 reply; 12+ messages in thread From: KOSAKI Motohiro @ 2009-04-15 4:10 UTC (permalink / raw) To: balbir Cc: kosaki.motohiro, Dave Hansen, linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > > > committed = atomic_long_read(&vm_committed_space); > > + if (committed < 0) > > + committed = 0; > > Isn't this like pushing the problem under the rug? global_page_state() already has same logic. IOW almost meminfo filed has this one (except Commited_AS). > > allowed = ((totalram_pages - hugetlb_total_pages()) > > * sysctl_overcommit_ratio / 100) + total_swap_pages; > > > > Index: b/mm/swap.c > > =================================================================== > > --- a/mm/swap.c > > +++ b/mm/swap.c > > @@ -519,7 +519,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag); > > * We tolerate a little inaccuracy to avoid ping-ponging the counter between > > * CPUs > > */ > > -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) > > +#define ACCT_THRESHOLD max_t(long, 16, num_online_cpus() * 2) > > > > Hmm.. this is a one time expansion, free of CPU hotplug. > > Should we use nr_cpu_ids or num_possible_cpus()? #define num_online_cpus() cpumask_weight(cpu_online_mask) #define num_possible_cpus() cpumask_weight(cpu_possible_mask) num_possible_cpus() have the same calculation cost. nr_cpu_ids isn't proper value. it point to valid cpu-id range, no related number of online nor possible cpus. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-15 4:10 ` KOSAKI Motohiro @ 2009-04-15 8:47 ` Balbir Singh 2009-04-27 20:27 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: Balbir Singh @ 2009-04-15 8:47 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Dave Hansen, linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 13:10:06]: > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > > > > > committed = atomic_long_read(&vm_committed_space); > > > + if (committed < 0) > > > + committed = 0; > > > > Isn't this like pushing the problem under the rug? > > global_page_state() already has same logic. > IOW almost meminfo filed has this one (except Commited_AS). > OK > > > > allowed = ((totalram_pages - hugetlb_total_pages()) > > > * sysctl_overcommit_ratio / 100) + total_swap_pages; > > > > > > Index: b/mm/swap.c > > > =================================================================== > > > --- a/mm/swap.c > > > +++ b/mm/swap.c > > > @@ -519,7 +519,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag); > > > * We tolerate a little inaccuracy to avoid ping-ponging the counter between > > > * CPUs > > > */ > > > -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) > > > +#define ACCT_THRESHOLD max_t(long, 16, num_online_cpus() * 2) > > > > > > > Hmm.. this is a one time expansion, free of CPU hotplug. > > > > Should we use nr_cpu_ids or num_possible_cpus()? > > #define num_online_cpus() cpumask_weight(cpu_online_mask) > #define num_possible_cpus() cpumask_weight(cpu_possible_mask) > > num_possible_cpus() have the same calculation cost. > nr_cpu_ids isn't proper value. > it point to valid cpu-id range, no related number of online nor possible cpus. > Since the value is just a basis for thresholds, num_online_cpus() might work. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-15 8:47 ` Balbir Singh @ 2009-04-27 20:27 ` Andrew Morton 2009-04-28 3:07 ` KOSAKI Motohiro 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2009-04-27 20:27 UTC (permalink / raw) To: balbir; +Cc: kosaki.motohiro, dave, linux-mm, linux-kernel, ebmunson, mel, cl On Wed, 15 Apr 2009 14:17:13 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 13:10:06]: > > > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > > > > > > > committed = atomic_long_read(&vm_committed_space); > > > > + if (committed < 0) > > > > + committed = 0; > > > Is there a reason why we can't use a boring old percpu_counter for vm_committed_space? That way the meminfo code can just use percpu_counter_read_positive(). Or perhaps just percpu_counter_read(). The percpu_counter code does a better job of handling large cpu counts than the mysteriously-duplicative open-coded stuff we have there. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-27 20:27 ` Andrew Morton @ 2009-04-28 3:07 ` KOSAKI Motohiro 2009-04-28 3:27 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: KOSAKI Motohiro @ 2009-04-28 3:07 UTC (permalink / raw) To: Andrew Morton Cc: kosaki.motohiro, balbir, dave, linux-mm, linux-kernel, ebmunson, mel, cl > On Wed, 15 Apr 2009 14:17:13 +0530 > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 13:10:06]: > > > > > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > > > > > > > > > committed = atomic_long_read(&vm_committed_space); > > > > > + if (committed < 0) > > > > > + committed = 0; > > > > > > Is there a reason why we can't use a boring old percpu_counter for > vm_committed_space? That way the meminfo code can just use > percpu_counter_read_positive(). > > Or perhaps just percpu_counter_read(). The percpu_counter code does a > better job of handling large cpu counts than the > mysteriously-duplicative open-coded stuff we have there. At that time, I thought smallest patch is better because it can send -stable tree easily. but maybe I was wrong. it made bikeshed discussion :( ok, I'm going to right way. ========================================= Subject: [PATCH] fix Committed_AS underfolow on large NR_CPUS environment As reported by Dave Hansen, the Committed_AS field can underflow in certain situations: > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c > 1 Committed_AS: 18446744073709323392 kB > 11 Committed_AS: 18446744073709455488 kB > 6 Committed_AS: 35136 kB > 5 Committed_AS: 18446744073709454400 kB > 7 Committed_AS: 35904 kB > 3 Committed_AS: 18446744073709453248 kB > 2 Committed_AS: 34752 kB > 9 Committed_AS: 18446744073709453248 kB > 8 Committed_AS: 34752 kB > 3 Committed_AS: 18446744073709320960 kB > 7 Committed_AS: 18446744073709454080 kB > 3 Committed_AS: 18446744073709320960 kB > 5 Committed_AS: 18446744073709454080 kB > 6 Committed_AS: 18446744073709320960 kB Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does not check for underflow. But NR_CPUS proportional isn't good calculation. In general, possibility of lock contention is proportional to the number of online cpus, not theorical maximum cpus (NR_CPUS). the current kernel has generic percpu-counter stuff. using it is right way. it makes code simplify and percpu_counter_read_positive() don't make underflow issue. Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Eric B Munson <ebmunson@us.ibm.com> --- fs/proc/meminfo.c | 2 +- include/linux/mman.h | 9 +++------ mm/mmap.c | 12 ++++++------ mm/nommu.c | 13 +++++++------ mm/swap.c | 46 ---------------------------------------------- 5 files changed, 17 insertions(+), 65 deletions(-) Index: b/fs/proc/meminfo.c =================================================================== --- a/fs/proc/meminfo.c 2009-04-28 11:36:43.000000000 +0900 +++ b/fs/proc/meminfo.c 2009-04-28 11:36:54.000000000 +0900 @@ -35,7 +35,7 @@ static int meminfo_proc_show(struct seq_ #define K(x) ((x) << (PAGE_SHIFT - 10)) si_meminfo(&i); si_swapinfo(&i); - committed = atomic_long_read(&vm_committed_space); + committed = percpu_counter_read_positive(&vm_committed_as); allowed = ((totalram_pages - hugetlb_total_pages()) * sysctl_overcommit_ratio / 100) + total_swap_pages; Index: b/mm/mmap.c =================================================================== --- a/mm/mmap.c 2009-04-28 11:36:43.000000000 +0900 +++ b/mm/mmap.c 2009-04-28 11:36:54.000000000 +0900 @@ -85,7 +85,7 @@ EXPORT_SYMBOL(vm_get_page_prot); int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */ int sysctl_overcommit_ratio = 50; /* default is 50% */ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; -atomic_long_t vm_committed_space = ATOMIC_LONG_INIT(0); +struct percpu_counter vm_committed_as; /* * Check that a process has enough memory to allocate a new virtual @@ -179,11 +179,7 @@ int __vm_enough_memory(struct mm_struct if (mm) allowed -= mm->total_vm / 32; - /* - * cast `allowed' as a signed long because vm_committed_space - * sometimes has a negative value - */ - if (atomic_long_read(&vm_committed_space) < (long)allowed) + if (percpu_counter_read_positive(&vm_committed_as) < allowed) return 0; error: vm_unacct_memory(pages); @@ -2481,4 +2477,8 @@ void mm_drop_all_locks(struct mm_struct */ void __init mmap_init(void) { + int ret; + + ret = percpu_counter_init(&vm_committed_as, 0); + VM_BUG_ON(ret); } Index: b/mm/nommu.c =================================================================== --- a/mm/nommu.c 2009-04-28 11:36:43.000000000 +0900 +++ b/mm/nommu.c 2009-04-28 11:36:54.000000000 +0900 @@ -62,7 +62,7 @@ void *high_memory; struct page *mem_map; unsigned long max_mapnr; unsigned long num_physpages; -atomic_long_t vm_committed_space = ATOMIC_LONG_INIT(0); +struct percpu_counter vm_committed_as; int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */ int sysctl_overcommit_ratio = 50; /* default is 50% */ int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT; @@ -463,6 +463,10 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) */ void __init mmap_init(void) { + int ret; + + ret = percpu_counter_init(&vm_committed_as, 0); + VM_BUG_ON(ret); vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC); } @@ -1847,12 +1851,9 @@ int __vm_enough_memory(struct mm_struct if (mm) allowed -= mm->total_vm / 32; - /* - * cast `allowed' as a signed long because vm_committed_space - * sometimes has a negative value - */ - if (atomic_long_read(&vm_committed_space) < (long)allowed) + if (percpu_counter_read_positive(&vm_committed_as) < allowed) return 0; + error: vm_unacct_memory(pages); Index: b/mm/swap.c =================================================================== --- a/mm/swap.c 2009-04-28 11:36:43.000000000 +0900 +++ b/mm/swap.c 2009-04-28 11:36:54.000000000 +0900 @@ -491,49 +491,6 @@ unsigned pagevec_lookup_tag(struct pagev EXPORT_SYMBOL(pagevec_lookup_tag); -#ifdef CONFIG_SMP -/* - * We tolerate a little inaccuracy to avoid ping-ponging the counter between - * CPUs - */ -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) - -static DEFINE_PER_CPU(long, committed_space); - -void vm_acct_memory(long pages) -{ - long *local; - - preempt_disable(); - local = &__get_cpu_var(committed_space); - *local += pages; - if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) { - atomic_long_add(*local, &vm_committed_space); - *local = 0; - } - preempt_enable(); -} - -#ifdef CONFIG_HOTPLUG_CPU - -/* Drop the CPU's cached committed space back into the central pool. */ -static int cpu_swap_callback(struct notifier_block *nfb, - unsigned long action, - void *hcpu) -{ - long *committed; - - committed = &per_cpu(committed_space, (long)hcpu); - if (action == CPU_DEAD || action == CPU_DEAD_FROZEN) { - atomic_long_add(*committed, &vm_committed_space); - *committed = 0; - drain_cpu_pagevecs((long)hcpu); - } - return NOTIFY_OK; -} -#endif /* CONFIG_HOTPLUG_CPU */ -#endif /* CONFIG_SMP */ - /* * Perform any setup for the swap system */ @@ -554,7 +511,4 @@ void __init swap_setup(void) * Right now other parts of the system means that we * _really_ don't want to cluster much more */ -#ifdef CONFIG_HOTPLUG_CPU - hotcpu_notifier(cpu_swap_callback, 0); -#endif } Index: b/include/linux/mman.h =================================================================== --- a/include/linux/mman.h 2009-04-28 11:32:47.000000000 +0900 +++ b/include/linux/mman.h 2009-04-28 11:38:42.000000000 +0900 @@ -12,21 +12,18 @@ #ifdef __KERNEL__ #include <linux/mm.h> +#include <linux/percpu_counter.h> #include <asm/atomic.h> extern int sysctl_overcommit_memory; extern int sysctl_overcommit_ratio; -extern atomic_long_t vm_committed_space; +extern struct percpu_counter vm_committed_as; -#ifdef CONFIG_SMP -extern void vm_acct_memory(long pages); -#else static inline void vm_acct_memory(long pages) { - atomic_long_add(pages, &vm_committed_space); + percpu_counter_add(&vm_committed_as, pages); } -#endif static inline void vm_unacct_memory(long pages) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-28 3:07 ` KOSAKI Motohiro @ 2009-04-28 3:27 ` Andrew Morton 2009-04-28 4:25 ` KOSAKI Motohiro 2009-04-28 8:17 ` Dave Hansen 0 siblings, 2 replies; 12+ messages in thread From: Andrew Morton @ 2009-04-28 3:27 UTC (permalink / raw) To: KOSAKI Motohiro Cc: balbir, dave, linux-mm, linux-kernel, ebmunson, mel, cl, stable On Tue, 28 Apr 2009 12:07:59 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > > On Wed, 15 Apr 2009 14:17:13 +0530 > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 13:10:06]: > > > > > > > > * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-04-15 11:04:59]: > > > > > > > > > > > committed = atomic_long_read(&vm_committed_space); > > > > > > + if (committed < 0) > > > > > > + committed = 0; > > > > > > > > > Is there a reason why we can't use a boring old percpu_counter for > > vm_committed_space? That way the meminfo code can just use > > percpu_counter_read_positive(). > > > > Or perhaps just percpu_counter_read(). The percpu_counter code does a > > better job of handling large cpu counts than the > > mysteriously-duplicative open-coded stuff we have there. > > At that time, I thought smallest patch is better because it can send -stable > tree easily. > but maybe I was wrong. it made bikeshed discussion :( Yes, I know what you mean. But otoh it's a good idea to keep -stable in sync with mainline - it means that -stable can merge things which have had a suitable amount of testing. > ok, I'm going to right way. > > > ========================================= > Subject: [PATCH] fix Committed_AS underfolow on large NR_CPUS environment > > As reported by Dave Hansen, the Committed_AS field can underflow in certain > situations: > > > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c > > 1 Committed_AS: 18446744073709323392 kB > > 11 Committed_AS: 18446744073709455488 kB > > 6 Committed_AS: 35136 kB > > 5 Committed_AS: 18446744073709454400 kB > > 7 Committed_AS: 35904 kB > > 3 Committed_AS: 18446744073709453248 kB > > 2 Committed_AS: 34752 kB > > 9 Committed_AS: 18446744073709453248 kB > > 8 Committed_AS: 34752 kB > > 3 Committed_AS: 18446744073709320960 kB > > 7 Committed_AS: 18446744073709454080 kB > > 3 Committed_AS: 18446744073709320960 kB > > 5 Committed_AS: 18446744073709454080 kB > > 6 Committed_AS: 18446744073709320960 kB > > Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does not check > for underflow. > > But NR_CPUS proportional isn't good calculation. In general, possibility of > lock contention is proportional to the number of online cpus, not theorical > maximum cpus (NR_CPUS). > the current kernel has generic percpu-counter stuff. using it is right way. > it makes code simplify and percpu_counter_read_positive() don't make underflow issue. > > > Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Cc: Eric B Munson <ebmunson@us.ibm.com> > --- > fs/proc/meminfo.c | 2 +- > include/linux/mman.h | 9 +++------ > mm/mmap.c | 12 ++++++------ > mm/nommu.c | 13 +++++++------ > mm/swap.c | 46 ---------------------------------------------- > 5 files changed, 17 insertions(+), 65 deletions(-) Well that was nice. There's potential here for weird performance regressions, so I think that if we do this in mainline, we should wait a while (a few weeks?) before backporting it. Do we know how long this bug has existed for? Quite a while, I expect? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-28 3:27 ` Andrew Morton @ 2009-04-28 4:25 ` KOSAKI Motohiro 2009-04-28 8:17 ` Dave Hansen 1 sibling, 0 replies; 12+ messages in thread From: KOSAKI Motohiro @ 2009-04-28 4:25 UTC (permalink / raw) To: Andrew Morton Cc: kosaki.motohiro, balbir, dave, linux-mm, linux-kernel, ebmunson, mel, cl, stable > > > > > > > committed = atomic_long_read(&vm_committed_space); > > > > > > > + if (committed < 0) > > > > > > > + committed = 0; > > > > > > > > > > > > Is there a reason why we can't use a boring old percpu_counter for > > > vm_committed_space? That way the meminfo code can just use > > > percpu_counter_read_positive(). > > > > > > Or perhaps just percpu_counter_read(). The percpu_counter code does a > > > better job of handling large cpu counts than the > > > mysteriously-duplicative open-coded stuff we have there. > > > > At that time, I thought smallest patch is better because it can send -stable > > tree easily. > > but maybe I was wrong. it made bikeshed discussion :( > > Yes, I know what you mean. But otoh it's a good idea to keep -stable > in sync with mainline - it means that -stable can merge things which > have had a suitable amount of testing. Agreed. > > Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > Cc: Eric B Munson <ebmunson@us.ibm.com> > > --- > > fs/proc/meminfo.c | 2 +- > > include/linux/mman.h | 9 +++------ > > mm/mmap.c | 12 ++++++------ > > mm/nommu.c | 13 +++++++------ > > mm/swap.c | 46 ---------------------------------------------- > > 5 files changed, 17 insertions(+), 65 deletions(-) > > Well that was nice. > > There's potential here for weird performance regressions, so I think > that if we do this in mainline, we should wait a while (a few weeks?) > before backporting it. > > Do we know how long this bug has existed for? Quite a while, I expect? ACCT_THRESHOLD was introduced bit-keeper age. powerpc default NR_CPUS is still less 128. % grep NR_CPUS * cell_defconfig:CONFIG_NR_CPUS=4 celleb_defconfig:CONFIG_NR_CPUS=4 chrp32_defconfig:CONFIG_NR_CPUS=4 g5_defconfig:CONFIG_NR_CPUS=4 iseries_defconfig:CONFIG_NR_CPUS=32 maple_defconfig:CONFIG_NR_CPUS=4 mpc86xx_defconfig:CONFIG_NR_CPUS=2 pasemi_defconfig:CONFIG_NR_CPUS=2 ppc64_defconfig:CONFIG_NR_CPUS=32 ps3_defconfig:CONFIG_NR_CPUS=2 pseries_defconfig:CONFIG_NR_CPUS=128 powerpc maximum NR_CPUS was increased following commit. I'm not sure about one year is short or long. ========================================================== commit 90035fe378c7459ba19c43c63d5f878284224ce4 Author: Tony Breeds <tony@bakeyournoodle.com> Date: Thu Apr 24 13:43:49 2008 +1000 [POWERPC] Raise the upper limit of NR_CPUS and move the pacas into the BSS This adds the required functionality to fill in all pacas at runtime. With NR_CPUS=1024 text data bss dec hex filename 137 1704032 0 1704169 1a00e9 arch/powerpc/kernel/paca.o :Before 121 1179744 524288 1704153 1a00d9 arch/powerpc/kernel/paca.o :After Also remove unneeded #includes from arch/powerpc/kernel/paca.c Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 5fc7fac..f7efaa9 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -220,8 +220,8 @@ config SMP If you don't know what to do here, say N. config NR_CPUS - int "Maximum number of CPUs (2-128)" - range 2 128 + int "Maximum number of CPUs (2-1024)" + range 2 1024 depends on SMP default "32" if PPC64 default "4" -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-28 3:27 ` Andrew Morton 2009-04-28 4:25 ` KOSAKI Motohiro @ 2009-04-28 8:17 ` Dave Hansen 1 sibling, 0 replies; 12+ messages in thread From: Dave Hansen @ 2009-04-28 8:17 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, balbir, linux-mm, linux-kernel, ebmunson, mel, cl, stable On Mon, 2009-04-27 at 20:27 -0700, Andrew Morton wrote: > There's potential here for weird performance regressions, so I think > that if we do this in mainline, we should wait a while (a few weeks?) > before backporting it. > > Do we know how long this bug has existed for? Quite a while, I > expect? Yeah, we didn't notice it until a recent enterprise distro got a config with NR_CPUS=1024. That opened the window up in a major way because of the way we calculated the threshold. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-15 2:04 ` KOSAKI Motohiro 2009-04-15 3:34 ` Balbir Singh @ 2009-04-15 4:12 ` KOSAKI Motohiro 1 sibling, 0 replies; 12+ messages in thread From: KOSAKI Motohiro @ 2009-04-15 4:12 UTC (permalink / raw) To: Dave Hansen Cc: kosaki.motohiro, linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter fix stupid mistake. Changelog: since v1 - change the type of commited value from ulong to long ========================= Subject: [PATCH v2] fix Commited_AS underflow Dave Hansen reported committed_AS field can underfolow. > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c > 1 Committed_AS: 18446744073709323392 kB > 11 Committed_AS: 18446744073709455488 kB > 6 Committed_AS: 35136 kB > 5 Committed_AS: 18446744073709454400 kB > 7 Committed_AS: 35904 kB > 3 Committed_AS: 18446744073709453248 kB > 2 Committed_AS: 34752 kB > 9 Committed_AS: 18446744073709453248 kB > 8 Committed_AS: 34752 kB > 3 Committed_AS: 18446744073709320960 kB > 7 Committed_AS: 18446744073709454080 kB > 3 Committed_AS: 18446744073709320960 kB > 5 Committed_AS: 18446744073709454080 kB > 6 Committed_AS: 18446744073709320960 kB Because NR_CPU can be greater than 1000. and meminfo_proc_show() doesn't have underflow check. this patch have two change. 1. Change NR_CPU to nr_online_cpus() vm_acct_memory() isn't fast-path. then cpumask_weight() calculation isn't so expensive and the parameter for scalability issue should consider number of _physical_ cpus. not theoretical maximum number. 2. Add under-flow check to meminfo_proc_show(). Almost field in /proc/meminfo have underflow check. but Committed_AS is significant exeption. it should do. Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- fs/proc/meminfo.c | 4 +++- mm/swap.c | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) Index: b/fs/proc/meminfo.c =================================================================== --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -22,7 +22,7 @@ void __attribute__((weak)) arch_report_m static int meminfo_proc_show(struct seq_file *m, void *v) { struct sysinfo i; - unsigned long committed; + long committed; unsigned long allowed; struct vmalloc_info vmi; long cached; @@ -36,6 +36,8 @@ static int meminfo_proc_show(struct seq_ si_meminfo(&i); si_swapinfo(&i); committed = atomic_long_read(&vm_committed_space); + if (committed < 0) + committed = 0; allowed = ((totalram_pages - hugetlb_total_pages()) * sysctl_overcommit_ratio / 100) + total_swap_pages; Index: b/mm/swap.c =================================================================== --- a/mm/swap.c +++ b/mm/swap.c @@ -519,7 +519,7 @@ EXPORT_SYMBOL(pagevec_lookup_tag); * We tolerate a little inaccuracy to avoid ping-ponging the counter between * CPUs */ -#define ACCT_THRESHOLD max(16, NR_CPUS * 2) +#define ACCT_THRESHOLD max_t(long, 16, num_online_cpus() * 2) static DEFINE_PER_CPU(long, committed_space); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: meminfo Committed_AS underflows 2009-04-14 19:33 meminfo Committed_AS underflows Dave Hansen 2009-04-15 2:04 ` KOSAKI Motohiro @ 2009-04-15 8:33 ` Alan Cox 1 sibling, 0 replies; 12+ messages in thread From: Alan Cox @ 2009-04-15 8:33 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, linux-kernel@vger.kernel.org, Eric B Munson, Mel Gorman, Christoph Lameter > The 1023 cpus won't ever hit the ACCT_THRESHOLD. The 1 CPU that did > will decrement the global 'vm_committed_space' by ~128 GB. Underflow. > Yay. This happens on a much smaller scale now. > > Should we be protecting meminfo so that it spits slightly more sane > numbers out to the user? Yes. It used to be accurate but the changes were put in as the memory accounting value was actually starting to show up in profiles as it bounced around the CPUs - perhaps the constraint should instead be a worst case error as percentage of total system memory ? Alan -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-04-28 8:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-14 19:33 meminfo Committed_AS underflows Dave Hansen 2009-04-15 2:04 ` KOSAKI Motohiro 2009-04-15 3:34 ` Balbir Singh 2009-04-15 4:10 ` KOSAKI Motohiro 2009-04-15 8:47 ` Balbir Singh 2009-04-27 20:27 ` Andrew Morton 2009-04-28 3:07 ` KOSAKI Motohiro 2009-04-28 3:27 ` Andrew Morton 2009-04-28 4:25 ` KOSAKI Motohiro 2009-04-28 8:17 ` Dave Hansen 2009-04-15 4:12 ` KOSAKI Motohiro 2009-04-15 8:33 ` Alan Cox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).