* Re: [PATCH V3] Fix Committed_AS underflow
  From: KOSAKI Motohiro @ 2009-04-20 9:18 UTC
  To: Eric B Munson
  Cc: kosaki.motohiro, dave, linux-mm, linux-kernel, mel, cl

> This patch makes a small change to an earlier patch by Kosaki Motohiro.
> The threshold calculation was changed to avoid the overhead of calculating
> the number of online cpus each time the threshold is needed.

IIRC, Mel has a patch that kills the num_online_cpus() calculation cost,
but I can accept your patch because Mel's patch has not been merged into
mainline yet.

Thanks :)
* Re: [PATCH V3] Fix Committed_AS underflow
  From: Dave Hansen @ 2009-04-20 16:15 UTC
  To: Eric B Munson
  Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> 1. Change NR_CPUS to min(64, NR_CPUS)
> This will limit the amount of possible skew on kernels compiled for very
> large SMP machines. 64 is an arbitrary number selected to limit the worst
> of the skew without using more cache lines. min(64, NR_CPUS) is used
> instead of nr_online_cpus() because nr_online_cpus() requires a shared
> cache line and a call to hweight to make the calculation. Its runtime
> overhead and keeping this counter accurate showed up in profiles and it's
> possible that nr_online_cpus() would also show.

-- Dave
* Re: [PATCH V3] Fix Committed_AS underflow
  From: Dave Hansen @ 2009-04-20 19:49 UTC
  To: Eric B Munson
  Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Mon, 2009-04-20 at 09:15 -0700, Dave Hansen wrote:
> On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> > 1. Change NR_CPUS to min(64, NR_CPUS)
> > This will limit the amount of possible skew on kernels compiled for very
> > large SMP machines. 64 is an arbitrary number selected to limit the worst
> > of the skew without using more cache lines. min(64, NR_CPUS) is used
> > instead of nr_online_cpus() because nr_online_cpus() requires a shared
> > cache line and a call to hweight to make the calculation. Its runtime
> > overhead and keeping this counter accurate showed up in profiles and it's
> > possible that nr_online_cpus() would also show.

Wow, that empty reply was really informative, wasn't it? :)

My worry with this min(64, NR_CPUS) approach is that you effectively
ensure that you're going to be doing a lot more cacheline bouncing, but
it isn't quite as explicit.

Now, every time a mapping (or set of them) is created or destroyed that
nets more than 64 pages, you've got to go get a r/w cacheline to a
possibly highly contended atomic. With a number this low, you're almost
guaranteed to hit it at fork() and exec(). Could you double-check that
this doesn't hurt any of the fork() AIM tests?

Another thought is that, instead of trying to fix this up in meminfo, we
could do this in a way that is guaranteed to never skew the global
counter negative: we always keep the *percpu* skew negative.
This should be the same as what's in the kernel now:

	void vm_acct_memory(long pages)
	{
		long *local;
		long local_min = -ACCT_THRESHOLD;
		long local_max = ACCT_THRESHOLD;
		long local_goal = 0;

		preempt_disable();
		local = &__get_cpu_var(committed_space);
		*local += pages;
		if (*local > local_max || *local < local_min) {
			atomic_long_add(*local - local_goal, &vm_committed_space);
			*local = local_goal;
		}
		preempt_enable();
	}

But now consider if we changed the local_* variables a bit:

	long local_min = -(ACCT_THRESHOLD*2);
	long local_max = 0;
	long local_goal = -ACCT_THRESHOLD;

We'll get some possibly *large* numbers in meminfo, but it will at least
never underflow.

-- Dave
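[Editor's note: the underflow this thread is chasing is easy to reproduce by modeling the batching scheme above in user space. This is an illustrative sketch only — NCPUS, the function names, and the event sequence are invented; just the flush rule mirrors the kernel snippet quoted above.]

```c
/*
 * User-space model of the per-CPU batching in vm_acct_memory().
 * Illustrative only: NCPUS, the names, and the event sequence are
 * invented; the flush rule matches the snippet in Dave's mail.
 */
#include <assert.h>

#define NCPUS 4
#define ACCT_THRESHOLD 64

static long vm_committed;     /* models the global atomic counter */
static long local[NCPUS];     /* models the per-CPU counters */

static void acct(int cpu, long pages)
{
    local[cpu] += pages;
    if (local[cpu] > ACCT_THRESHOLD || local[cpu] < -ACCT_THRESHOLD) {
        vm_committed += local[cpu];   /* flush with local_goal == 0 */
        local[cpu] = 0;
    }
}

/*
 * Two tasks each charge ACCT_THRESHOLD pages on different CPUs, so
 * neither charge crosses the flush threshold and both stay local.
 * Then both tasks exit on a third CPU: the two frees batch up in one
 * per-CPU counter, flush together, and drive the global counter
 * negative even though total committed memory is back to zero.
 */
long demo_underflow(void)
{
    acct(0,  ACCT_THRESHOLD);   /* task A charges; stays local on cpu0 */
    acct(1,  ACCT_THRESHOLD);   /* task B charges; stays local on cpu1 */
    acct(2, -ACCT_THRESHOLD);   /* task A exits on cpu2: local[2] = -64 */
    acct(2, -ACCT_THRESHOLD);   /* task B exits on cpu2: flushes -128 */
    return vm_committed;        /* -128: the Committed_AS underflow */
}
```

Since /proc/meminfo prints the counter as an unsigned quantity, the -128 here is what shows up as the absurdly huge Committed_AS value the patch title refers to.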
* Re: [PATCH V3] Fix Committed_AS underflow
  From: KOSAKI Motohiro @ 2009-04-21 1:41 UTC
  To: Dave Hansen
  Cc: kosaki.motohiro, Eric B Munson, linux-mm, linux-kernel, mel, cl

> void vm_acct_memory(long pages)
> {
> 	long *local;
> 	long local_min = -ACCT_THRESHOLD;
> 	long local_max = ACCT_THRESHOLD;
> 	long local_goal = 0;
>
> 	preempt_disable();
> 	local = &__get_cpu_var(committed_space);
> 	*local += pages;
> 	if (*local > local_max || *local < local_min) {
> 		atomic_long_add(*local - local_goal, &vm_committed_space);
> 		*local = local_goal;
> 	}
> 	preempt_enable();
> }
>
> But now consider if we changed the local_* variables a bit:
>
> 	long local_min = -(ACCT_THRESHOLD*2);
> 	long local_max = 0;
> 	long local_goal = -ACCT_THRESHOLD;
>
> We'll get some possibly *large* numbers in meminfo, but it will at least
> never underflow.

If *local == -(ACCT_THRESHOLD*2), then

	*local - local_goal = -(ACCT_THRESHOLD*2) + ACCT_THRESHOLD
	                    = -ACCT_THRESHOLD

so we still pass a negative value to atomic_long_add(). IOW,
vm_committed_space can still become negative. Am I missing anything?
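[Editor's note: KOSAKI's arithmetic checks out in the same kind of user-space model (again an illustrative sketch with invented names). With the shifted local_* variables, the flush fires once a per-CPU counter drops below -(ACCT_THRESHOLD*2), and the flushed delta *local - local_goal is then about -ACCT_THRESHOLD — still negative, so the global counter can still be driven below zero:]

```c
/*
 * Model of Dave's proposed variant with goal/min/max shifted down by
 * ACCT_THRESHOLD.  Names and the scenario are invented for illustration.
 */
#include <assert.h>

#define ACCT_THRESHOLD 64

static long vm_committed;

static void acct_biased(long *local, long pages)
{
    long local_min  = -(ACCT_THRESHOLD * 2);
    long local_max  = 0;
    long local_goal = -ACCT_THRESHOLD;

    *local += pages;
    if (*local > local_max || *local < local_min) {
        vm_committed += *local - local_goal;   /* can still be negative */
        *local = local_goal;
    }
}

/*
 * One CPU, counter parked at the goal, sees a batched free just large
 * enough to cross local_min.  The flushed delta is negative, exactly
 * as KOSAKI's arithmetic predicts.
 */
long demo_still_negative(void)
{
    long cpu0 = -ACCT_THRESHOLD;   /* per-CPU counter sitting at the goal */

    vm_committed = 0;
    acct_biased(&cpu0, -(ACCT_THRESHOLD + 1));  /* a batched free */
    return vm_committed;   /* -(ACCT_THRESHOLD + 1): still negative */
}
```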
* Re: [PATCH V3] Fix Committed_AS underflow
  From: Dave Hansen @ 2009-04-23 20:38 UTC
  To: Eric B Munson
  Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Thu, 2009-04-23 at 17:31 +0100, Eric B Munson wrote:
> On Mon, 20 Apr 2009, Dave Hansen wrote:
> > On Mon, 2009-04-20 at 09:15 -0700, Dave Hansen wrote:
> > > On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> > > > 1. Change NR_CPUS to min(64, NR_CPUS)
> > > > This will limit the amount of possible skew on kernels compiled for very
> > > > large SMP machines. 64 is an arbitrary number selected to limit the worst
> > > > of the skew without using more cache lines. min(64, NR_CPUS) is used
> > > > instead of nr_online_cpus() because nr_online_cpus() requires a shared
> > > > cache line and a call to hweight to make the calculation. Its runtime
> > > > overhead and keeping this counter accurate showed up in profiles and it's
> > > > possible that nr_online_cpus() would also show.
> >
> > Wow, that empty reply was really informative, wasn't it? :)
> >
> > My worry with this min(64, NR_CPUS) approach is that you effectively
> > ensure that you're going to be doing a lot more cacheline bouncing, but
> > it isn't quite as explicit.
>
> Unfortunately this is a choice we have to make: do we want to avoid
> cacheline bouncing on fork-heavy workloads using more than 64 pages, or
> bad information being used for overcommit decisions?

On SLES11, a new bash shell has ~9MB of mapped space. A cat(1) has
~6.7MB. ACCT_THRESHOLD at 64*2 pages with a 64k page size is 8MB. So
you're basically *guaranteed* to go off-cpu every time a shell command
is run, and I'd say 8MB is a bit on the low side.

Note that even with your suggested patch, we can still have 8GB of skew
on a 1024-way machine, since there is still num_online_cpus()*8MB of
per-cpu slack.
This is better than what we have now, certainly.

> > Now, every time there's a mapping (or set of them) created or destroyed
> > that nets greater than 64 pages, you've got to go get a r/w cacheline to
> > a possibly highly contended atomic. With a number this low, you're
> > almost guaranteed to hit it at fork() and exec(). Could you
> > double-check that this doesn't hurt any of the fork() AIM tests?
>
> It is unlikely that the aim9 benchmarks would show whether this patch is
> a problem, because it forks in a tight loop in a process that is not
> necessarily big enough to hit ACCT_THRESHOLD, likely on a single CPU.
> In order to show any problems here we need a fork-heavy workload with
> many threads on many CPUs.

That's true. I'd suspect that aim is somewhere around the size of
'cat'. It's a best-case scenario. But if you see degradation on the
best-case workload, we'll certainly see issues on worse workloads.

Also, I used fork as an obvious example here. This issue will present
itself any time there is a mapping or set of mappings with a net change
in size of 8MB.

There's another alternative. We could temporarily add statistics to
count how many times we go over ACCT_THRESHOLD, to see just how bad the
situation is. For instance, 64 causes 342.43 times as many touches of
the global counter as 8192 (or whatever it is). You could also make
ACCT_THRESHOLD a tunable, at least for the duration of this testing.

There are also ways you can bias the global counter in controlled ways.
You could ensure, for instance, that a local counter never gets a
positive bias, causing the global one to be biased negatively. That
would guarantee no underflows, although it would still be possible to
see vm_committed_space in a temporarily *inflated* state.

-- Dave
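[Editor's note: Dave's idea of counting how often the global counter is touched can be prototyped in user space before instrumenting the kernel. The sketch below is illustrative (the names and workload are invented): for a stream of single-page charges, a batch threshold of T flushes once every T+1 events, so the touch ratio between thresholds 64 and 8192 works out to 8193/65 ≈ 126x — the same order of magnitude as the figure Dave is guessing at above.]

```c
/*
 * Count flushes of a (modeled) global counter for a given batch
 * threshold, over a stream of single-page charge events.  The flush
 * rule mirrors vm_acct_memory(); everything else is illustrative.
 */
#include <assert.h>

static long count_flushes(long threshold, long nr_events)
{
    long local = 0, flushes = 0;

    for (long i = 0; i < nr_events; i++) {
        local += 1;                 /* one page charged per event */
        if (local > threshold) {    /* would touch the global atomic */
            flushes++;
            local = 0;
        }
    }
    return flushes;
}
```

Over 65*8193 single-page events, count_flushes(64, ...) flushes 8193 times while count_flushes(8192, ...) flushes only 65 times, which quantifies the cacheline-bouncing cost of the lower threshold; real workloads batch larger deltas per call, so the in-kernel statistics would still be worth collecting.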