* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  [not found] <20111012160202.GA18666@sgi.com>
@ 2011-10-12 19:01 ` Andrew Morton
  2011-10-12 19:57   ` Christoph Lameter
  [not found]       ` <CADE8fzrdMOBF1RyyEpMVi8aKcgOVKRQSKi0=c1Qvh3p6hHcXRA@mail.gmail.com>
  0 siblings, 2 replies; 25+ messages in thread

From: Andrew Morton @ 2011-10-12 19:01 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: linux-kernel, linux-mm, Christoph Lameter

On Wed, 12 Oct 2011 11:02:02 -0500 Dimitri Sivanich <sivanich@sgi.com> wrote:

> Tmpfs I/O throughput testing on UV systems has shown writeback contention
> between multiple writer threads (even when each thread writes to a separate
> tmpfs mount point).
>
> A large part of this is caused by cacheline contention reading the vm_stat
> array in the __vm_enough_memory check.
>
> The attached test patch illustrates a possible avenue for improvement in
> this area.  By locally caching the values read from vm_stat (and refreshing
> the values after 2 seconds), I was able to improve tmpfs writeback
> performance from ~300 MB/sec to ~700 MB/sec with 120 threads writing data
> simultaneously to files on separate tmpfs mount points (tested on 3.1.0-rc9).
>
> Note that this patch is simply to illustrate the gains that can be made
> here.  What I'm looking for is some guidance on an acceptable way to
> accomplish the task of reducing contention in this area, either by caching
> these values in a way similar to the attached patch, or by some other
> mechanism if this is unacceptable.

Yes, the global vm_stat[] array is a problem - I'm surprised it's hung
around for this long.  Altering the sysctl_overcommit_memory mode will
hide the problem, but that's no good.

I think we've discussed switching vm_stat[] to a contention-avoiding
counter scheme.  Simply using <percpu_counter.h> would be the simplest
approach.  They'll introduce inaccuracies but hopefully any problems
from that will be minor for the global page counters.
otoh, I think we've been round this loop before and I don't recall why
nothing happened.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see: http://www.linux-mm.org/
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
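The batching scheme behind <percpu_counter.h> that Andrew refers to can be sketched in userspace C. This is a single-threaded illustrative model of the idea, not the kernel API: the names (`pc_counter`, `pc_add`, `NCPUS`, `BATCH`) are invented, and real per-cpu data and atomics are elided.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a batched per-cpu counter: each "cpu" accumulates a small
 * local delta and only folds it into the shared counter once the delta
 * exceeds a batch threshold, so the shared cacheline is touched rarely. */
#define NCPUS 4
#define BATCH 32

struct pc_counter {
    long global;          /* shared, potentially contended */
    long local[NCPUS];    /* per-cpu deltas, uncontended */
};

static void pc_add(struct pc_counter *c, int cpu, long x)
{
    long v = c->local[cpu] + x;
    if (v > BATCH || v < -BATCH) {
        c->global += v;   /* the only write to the shared line */
        v = 0;
    }
    c->local[cpu] = v;
}

/* An exact read folds all deltas; an approximate read would just use
 * c->global, trading accuracy for no cross-cpu traffic -- the
 * inaccuracy Andrew mentions above. */
static long pc_read_exact(const struct pc_counter *c)
{
    long sum = c->global;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

The approximate read is what makes the scheme cheap for hot paths such as __vm_enough_memory, at the cost of being off by up to NCPUS * BATCH.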
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-12 19:01 ` [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory Andrew Morton
@ 2011-10-12 19:57 ` Christoph Lameter
  2011-10-13 15:06   ` Mel Gorman
  2011-10-13 15:23   ` Dimitri Sivanich
  [not found]       ` <CADE8fzrdMOBF1RyyEpMVi8aKcgOVKRQSKi0=c1Qvh3p6hHcXRA@mail.gmail.com>
  1 sibling, 2 replies; 25+ messages in thread

From: Christoph Lameter @ 2011-10-12 19:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dimitri Sivanich, linux-kernel, linux-mm, Mel Gorman

On Wed, 12 Oct 2011, Andrew Morton wrote:

> > Note that this patch is simply to illustrate the gains that can be made
> > here.  What I'm looking for is some guidance on an acceptable way to
> > accomplish the task of reducing contention in this area, either by caching
> > these values in a way similar to the attached patch, or by some other
> > mechanism if this is unacceptable.
>
> Yes, the global vm_stat[] array is a problem - I'm surprised it's hung
> around for this long.  Altering the sysctl_overcommit_memory mode will
> hide the problem, but that's no good.

The global vm_stat array is keeping the state for the zone.  It would be
even more expensive to calculate this at every point where we need such
data.

> I think we've discussed switching vm_stat[] to a contention-avoiding
> counter scheme.  Simply using <percpu_counter.h> would be the simplest
> approach.  They'll introduce inaccuracies but hopefully any problems
> from that will be minor for the global page counters.

We already have a contention-avoiding scheme for counter updates in
vmstat.c.  The problem here is that vm_stat is frequently read.  Updates
from other cpus that fold counter updates in a deferred way into the
global statistics cause cacheline eviction.  The updates occur too
frequently in this load.

> otoh, I think we've been round this loop before and I don't recall why
> nothing happened.

The update behavior can be tuned using /proc/sys/vm/stat_interval.
Increase the interval to reduce the folding into the global counter (set
maybe to 10?).  This will reduce contention.

The other approach is to increase the allowed delta per zone if frequent
updates occur, via the overflow checks in vmstat.c.  See
calculate_*_threshold there.

Note that the deltas are currently reduced for memory pressure situations
(after recent patches by Mel).  This will cause a significant increase in
vm_stat cacheline contention compared to earlier kernels.
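The stat_interval knob mentioned above can be adjusted at runtime; a quick sketch (the sysctl path is real, the value 10 is simply the one floated in this thread, and writing requires root):

```shell
# Inspect and raise the interval at which per-cpu vmstat deltas are
# folded into the global counters.  Default is 1 (second).
sysctl vm.stat_interval          # read the current value
sysctl -w vm.stat_interval=10    # fold per-cpu deltas every 10s instead
```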
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-12 19:57 ` Christoph Lameter
@ 2011-10-13 15:06 ` Mel Gorman
  2011-10-13 15:59   ` Andi Kleen
  2011-10-13 15:23 ` Dimitri Sivanich
  1 sibling, 1 reply; 25+ messages in thread

From: Mel Gorman @ 2011-10-13 15:06 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, Dimitri Sivanich, linux-kernel, linux-mm

On Wed, Oct 12, 2011 at 02:57:53PM -0500, Christoph Lameter wrote:

> > I think we've discussed switching vm_stat[] to a contention-avoiding
> > counter scheme.  Simply using <percpu_counter.h> would be the simplest
> > approach.  They'll introduce inaccuracies but hopefully any problems
> > from that will be minor for the global page counters.
>
> We already have a contention avoiding scheme for counter updates in
> vmstat.c.  The problem here is that vm_stat is frequently read.  Updates
> from other cpus that fold counter updates in a deferred way into the
> global statistics cause cacheline eviction.  The updates occur too frequent
> in this load.
>

There is also a correctness issue to be concerned with.  In the patch,
there is a two-second window during which the counters are not being
read.  This increases the risk that the system gets too overcommitted
when overcommit_memory == OVERCOMMIT_GUESS.

If vm_enough_memory is being heavily hit as well, it implies that this
workload is mmap-intensive, which is pretty inefficient in itself.  I
guess it would also apply to workloads that are malloc-intensive for
large buffers, but I'd expect the cache line bounces to only dominate if
there was little or no computation on the resulting buffers.  As a
result, I wonder how realistic this test workload is and how useful
fixing this problem is in general.

> > otoh, I think we've been round this loop before and I don't recall why
> > nothing happened.
>
> The update behavior can be tuned using /proc/sys/vm/stat_interval.
> Increase the interval to reduce the folding into the global counter (set
> maybe to 10?).  This will reduce contention.

Unless the thresholds for per-cpu drift are being hit.  If they are
allocating and freeing pages in large numbers for example, we'll be
calling __mod_zone_page_state(NR_FREE_PAGES) in large batches,
overflowing the counters, calling zone_page_state_add() and dirtying the
global vm_stat that way.  In that case, increasing stat_interval alone is
not the answer.

> The other approach is to increase the allowed delta per zone if frequent
> updates occur via the overflow checks in vmstat.c.  See
> calculate_*_threshold there.
>

If this approach is taken, be careful that the threshold is an s8, so it
is limited in size.

> Note that the deltas are current reduced for memory pressure situations
> (after recent patches by Mel).  This will cause a significant increase in
> vm_stat cacheline contention compared to earlier kernels.
>

That statement is misleading.  The thresholds are reduced while kswapd is
awake to avoid the possibility of all memory being allocated and the
machine livelocking.  If the system is under enough pressure for kswapd
to be awake for prolonged periods of time, the overhead of cache line
bouncing while updating vm_stat is going to be a lesser concern.

I like the idea of the threshold being scaled under normal circumstances
depending on the size of the central counter.  Conceivably it could be
done as part of refresh_cpu_vm_stats() using the old value of the central
counter while walking each per_cpu_pageset.

--
Mel Gorman
SUSE Labs
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 15:06 ` Mel Gorman
@ 2011-10-13 15:59 ` Andi Kleen
  0 siblings, 0 replies; 25+ messages in thread

From: Andi Kleen @ 2011-10-13 15:59 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, Andrew Morton, Dimitri Sivanich, linux-kernel, linux-mm

Mel Gorman <mel@csn.ul.ie> writes:

> If vm_enough_memory is being heavily hit as well, it implies that this
> workload is mmap-intensive which is pretty inefficient in itself.

Saw it with tmpfs originally.  No need to be mmap intensive.  Just do
lots of IOs on tmpfs.

> guess it would also apply to workloads that are malloc-intensive for
> large buffers but I'd expect the cache line bounces to only dominate if
> there was little or no computation on the resulting buffers.

I think you severely underestimate the costs of bouncing cache lines on
>2S.

> As a result, I wonder how realistic is this test workload and who useful
> fixing this problem is in general?

It's kind of bad if tmpfs doesn't scale.

-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-12 19:57 ` Christoph Lameter
  2011-10-13 15:06   ` Mel Gorman
@ 2011-10-13 15:23 ` Dimitri Sivanich
  2011-10-13 15:54   ` Christoph Lameter
  1 sibling, 1 reply; 25+ messages in thread

From: Dimitri Sivanich @ 2011-10-13 15:23 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Wed, Oct 12, 2011 at 02:57:53PM -0500, Christoph Lameter wrote:
> On Wed, 12 Oct 2011, Andrew Morton wrote:
>
> > > Note that this patch is simply to illustrate the gains that can be made
> > > here.  What I'm looking for is some guidance on an acceptable way to
> > > accomplish the task of reducing contention in this area, either by
> > > caching these values in a way similar to the attached patch, or by some
> > > other mechanism if this is unacceptable.
> >
> > Yes, the global vm_stat[] array is a problem - I'm surprised it's hung
> > around for this long.  Altering the sysctl_overcommit_memory mode will
> > hide the problem, but that's no good.
>
> The global vm_stat array is keeping the state for the zone.  It would be
> even more expensive to calculate this at every point where we need such
> data.
>
> > I think we've discussed switching vm_stat[] to a contention-avoiding
> > counter scheme.  Simply using <percpu_counter.h> would be the simplest
> > approach.  They'll introduce inaccuracies but hopefully any problems
> > from that will be minor for the global page counters.
>
> We already have a contention avoiding scheme for counter updates in
> vmstat.c.  The problem here is that vm_stat is frequently read.  Updates
> from other cpus that fold counter updates in a deferred way into the
> global statistics cause cacheline eviction.  The updates occur too frequent
> in this load.

The test I did reduced the frequency of the vm_stat reads in
__vm_enough_memory by caching the values and updating them every two
seconds (in the OVERCOMMIT_GUESS area).
> > otoh, I think we've been round this loop before and I don't recall why
> > nothing happened.
>
> The update behavior can be tuned using /proc/sys/vm/stat_interval.
> Increase the interval to reduce the folding into the global counter (set
> maybe to 10?).  This will reduce contention.

Increasing this interval to 10 (or even 100) had no effect on the vm_stat
contention on a 640 cpu test system, so vmstat_update() is not the
culprit.

> The other approach is to increase the allowed delta per zone if frequent
> updates occur via the overflow checks in vmstat.c.  See
> calculate_*_threshold there.

I tried changing the threshold in both directions, with slower throughput
in both cases.

> Note that the deltas are current reduced for memory pressure situations
> (after recent patches by Mel).  This will cause a significant increase in
> vm_stat cacheline contention compared to earlier kernels.
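The caching approach described in the original posting can be modeled in userspace C. This is an illustrative sketch only — the names, the fake integer clock, and the single cached value are invented; the real test patch caches several vm_stat entries and refreshes after 2 seconds:

```c
#include <assert.h>

/* Model of "cache the expensive global read, refresh only every
 * REFRESH_INTERVAL seconds".  Reads within the window are served from
 * the local copy and never touch the contended cacheline. */
#define REFRESH_INTERVAL 2

static long g_counter;                      /* stands in for vm_stat */
static long expensive_read(void) { return g_counter; }

struct cached_stat {
    long value;
    long last_refresh;                      /* fake-clock seconds */
    int refreshes;                          /* expensive reads paid */
};

static long read_stat_cached(struct cached_stat *c, long now)
{
    if (now - c->last_refresh >= REFRESH_INTERVAL) {
        c->value = expensive_read();        /* touches the hot line */
        c->last_refresh = now;
        c->refreshes++;
    }
    return c->value;                        /* usually served locally */
}
```

The staleness visible in the model is exactly the correctness concern Mel raises: within the window, the caller acts on a value up to 2 seconds old.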
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 15:23 ` Dimitri Sivanich
@ 2011-10-13 15:54 ` Christoph Lameter
  2011-10-13 20:50   ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2011-10-13 15:54 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Thu, 13 Oct 2011, Dimitri Sivanich wrote:

> > increase the allowed delta per zone if frequent updates occur via the
> > overflow checks in vmstat.c.  See calculate_*_threshold there.
>
> I tried changing the threshold in both directions, with slower throughput
> in both cases.

If that is the case, check for the vm_stat cacheline being shared with
another hot kernel variable.  Maybe that causes cacheline eviction.

If there are no updates occurring for a while (due to increased deltas
and/or vmstat updates) then the vm_stat cacheline should be able to stay
in shared mode in multiple processors and the performance should increase.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 15:54 ` Christoph Lameter
@ 2011-10-13 20:50 ` Andrew Morton
  2011-10-13 21:02   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Andrew Morton @ 2011-10-13 20:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Dimitri Sivanich, linux-kernel, linux-mm, Mel Gorman

On Thu, 13 Oct 2011 10:54:30 -0500 (CDT) Christoph Lameter <cl@gentwo.org> wrote:

> On Thu, 13 Oct 2011, Dimitri Sivanich wrote:
>
> > > increase the allowed delta per zone if frequent updates occur via the
> > > overflow checks in vmstat.c.  See calculate_*_threshold there.
> >
> > I tried changing the threshold in both directions, with slower throughput
> > in both cases.
>
> If that is the case check for the vm_stat cacheline being shared with
> another hot kernel variable.  Maybe that causes cacheline eviction.

yup.  `nm -n vmlinux'.

> If there are no updates occurring for a while (due to increased deltas
> and/or vmstat updates) then the vm_stat cacheline should be able to stay
> in shared mode in multiple processors and the performance should increase.
>

We could cacheline-align vm_stat[].  But the thing is pretty small - we
could put each entry in its own cacheline.
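"Each entry in its own cacheline" can be sketched as a padded, aligned counter type; this is an illustrative userspace fragment (the names and the fixed 64-byte line size are assumptions, and `__attribute__((aligned))` is the GCC/Clang spelling):

```c
#include <assert.h>
#include <stddef.h>

/* Pad each counter out to a full 64-byte cacheline so that writers of
 * different counters never share a line.  The trade-off is larger
 * cache footprint: four counters now span four lines instead of one. */
#define CACHELINE 64

struct padded_counter {
    long count;
    char pad[CACHELINE - sizeof(long)];
} __attribute__((aligned(CACHELINE)));

static struct padded_counter vm_stat_padded[4];
```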
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 20:50 ` Andrew Morton
@ 2011-10-13 21:02 ` Christoph Lameter
  2011-10-13 21:24   ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2011-10-13 21:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dimitri Sivanich, linux-kernel, linux-mm, Mel Gorman

On Thu, 13 Oct 2011, Andrew Morton wrote:

> > If there are no updates occurring for a while (due to increased deltas
> > and/or vmstat updates) then the vm_stat cacheline should be able to stay
> > in shared mode in multiple processors and the performance should increase.
>
> We could cacheline align vm_stat[].  But the thing is pretty small - we
> couild put each entry in its own cacheline.

Which in turn would increase the cache footprint of some key kernel
functions (because they need multiple vm_stat entries) and cause eviction
of other cachelines that then reduces overall system performance again.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 21:02 ` Christoph Lameter
@ 2011-10-13 21:24 ` Andrew Morton
  2011-10-14 12:25   ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread

From: Andrew Morton @ 2011-10-13 21:24 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Dimitri Sivanich, linux-kernel, linux-mm, Mel Gorman

On Thu, 13 Oct 2011 16:02:58 -0500 (CDT) Christoph Lameter <cl@gentwo.org> wrote:

> On Thu, 13 Oct 2011, Andrew Morton wrote:
>
> > > If there are no updates occurring for a while (due to increased deltas
> > > and/or vmstat updates) then the vm_stat cacheline should be able to
> > > stay in shared mode in multiple processors and the performance should
> > > increase.
> >
> > We could cacheline align vm_stat[].  But the thing is pretty small - we
> > couild put each entry in its own cacheline.
>
> Which in turn would increase the cache footprint of some key kernel
> functions (because they need multiple vm_stat entries) and cause eviction
> of other cachelines that then reduce overall system performance again.

Sure, but we gain performance by not having different CPUs treading on
each other when they update different vmstat fields.  Sometimes one
effect will win and other times the other effect will win.  Some
engineering is needed.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13 21:24 ` Andrew Morton
@ 2011-10-14 12:25 ` Dimitri Sivanich
  2011-10-14 13:50   ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread

From: Dimitri Sivanich @ 2011-10-14 12:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel, linux-mm, Mel Gorman

On Thu, Oct 13, 2011 at 02:24:34PM -0700, Andrew Morton wrote:
> On Thu, 13 Oct 2011 16:02:58 -0500 (CDT) Christoph Lameter <cl@gentwo.org> wrote:
>
> > On Thu, 13 Oct 2011, Andrew Morton wrote:
> >
> > > > If there are no updates occurring for a while (due to increased
> > > > deltas and/or vmstat updates) then the vm_stat cacheline should be
> > > > able to stay in shared mode in multiple processors and the
> > > > performance should increase.
> > >
> > > We could cacheline align vm_stat[].  But the thing is pretty small -
> > > we couild put each entry in its own cacheline.
> >
> > Which in turn would increase the cache footprint of some key kernel
> > functions (because they need multiple vm_stat entries) and cause
> > eviction of other cachelines that then reduce overall system
> > performance again.
>
> Sure, but we gain performance by not having different CPUs treading on
> each other when they update different vmstat fields.  Sometimes one
> effect will win and other times the other effect will win.  Some
> engineering is needed..

I think the first step is to determine the role (if any) that false
sharing may be playing in this, since that's a simpler fix (cacheline
align and pad the array).  Then, if necessary, I will look at contention
issues within the array.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 12:25 ` Dimitri Sivanich
@ 2011-10-14 13:50 ` Dimitri Sivanich
  2011-10-14 13:57   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Dimitri Sivanich @ 2011-10-14 13:50 UTC (permalink / raw)
To: Andrew Morton, Christoph Lameter, linux-kernel, linux-mm, Mel Gorman

On Fri, Oct 14, 2011 at 07:25:06AM -0500, Dimitri Sivanich wrote:
> On Thu, Oct 13, 2011 at 02:24:34PM -0700, Andrew Morton wrote:
> > On Thu, 13 Oct 2011 16:02:58 -0500 (CDT) Christoph Lameter <cl@gentwo.org> wrote:
> >
> > > Which in turn would increase the cache footprint of some key kernel
> > > functions (because they need multiple vm_stat entries) and cause
> > > eviction of other cachelines that then reduce overall system
> > > performance again.
> >
> > Sure, but we gain performance by not having different CPUs treading on
> > each other when they update different vmstat fields.  Sometimes one
> > effect will win and other times the other effect will win.  Some
> > engineering is needed..
>
> I think the first step is to determine the role (if any) that false
> sharing may be playing in this, since that's a simpler fix (cacheline
> align and pad the array).
>

Testing on a smaller machine with 46 writer threads in parallel (my
original test used 120).

Looks as though cache-aligning and padding the end of the vm_stat array
results in a ~150 MB/sec speedup.
This is a nice improvement for only 46 writer threads, though it's not
the full ~250 MB/sec speedup I get from setting OVERCOMMIT_NEVER.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 13:50 ` Dimitri Sivanich
@ 2011-10-14 13:57 ` Christoph Lameter
  2011-10-14 14:19   ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2011-10-14 13:57 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Fri, 14 Oct 2011, Dimitri Sivanich wrote:

> Testing on a smaller machine with 46 writer threads in parallel (my
> original test used 120).
>
> Looks as though cache-aligning and padding the end of the vm_stat array
> results in a ~150 MB/sec speedup.  This is a nice improvement for only 46
> writer threads, though it's not the full ~250 MB/sec speedup I get from
> setting OVERCOMMIT_NEVER.

Add to this the increase in the deltas for the ZVCs and change the stat
interval to 10 sec?
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 13:57 ` Christoph Lameter
@ 2011-10-14 14:19 ` Dimitri Sivanich
  2011-10-14 14:34   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Dimitri Sivanich @ 2011-10-14 14:19 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Fri, Oct 14, 2011 at 08:57:16AM -0500, Christoph Lameter wrote:
> On Fri, 14 Oct 2011, Dimitri Sivanich wrote:
>
> > Testing on a smaller machine with 46 writer threads in parallel (my
> > original test used 120).
> >
> > Looks as though cache-aligning and padding the end of the vm_stat array
> > results in a ~150 MB/sec speedup.  This is a nice improvement for only
> > 46 writer threads, though it's not the full ~250 MB/sec speedup I get
> > from setting OVERCOMMIT_NEVER.
>
> Add to this the increase in the deltas for the ZVCs and change the stat
> interval to 10 sec?

Increasing the ZVC deltas (threshold value in calculate*threshold == 125)
does -seem- to give a small speedup in this case (maybe as much as
50 MB/sec?).  Changing the stat interval to 10 seconds still has no
effect, with or without the ZVC delta change.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 14:19 ` Dimitri Sivanich
@ 2011-10-14 14:34 ` Christoph Lameter
  2011-10-14 15:18   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2011-10-14 14:34 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Fri, 14 Oct 2011, Dimitri Sivanich wrote:

> Increasing the ZVC deltas (threshold value in calculate*threshold == 125)
> does -seem- to give a small speedup in this case (maybe as much as
> 50 MB/sec?).

Hmm...  The question is how much do the VM paths used for the critical
path increment the vmstat counters on average per second?  If we end up
with hundreds of updates per second from each thread then we still have a
problem that can only be addressed by increasing the deltas beyond 125,
meaning the field width must be increased to support 16-bit counters.
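The ~125 ceiling follows from the field width: the per-cpu differentials are s8, so a threshold must fit (with headroom) in [-128, 127], and larger batches need an s16 field. A small illustrative helper, not kernel code:

```c
#include <assert.h>

/* Check whether a proposed per-cpu delta threshold fits in a signed
 * field of the given width: 127 is the ceiling for s8, 32767 for s16. */
static int threshold_fits(long t, int field_bits)
{
    long max = (1L << (field_bits - 1)) - 1;
    return t > 0 && t <= max;
}
```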
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 14:34 ` Christoph Lameter
@ 2011-10-14 15:18 ` Christoph Lameter
  2011-10-14 16:16   ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2011-10-14 15:18 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

Also the whole thing could be optimized by concentrating updates to the
vm_stat array at one point in time.  If any local per cpu differential
overflows then update all the counters in the same cacheline for which we
have per cpu differentials.

That will defer another acquisition of the cacheline for the next delta
overflowing.  After an update all the per cpu differentials would be zero.

This could be added to zone_page_state_add....

Something like this patch?  (Restriction of the updates to the same
cacheline missing.  Just does everything and the zone_page_state may need
uninlining now.)

---
 include/linux/vmstat.h |   19 ++++++++++++++++---
 mm/vmstat.c            |   10 ++++------
 2 files changed, 20 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h	2011-10-14 09:58:03.000000000 -0500
+++ linux-2.6/include/linux/vmstat.h	2011-10-14 10:08:00.000000000 -0500
@@ -90,10 +90,23 @@ static inline void vm_events_fold_cpu(in
 extern atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
 
 static inline void zone_page_state_add(long x, struct zone *zone,
-				enum zone_stat_item item)
+				enum zone_stat_item item, s8 new_value)
 {
-	atomic_long_add(x, &zone->vm_stat[item]);
-	atomic_long_add(x, &vm_stat[item]);
+	enum zone_stat_item i;
+
+	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
+		long y;
+
+		if (i == item)
+			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], new_value) + x;
+		else
+			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], 0);
+
+		if (y) {
+			atomic_long_add(y, &zone->vm_stat[item]);
+			atomic_long_add(y, &vm_stat[item]);
+		}
+	}
 }
 
 static inline unsigned long global_page_state(enum zone_stat_item item)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c	2011-10-14 10:04:20.000000000 -0500
+++ linux-2.6/mm/vmstat.c	2011-10-14 10:08:39.000000000 -0500
@@ -221,7 +221,7 @@ void __mod_zone_page_state(struct zone *
 	t = __this_cpu_read(pcp->stat_threshold);
 
 	if (unlikely(x > t || x < -t)) {
-		zone_page_state_add(x, zone, item);
+		zone_page_state_add(x, zone, item, 0);
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
@@ -262,8 +262,7 @@ void __inc_zone_state(struct zone *zone,
 	if (unlikely(v > t)) {
 		s8 overstep = t >> 1;
 
-		zone_page_state_add(v + overstep, zone, item);
-		__this_cpu_write(*p, -overstep);
+		zone_page_state_add(v + overstep, zone, item, -overstep);
 	}
 }
 
@@ -284,8 +283,7 @@ void __dec_zone_state(struct zone *zone,
 	if (unlikely(v < - t)) {
 		s8 overstep = t >> 1;
 
-		zone_page_state_add(v - overstep, zone, item);
-		__this_cpu_write(*p, overstep);
+		zone_page_state_add(v - overstep, zone, item, overstep);
 	}
 }
 
@@ -343,7 +341,7 @@ static inline void mod_state(struct zone
 	} while (this_cpu_cmpxchg(*p, o, n) != o);
 
 	if (z)
-		zone_page_state_add(z, zone, item);
+		zone_page_state_add(z, zone, item, 0);
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 15:18 ` Christoph Lameter
@ 2011-10-14 16:16 ` Dimitri Sivanich
  2011-10-18 13:48   ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread

From: Dimitri Sivanich @ 2011-10-14 16:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Fri, Oct 14, 2011 at 10:18:24AM -0500, Christoph Lameter wrote:
> Also the whole thing could be optimized by concentrating updates to the
> vm_stat array at one point in time.  If any local per cpu differential
> overflows then update all the counters in the same cacheline for which we
> have per cpu differentials.
>
> That will defer another acquisition of the cacheline for the next delta
> overflowing.  After an update all the per cpu differentials would be zero.
>
> This could be added to zone_page_state_add....
>
> Something like this patch?  (Restriction of the updates to the same
> cacheline missing.  Just does everything and the zone_page_state may need
> uninlining now.)

This patch doesn't have much, if any, effect, at least in the 46 writer
thread case (NR_VM_EVENT_ITEMS-->NR_VM_ZONE_STAT_ITEMS allowed it to
boot :) ).  I applied this with the change to align vm_stat.

So far, cache alignment of vm_stat and increasing the ZVC delta have the
greatest effect.
>
> ---
>  include/linux/vmstat.h |   19 ++++++++++++++++---
>  mm/vmstat.c            |   10 ++++------
>  2 files changed, 20 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/include/linux/vmstat.h
> ===================================================================
> --- linux-2.6.orig/include/linux/vmstat.h	2011-10-14 09:58:03.000000000 -0500
> +++ linux-2.6/include/linux/vmstat.h	2011-10-14 10:08:00.000000000 -0500
> @@ -90,10 +90,23 @@ static inline void vm_events_fold_cpu(in
>  extern atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
>
>  static inline void zone_page_state_add(long x, struct zone *zone,
> -				enum zone_stat_item item)
> +				enum zone_stat_item item, s8 new_value)
>  {
> -	atomic_long_add(x, &zone->vm_stat[item]);
> -	atomic_long_add(x, &vm_stat[item]);
> +	enum zone_stat_item i;
> +
> +	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
> +		long y;
> +
> +		if (i == item)
> +			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], new_value) + x;
> +		else
> +			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], 0);
> +
> +		if (y) {
> +			atomic_long_add(y, &zone->vm_stat[item]);
> +			atomic_long_add(y, &vm_stat[item]);
> +		}
> +	}
>  }
>
>  static inline unsigned long global_page_state(enum zone_stat_item item)
> Index: linux-2.6/mm/vmstat.c
> ===================================================================
> --- linux-2.6.orig/mm/vmstat.c	2011-10-14 10:04:20.000000000 -0500
> +++ linux-2.6/mm/vmstat.c	2011-10-14 10:08:39.000000000 -0500
> @@ -221,7 +221,7 @@ void __mod_zone_page_state(struct zone *
> 	t = __this_cpu_read(pcp->stat_threshold);
>
> 	if (unlikely(x > t || x < -t)) {
> -		zone_page_state_add(x, zone, item);
> +		zone_page_state_add(x, zone, item, 0);
> 		x = 0;
> 	}
> 	__this_cpu_write(*p, x);
> @@ -262,8 +262,7 @@ void __inc_zone_state(struct zone *zone,
> 	if (unlikely(v > t)) {
> 		s8 overstep = t >> 1;
>
> -		zone_page_state_add(v + overstep, zone, item);
> -		__this_cpu_write(*p, -overstep);
> +		zone_page_state_add(v + overstep, zone, item, -overstep);
> 	}
> }
>
> @@ -284,8 +283,7 @@ void __dec_zone_state(struct zone *zone,
> 	if (unlikely(v < - t)) {
> 		s8 overstep = t >> 1;
>
> -		zone_page_state_add(v - overstep, zone, item);
> -		__this_cpu_write(*p, overstep);
> +		zone_page_state_add(v - overstep, zone, item, overstep);
> 	}
> }
>
> @@ -343,7 +341,7 @@ static inline void mod_state(struct zone
> 	} while (this_cpu_cmpxchg(*p, o, n) != o);
>
> 	if (z)
> -		zone_page_state_add(z, zone, item);
> +		zone_page_state_add(z, zone, item, 0);
> }
>
> void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-14 16:16 ` Dimitri Sivanich
@ 2011-10-18 13:48   ` Dimitri Sivanich
  2011-10-18 14:36     ` Christoph Lameter
  2011-10-18 15:48     ` Andi Kleen
  0 siblings, 2 replies; 25+ messages in thread
From: Dimitri Sivanich @ 2011-10-18 13:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Christoph Lameter, Andrew Morton, Mel Gorman

On Fri, Oct 14, 2011 at 11:16:03AM -0500, Dimitri Sivanich wrote:
> On Fri, Oct 14, 2011 at 10:18:24AM -0500, Christoph Lameter wrote:
> > Also the whole thing could be optimized by concentrating updates to the
> > vm_stat array at one point in time. If any local per-cpu differential
> > overflows, then update all the counters in the same cacheline for which
> > we have per-cpu differentials.
> >
> > That will defer another acquisition of the cacheline for the next delta
> > overflowing. After an update all the per-cpu differentials would be zero.
> >
> > This could be added to zone_page_state_add....
> >
> > Something like this patch? (Restriction of the updates to the same
> > cacheline is missing; it just does everything, and zone_page_state may
> > need uninlining now.)
>
> This patch doesn't have much, if any, effect, at least in the
> 46-writer-thread case (NR_VM_EVENT_ITEMS --> NR_VM_ZONE_STAT_ITEMS
> allowed it to boot :) ). I applied this with the change to align vm_stat.
>
> So far, cache alignment of vm_stat and increasing the ZVC delta have the
> greatest effect.

After further testing, substantial increases in the ZVC delta, along with
cache alignment of the vm_stat array, bring the tmpfs writeback throughput
numbers to about where they are with vm.overcommit_memory == OVERCOMMIT_NEVER.
I still need to determine how high the ZVC delta needs to be to achieve this
performance, but it is greater than 125.

Would it make sense to have the ZVC delta be tuneable (via /proc/sys/vm?),
keeping the same default behavior as what we currently have?
If the thresholds get set higher, it could be that some values that don't
normally have as big a delta may not get updated frequently enough. Should we
maybe update all values every time a threshold is hit, as the patch below was
intending?

Note that having each counter in a separate cacheline does not have much, if
any, effect.

> >
> > ---
> >  include/linux/vmstat.h |   19 ++++++++++++++++---
> >  mm/vmstat.c            |   10 ++++------
> >  2 files changed, 20 insertions(+), 9 deletions(-)
> >
> > Index: linux-2.6/include/linux/vmstat.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/vmstat.h	2011-10-14 09:58:03.000000000 -0500
> > +++ linux-2.6/include/linux/vmstat.h	2011-10-14 10:08:00.000000000 -0500
> > @@ -90,10 +90,23 @@ static inline void vm_events_fold_cpu(in
> >  extern atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
> >
> >  static inline void zone_page_state_add(long x, struct zone *zone,
> > -				enum zone_stat_item item)
> > +				enum zone_stat_item item, s8 new_value)
> >  {
> > -	atomic_long_add(x, &zone->vm_stat[item]);
> > -	atomic_long_add(x, &vm_stat[item]);
> > +	enum zone_stat_item i;
> > +
> > +	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
> > +		long y;
> > +
> > +		if (i == item)
> > +			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], new_value) + x;
> > +		else
> > +			y = this_cpu_xchg(zone->pageset->vm_stat_diff[i], 0);
> > +
> > +		if (y) {
> > +			atomic_long_add(y, &zone->vm_stat[item]);
> > +			atomic_long_add(y, &vm_stat[item]);
> > +		}
> > +	}
> >  }
> >
> >  static inline unsigned long global_page_state(enum zone_stat_item item)
> > Index: linux-2.6/mm/vmstat.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmstat.c	2011-10-14 10:04:20.000000000 -0500
> > +++ linux-2.6/mm/vmstat.c	2011-10-14 10:08:39.000000000 -0500
> > @@ -221,7 +221,7 @@ void __mod_zone_page_state(struct zone *
> > 	t = __this_cpu_read(pcp->stat_threshold);
> >
> > 	if (unlikely(x > t || x < -t)) {
> > -		zone_page_state_add(x, zone, item);
> > +		zone_page_state_add(x, zone, item, 0);
> > 		x = 0;
> > 	}
> > 	__this_cpu_write(*p, x);
> > @@ -262,8 +262,7 @@ void __inc_zone_state(struct zone *zone,
> > 	if (unlikely(v > t)) {
> > 		s8 overstep = t >> 1;
> >
> > -		zone_page_state_add(v + overstep, zone, item);
> > -		__this_cpu_write(*p, -overstep);
> > +		zone_page_state_add(v + overstep, zone, item, -overstep);
> > 	}
> > }
> >
> > @@ -284,8 +283,7 @@ void __dec_zone_state(struct zone *zone,
> > 	if (unlikely(v < - t)) {
> > 		s8 overstep = t >> 1;
> >
> > -		zone_page_state_add(v - overstep, zone, item);
> > -		__this_cpu_write(*p, overstep);
> > +		zone_page_state_add(v - overstep, zone, item, overstep);
> > 	}
> > }
> >
> > @@ -343,7 +341,7 @@ static inline void mod_state(struct zone
> > 	} while (this_cpu_cmpxchg(*p, o, n) != o);
> >
> > 	if (z)
> > -		zone_page_state_add(z, zone, item);
> > +		zone_page_state_add(z, zone, item, 0);
> > }
> >
> > void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-18 13:48 ` Dimitri Sivanich
@ 2011-10-18 14:36   ` Christoph Lameter
  2011-10-18 15:48   ` Andi Kleen
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2011-10-18 14:36 UTC (permalink / raw)
  To: Dimitri Sivanich; +Cc: linux-kernel, linux-mm, Andrew Morton, Mel Gorman

On Tue, 18 Oct 2011, Dimitri Sivanich wrote:

> After further testing, substantial increases in the ZVC delta, along with
> cache alignment of the vm_stat array, bring the tmpfs writeback throughput
> numbers to about where they are with vm.overcommit_memory==OVERCOMMIT_NEVER.
> I still need to determine how high the ZVC delta needs to be to achieve
> this performance, but it is greater than 125.

Sounds like this is the way to go then.

> Would it make sense to have the ZVC delta be tuneable (via /proc/sys/vm?),
> keeping the same default behavior as what we currently have?

I think so.

> If the thresholds get set higher, it could be that some values that don't
> normally have as big a delta may not get updated frequently enough. Should
> we maybe update all values every time a threshold is hit, as the patch
> below was intending?

Mel can probably chime in on the accuracy needed for reclaim etc. We already
have an automatic reduction of the delta if the vm gets into problems.

> Note that having each counter in a separate cacheline does not have much,
> if any, effect.

It may have a good effect if you group the counters according to their uses
into different cachelines. Counters that are typically updated together need
to be close to each other. Also, you could modify my patch to only update
counters in the same cacheline. I think doing all counters caused the
problems with that patch, because we now touch multiple cachelines and
increase the cache footprint of critical vm functions.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-18 13:48 ` Dimitri Sivanich
  2011-10-18 14:36 ` Christoph Lameter
@ 2011-10-18 15:48 ` Andi Kleen
  2011-10-19  1:16   ` David Rientjes
  1 sibling, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2011-10-18 15:48 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: linux-kernel, linux-mm, Christoph Lameter, Andrew Morton, Mel Gorman

Dimitri Sivanich <sivanich@sgi.com> writes:
>
> Would it make sense to have the ZVC delta be tuneable (via /proc/sys/vm?),
> keeping the same default behavior as what we currently have?

Tunable is bad. We don't really want "hundreds of lines of magic shell
script to make large systems perform". Please find a way to auto-tune.

-Andi

--
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-18 15:48 ` Andi Kleen
@ 2011-10-19  1:16   ` David Rientjes
  2011-10-19 14:54     ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread
From: David Rientjes @ 2011-10-19 1:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dimitri Sivanich, linux-kernel, linux-mm, Christoph Lameter,
      Andrew Morton, Mel Gorman

On Tue, 18 Oct 2011, Andi Kleen wrote:

> > Would it make sense to have the ZVC delta be tuneable (via /proc/sys/vm?),
> > keeping the same default behavior as what we currently have?
>
> Tunable is bad. We don't really want "hundreds of lines of magic shell
> script to make large systems perform". Please find a way to auto-tune.

Agreed, and I think even if we had a tunable, it would result in potentially
erratic VM performance, because some areas depend on "fairly accurate" ZVCs
and it wouldn't be clear that you're trading other unknown VM issues that
will affect your workload because you've increased the deltas. Let's try to
avoid having to ask "what is your ZVC delta tunable set at?" when someone
reports a bug about reclaim stopping preemptively.

That said, perhaps we need higher deltas by default and then hints in key
areas in the form of sync_stats_if_delta_above(x) calls that would do
zone_page_state_add() only when that kind of precision is actually needed.
For public interfaces, that would be very easy to audit to see what the
level of precision is when parsing the data.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-19  1:16 ` David Rientjes
@ 2011-10-19 14:54   ` Dimitri Sivanich
  2011-10-19 15:31     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Dimitri Sivanich @ 2011-10-19 14:54 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andi Kleen, linux-kernel, linux-mm, Christoph Lameter,
      Andrew Morton, Mel Gorman

On Tue, Oct 18, 2011 at 06:16:21PM -0700, David Rientjes wrote:
> On Tue, 18 Oct 2011, Andi Kleen wrote:
>
> > > Would it make sense to have the ZVC delta be tuneable (via
> > > /proc/sys/vm?), keeping the same default behavior as what we
> > > currently have?
> >
> > Tunable is bad. We don't really want "hundreds of lines of magic shell
> > script to make large systems perform". Please find a way to auto-tune.
>
> Agreed, and I think even if we had a tunable, it would result in
> potentially erratic VM performance, because some areas depend on "fairly
> accurate" ZVCs and it wouldn't be clear that you're trading other unknown
> VM issues that will affect your workload because you've increased the
> deltas. Let's try to avoid having to ask "what is your ZVC delta tunable
> set at?" when someone reports a bug about reclaim stopping preemptively.

Yes, I'm inclined to agree.

> That said, perhaps we need higher deltas by default and then hints in key
> areas in the form of sync_stats_if_delta_above(x) calls that would do
> zone_page_state_add() only when that kind of precision is actually needed.
> For public interfaces, that would be very easy to audit to see what the
> level of precision is when parsing the data.

I did some manual tuning to see what deltas would be needed to achieve the
greatest tmpfs writeback performance on a system with 640 cpus and 64 nodes:

For 120 threads writing in parallel (each to its own mount point), the
threshold needs to be on the order of 1000. At a threshold of 750, I start
to see a slowdown of 50-60 MB/sec.

For 400 threads writing in parallel, the threshold needs to be on the order
of 2000 (although we're off by about 40 MB/sec at that point).

The necessary deltas in these cases are quite a bit higher than the current
maximum of 125 (see calculate*threshold in mm/vmstat.c).

I like the idea of having certain areas triggering a vm_stat sync, as long
as we know what those key areas are and how often they might be called.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-19 14:54 ` Dimitri Sivanich
@ 2011-10-19 15:31   ` Christoph Lameter
  2011-10-24 14:59     ` Dimitri Sivanich
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2011-10-19 15:31 UTC (permalink / raw)
  To: Dimitri Sivanich
  Cc: David Rientjes, Andi Kleen, linux-kernel, linux-mm,
      Andrew Morton, Mel Gorman

On Wed, 19 Oct 2011, Dimitri Sivanich wrote:

> For 120 threads writing in parallel (each to its own mount point), the
> threshold needs to be on the order of 1000. At a threshold of 750, I
> start to see a slowdown of 50-60 MB/sec.
>
> For 400 threads writing in parallel, the threshold needs to be on the
> order of 2000 (although we're off by about 40 MB/sec at that point).
>
> The necessary deltas in these cases are quite a bit higher than the
> current 125 maximum (see calculate*threshold in mm/vmstat.c).
>
> I like the idea of having certain areas triggering a vm_stat sync, as
> long as we know what those key areas are and how often they might be
> called.

You could potentially reduce the maximum necessary by applying my earlier
patch (but please reduce the counters touched to the current cacheline).
That should reduce the number of updates in the global cacheline and allow
you to reduce the very high deltas that you have to deal with now.
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-19 15:31 ` Christoph Lameter
@ 2011-10-24 14:59   ` Dimitri Sivanich
  0 siblings, 0 replies; 25+ messages in thread
From: Dimitri Sivanich @ 2011-10-24 14:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, linux-kernel, linux-mm,
      Andrew Morton, Mel Gorman

On Wed, Oct 19, 2011 at 10:31:54AM -0500, Christoph Lameter wrote:
> On Wed, 19 Oct 2011, Dimitri Sivanich wrote:
>
> > For 120 threads writing in parallel (each to its own mount point), the
> > threshold needs to be on the order of 1000. At a threshold of 750, I
> > start to see a slowdown of 50-60 MB/sec.
> >
> > For 400 threads writing in parallel, the threshold needs to be on the
> > order of 2000 (although we're off by about 40 MB/sec at that point).
> >
> > The necessary deltas in these cases are quite a bit higher than the
> > current 125 maximum (see calculate*threshold in mm/vmstat.c).
> >
> > I like the idea of having certain areas triggering a vm_stat sync, as
> > long as we know what those key areas are and how often they might be
> > called.
>
> You could potentially reduce the maximum necessary by applying my earlier
> patch (but please reduce the counters touched to the current cacheline).
> That should reduce the number of updates in the global cacheline and allow
> you to reduce the very high deltas that you have to deal with now.

I tried updating whole, single vm_stat cachelines as you suggest, but that
made little if any difference in tmpfs writeback performance. The same
higher threshold values were still necessary to significantly reduce the
contention seen in __vm_enough_memory.
[parent not found: <CADE8fzrdMOBF1RyyEpMVi8aKcgOVKRQSKi0=c1Qvh3p6hHcXRA@mail.gmail.com>]
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  [not found] ` <CADE8fzrdMOBF1RyyEpMVi8aKcgOVKRQSKi0=c1Qvh3p6hHcXRA@mail.gmail.com>
@ 2011-10-13  0:07   ` Tim Chen
  2011-10-13 14:15     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Tim Chen @ 2011-10-13 0:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dimitri Sivanich, linux-kernel, linux-mm, Christoph Lameter, ak

Andrew Morton wrote:

> Yes, the global vm_stat[] array is a problem - I'm surprised it's hung
> around for this long. Altering the sysctl_overcommit_memory mode will
> hide the problem, but that's no good.
>
> I think we've discussed switching vm_stat[] to a contention-avoiding
> counter scheme. Simply using <percpu_counter.h> would be the simplest
> approach. They'll introduce inaccuracies but hopefully any problems
> from that will be minor for the global page counters.
>
> otoh, I think we've been round this loop before and I don't recall why
> nothing happened.

Yeah, we have had this discussion on vm_enough_memory before.

https://lkml.org/lkml/2011/1/26/473

The current version of the per-cpu counter was not really suitable because
the batch size is not appropriate. I've tried to use a per-cpu counter
with the batch size adjusted in my attempt. Andrew has suggested having
an elastic batch size that's proportional to the size of the central
counter, but I haven't gotten around to trying that out.

Tim
* Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
  2011-10-13  0:07 ` Tim Chen
@ 2011-10-13 14:15   ` Christoph Lameter
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2011-10-13 14:15 UTC (permalink / raw)
  To: Tim Chen; +Cc: Andrew Morton, Dimitri Sivanich, linux-kernel, linux-mm, ak

On Wed, 12 Oct 2011, Tim Chen wrote:

> Yeah, we have had this discussion on vm_enough_memory before.
>
> https://lkml.org/lkml/2011/1/26/473
>
> The current version of the per-cpu counter was not really suitable because
> the batch size is not appropriate. I've tried to use a per-cpu counter
> with the batch size adjusted in my attempt. Andrew has suggested having
> an elastic batch size that's proportional to the size of the central
> counter, but I haven't gotten around to trying that out.

These counters are already managed as ZVC counters. It may be easiest to
adjust the batching parameters for those to solve this issue.
end of thread, other threads: [~2011-10-24 14:59 UTC | newest]

Thread overview: 25+ messages (links below jump to the message on this page):
  [not found] <20111012160202.GA18666@sgi.com>
  2011-10-12 19:01 ` [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory Andrew Morton
  2011-10-12 19:57   ` Christoph Lameter
  2011-10-13 15:06     ` Mel Gorman
  2011-10-13 15:59       ` Andi Kleen
  2011-10-13 15:23     ` Dimitri Sivanich
  2011-10-13 15:54       ` Christoph Lameter
  2011-10-13 20:50         ` Andrew Morton
  2011-10-13 21:02           ` Christoph Lameter
  2011-10-13 21:24             ` Andrew Morton
  2011-10-14 12:25               ` Dimitri Sivanich
  2011-10-14 13:50                 ` Dimitri Sivanich
  2011-10-14 13:57                   ` Christoph Lameter
  2011-10-14 14:19                     ` Dimitri Sivanich
  2011-10-14 14:34                       ` Christoph Lameter
  2011-10-14 15:18                         ` Christoph Lameter
  2011-10-14 16:16                           ` Dimitri Sivanich
  2011-10-18 13:48                             ` Dimitri Sivanich
  2011-10-18 14:36                               ` Christoph Lameter
  2011-10-18 15:48                               ` Andi Kleen
  2011-10-19  1:16                                 ` David Rientjes
  2011-10-19 14:54                                   ` Dimitri Sivanich
  2011-10-19 15:31                                     ` Christoph Lameter
  2011-10-24 14:59                                       ` Dimitri Sivanich
  [not found] ` <CADE8fzrdMOBF1RyyEpMVi8aKcgOVKRQSKi0=c1Qvh3p6hHcXRA@mail.gmail.com>
  2011-10-13  0:07   ` Tim Chen
  2011-10-13 14:15     ` Christoph Lameter