Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory

From: Dimitri Sivanich <sivanich@sgi.com>
To: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Lameter <cl@gentwo.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>
Subject: Re: [PATCH] Reduce vm_stat cacheline contention in __vm_enough_memory
Date: Wed, 19 Oct 2011 09:54:58 -0500
Message-ID: <20111019145458.GA9266@sgi.com>
In-Reply-To: <alpine.DEB.2.00.1110181806570.12850@chino.kir.corp.google.com>

On Tue, Oct 18, 2011 at 06:16:21PM -0700, David Rientjes wrote:
> On Tue, 18 Oct 2011, Andi Kleen wrote:
> 
> > > Would it make sense to have the ZVC delta be tuneable (via /proc/sys/vm?), keeping the
> > > same default behavior as what we currently have?
> > 
> > Tunable is bad. We don't really want a "hundreds of lines magic shell script to
> > make large systems perform". Please find a way to auto tune.
> > 
> 
> Agreed, and I think even if we had a tunable that it would result in 
> potentially erratic VM performance because some areas depend on "fairly 
> accurate" ZVCs and it wouldn't be clear that you're trading other unknown 
> VM issues that will affect your workload because you've increased the 
> deltas.  Let's try to avoid having to ask "what is your ZVC delta tunable 
> set at?" when someone reports a bug about reclaim stopping preemptively.

Yes, I'm inclined to agree.

> 
> That said, perhaps we need higher deltas by default and then hints in key 
> areas in the form of sync_stats_if_delta_above(x) calls that would do 
> zone_page_state_add() only when that kind of precision is actually needed.  
> For public interfaces, that would be very easy to audit to see what the 
> level of precision is when parsing the data.

I did some manual tuning to see what deltas would be needed to achieve the
greatest tmpfs writeback performance on a system with 640 cpus and 64 nodes:

For 120 threads writing in parallel (each to its own mountpoint), the
threshold needs to be on the order of 1000.  At a threshold of 750, I
start to see a slowdown of 50-60 MB/sec.

For 400 threads writing in parallel, the threshold needs to be on the order
of 2000 (although we're off by about 40 MB/sec at that point).

The necessary deltas in these cases are quite a bit higher than the current
125 maximum (see calculate*threshold in mm/vmstat.c).
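For readers without the source at hand, here is a rough userspace sketch of how that cap comes about. It assumes the heuristic in mm/vmstat.c is 2 * fls(online cpus) * (1 + fls(zone size in 128 MB units)), clamped to 125; that formula is written from memory and should be checked against the tree you are running. The names fls_sketch() and zvc_threshold_sketch() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's fls(): position of the highest
 * set bit, counting from 1. Returns 0 for an input of 0.
 */
static int fls_sketch(unsigned long x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Sketch of the per-cpu ZVC threshold heuristic.
 * ASSUMPTION: threshold = 2 * fls(cpus) * (1 + fls(zone size / 128 MB)),
 * clamped to 125. Not copied from the kernel source.
 */
static int zvc_threshold_sketch(unsigned int online_cpus, unsigned long zone_mb)
{
	unsigned long mem = zone_mb / 128;	/* zone size in 128 MB units */
	int threshold;

	assert(online_cpus > 0);
	threshold = 2 * fls_sketch(online_cpus) * (1 + fls_sketch(mem));

	return threshold < 125 ? threshold : 125;
}
```

Under these assumptions, 640 cpus and a hypothetical 16 GB zone give 2 * 10 * 9 = 180, which the clamp cuts to 125. The clamp appears to be tied to the per-cpu delta being a signed 8-bit value, so thresholds on the order of 1000 to 2000 would presumably also need a wider delta type; treat that as something to verify rather than a statement about the code.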

I like the idea of having certain areas triggering vm_stat sync, as long
as we know what those key areas are and how often they might be called.
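A toy userspace model of the idea, to make the tradeoff concrete. Each cpu accumulates a private delta and only writes the shared counter once the delta exceeds a threshold; a hint call folds large outstanding deltas when a caller needs better precision. sync_stats_if_delta_above() is only the name proposed above, not an existing kernel function, and everything else here (mod_state(), fold_delta(), the fixed cpu count) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4

static long global_count;		/* shared counter, i.e. the contended cacheline */
static int cpu_delta[NR_CPUS_SKETCH];	/* pending per-cpu deltas (cpu-local in a real kernel) */
static unsigned long global_updates;	/* number of writes to the shared counter */

/* Fold one cpu's pending delta into the shared counter. */
static void fold_delta(int cpu)
{
	if (cpu_delta[cpu] == 0)
		return;
	global_count += cpu_delta[cpu];
	cpu_delta[cpu] = 0;
	global_updates++;
}

/* Update from one cpu; the shared counter is written only past the threshold. */
static void mod_state(int cpu, int delta, int threshold)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cpu_delta[cpu] += delta;
	if (abs(cpu_delta[cpu]) > threshold)
		fold_delta(cpu);
}

/* Hypothetical precision hint: fold only the deltas whose magnitude exceeds x. */
static void sync_stats_if_delta_above(int x)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (abs(cpu_delta[cpu]) > x)
			fold_delta(cpu);
}

/* Cheap read of the shared counter; may lag by the sum of pending deltas. */
static long read_approx(void)
{
	return global_count;
}
```

In this model a threshold of 1000 lets one cpu make 1000 increments without touching the shared counter at all, at the cost of read_approx() being off by up to 1000 per cpu until someone calls the hint. How often the hint runs in the key areas decides how much of the contention comes back.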
