linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
	Glauber Costa <glommer@gmail.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure
Date: Mon, 10 Nov 2014 15:03:42 +1100	[thread overview]
Message-ID: <20141110040342.GM23575@dastard> (raw)
In-Reply-To: <20141106152135.GA17628@phnom.home.cmpxchg.org>

On Thu, Nov 06, 2014 at 10:21:35AM -0500, Johannes Weiner wrote:
> On Fri, Oct 24, 2014 at 02:37:34PM +0400, Vladimir Davydov wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a384339bf718..2cf6b04a4e0c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,6 +339,26 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> >  	return freed;
> >  }
> >  
> > +static unsigned long
> > +run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> > +	     unsigned long nr_pages_scanned, unsigned long lru_pages)
> > +{
> > +	unsigned long freed = 0;
> > +
> > +	if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> > +		shrinkctl->nid = 0;
> > +		return shrink_slab_node(shrinkctl, shrinker,
> > +					nr_pages_scanned, lru_pages);
> > +	}
> > +
> > +	for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> > +		if (node_online(shrinkctl->nid))
> > +			freed += shrink_slab_node(shrinkctl, shrinker,
> > +						  nr_pages_scanned, lru_pages);
> > +	}
> > +	return freed;
> > +}
> 
> The slab shrinking logic accumulates the lru pages, as well as the
> nodes_to_scan mask, when going over the zones, only to go over the
> zones here again using the accumulated node information.  Why not just
> invoke the thing per-zone instead in the first place?

It's not iterating zones here - it's iterating nodes. This is the
external interface that other subsystems call to cause reclaim to
occur, and they can specify multiple nodes to scan at once (e.g.
drop-slab()). Hence we have to iterate....

Indeed, shrink_zones() requires this because it can be passed an
arbtrary zonelist that may span multiple nodes. Hence shrink_slab()
can be passed a shrinkctl with multiple nodes set in it's nodemask
and so, again, iteration is required.

If you want other callers from the VM that guarantee only a single
node needs to be scanned (such as __zone_reclaim()) to avoid the
zone iteration, then factor the code such that shrink_slab_node()
can be called directly by those functions.

> Kswapd already
> does that (although it could probably work with the per-zone lru_pages
> and nr_scanned deltas) and direct reclaim should as well.  It would
> simplify the existing code as well as your series a lot.
> 
> > +		/*
> > +		 * For memcg-aware shrinkers iterate over the target memcg
> > +		 * hierarchy and run the shrinker on each kmem-active memcg
> > +		 * found in the hierarchy.
> > +		 */
> > +		shrinkctl->memcg = shrinkctl->target_mem_cgroup;
> > +		do {
> > +			if (!shrinkctl->memcg ||
> > +			    memcg_kmem_is_active(shrinkctl->memcg))
> > +				freed += run_shrinker(shrinkctl, shrinker,
> >  						nr_pages_scanned, lru_pages);
> > -
> > -		}
> > +		} while ((shrinkctl->memcg =
> > +			  mem_cgroup_iter(shrinkctl->target_mem_cgroup,
> > +					  shrinkctl->memcg, NULL)) != NULL);
> 
> More symptoms of the above.  This hierarchy walk is duplicative and
> potentially quite expensive.

Same again - if the "zone" being reclaimed is controlled by a memcg
rather than a node ID, then ensure that shrink_slab_foo() can be
called directly with the correct shrinkctl configuration to avoid
unnecessary iteration.

> The reclaim code walks zonelists according to a nodemask, and within
> each zone it walks lruvecs according to the memcg hierarchy.  The
> shrinkers are wrong in making up an ad-hoc concept of NUMA nodes that
> otherwise does not exist anywhere in the VM.

Hardly. the shrinker API is an *external VM interface*, just like
memory allocation is an external interface.  Node IDs and node masks
are exactly the way memory locality is conveyed to the MM subsystem
during allocation, so reclaim interfaces should match that for
consistency. IOWs, the shrinker matches the "ad-hoc concept of NUMA
nodes" that exists everywhere *outside* the VM.

IOWs, the shrinkers have not "made up" anything - they conform to
the existing VM abstractions that everyone is used to. Yes, the
layers between the core VM LRU reclaim code and the shrinker
infrastructure could do with some improvement and refinement, but
the external interface is consistent with all the other external
locality interfaces the VM provides....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2014-11-10  4:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-24 10:37 [PATCH -mm v2 0/9] Per memcg slab shrinkers Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 1/9] list_lru: introduce list_lru_shrink_{count,walk} Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 2/9] fs: consolidate {nr,free}_cached_objects args in shrink_control Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure Vladimir Davydov
2014-11-06 15:21   ` Johannes Weiner
2014-11-06 15:42     ` Vladimir Davydov
2014-11-10  4:03     ` Dave Chinner [this message]
2014-10-24 10:37 ` [PATCH -mm v2 4/9] memcg: rename some cache id related variables Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 5/9] memcg: add rwsem to sync against memcg_caches arrays relocation Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 6/9] list_lru: get rid of ->active_nodes Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 7/9] list_lru: organize all list_lrus to list Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 8/9] list_lru: introduce per-memcg lists Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 9/9] fs: make shrinker memcg aware Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141110040342.GM23575@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=glommer@gmail.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=vdavydov@parallels.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).