Re: [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vladimir Davydov <vdavydov@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
	Glauber Costa <glommer@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure
Date: Thu, 6 Nov 2014 18:42:04 +0300	[thread overview]
Message-ID: <20141106154204.GG4839@esperanza> (raw)
In-Reply-To: <20141106152135.GA17628@phnom.home.cmpxchg.org>

Hi Johannes,

On Thu, Nov 06, 2014 at 10:21:35AM -0500, Johannes Weiner wrote:
> On Fri, Oct 24, 2014 at 02:37:34PM +0400, Vladimir Davydov wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a384339bf718..2cf6b04a4e0c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,6 +339,26 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> >  	return freed;
> >  }
> >  
> > +static unsigned long
> > +run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> > +	     unsigned long nr_pages_scanned, unsigned long lru_pages)
> > +{
> > +	unsigned long freed = 0;
> > +
> > +	if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> > +		shrinkctl->nid = 0;
> > +		return shrink_slab_node(shrinkctl, shrinker,
> > +					nr_pages_scanned, lru_pages);
> > +	}
> > +
> > +	for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> > +		if (node_online(shrinkctl->nid))
> > +			freed += shrink_slab_node(shrinkctl, shrinker,
> > +						  nr_pages_scanned, lru_pages);
> > +	}
> > +	return freed;
> > +}
> 
> The slab shrinking logic accumulates the lru pages, as well as the
> nodes_to_scan mask, when going over the zones, only to go over the
> zones here again using the accumulated node information.  Why not just
> invoke the thing per-zone instead in the first place?  Kswapd already
> does that (although it could probably work with the per-zone lru_pages
> and nr_scanned deltas) and direct reclaim should as well.  It would
> simplify the existing code as well as your series a lot.

100% agree. Yet another argument for invoking shrinkers per-zone is soft
(or low?) memory limit reclaim (when it's fixed/rewritten): the current
code would shrink slab of all memory cgroups even if only those that
exceeded the limit were scanned - unfair.

> 
> > +		/*
> > +		 * For memcg-aware shrinkers iterate over the target memcg
> > +		 * hierarchy and run the shrinker on each kmem-active memcg
> > +		 * found in the hierarchy.
> > +		 */
> > +		shrinkctl->memcg = shrinkctl->target_mem_cgroup;
> > +		do {
> > +			if (!shrinkctl->memcg ||
> > +			    memcg_kmem_is_active(shrinkctl->memcg))
> > +				freed += run_shrinker(shrinkctl, shrinker,
> >  						nr_pages_scanned, lru_pages);
> > -
> > -		}
> > +		} while ((shrinkctl->memcg =
> > +			  mem_cgroup_iter(shrinkctl->target_mem_cgroup,
> > +					  shrinkctl->memcg, NULL)) != NULL);
> 
> More symptoms of the above.  This hierarchy walk is duplicative and
> potentially quite expensive.
> 
> > @@ -2381,6 +2414,7 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  	gfp_t orig_mask;
> >  	struct shrink_control shrink = {
> >  		.gfp_mask = sc->gfp_mask,
> > +		.target_mem_cgroup = sc->target_mem_cgroup,
> >  	};
> >  	enum zone_type requested_highidx = gfp_zone(sc->gfp_mask);
> >  	bool reclaimable = false;
> > @@ -2400,18 +2434,22 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  					gfp_zone(sc->gfp_mask), sc->nodemask) {
> >  		if (!populated_zone(zone))
> >  			continue;
> > +
> > +		if (global_reclaim(sc) &&
> > +		    !cpuset_zone_allowed(zone, GFP_KERNEL | __GFP_HARDWALL))
> > +			continue;
> > +
> > +		lru_pages += global_reclaim(sc) ?
> > +				zone_reclaimable_pages(zone) :
> > +				mem_cgroup_zone_reclaimable_pages(zone,
> > +						sc->target_mem_cgroup);
> > +		node_set(zone_to_nid(zone), shrink.nodes_to_scan);
> 
> And yet another costly hierarchy walk.
> 
> The reclaim code walks zonelists according to a nodemask, and within
> each zone it walks lruvecs according to the memcg hierarchy.  The
> shrinkers are wrong in making up an ad-hoc concept of NUMA nodes that
> otherwise does not exist anywhere in the VM.  Please integrate them
> properly instead of adding more duplication on top.

Will do.

Thanks,
Vladimir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Vladimir Davydov <vdavydov@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
	Glauber Costa <glommer@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure
Date: Thu, 6 Nov 2014 18:42:04 +0300	[thread overview]
Message-ID: <20141106154204.GG4839@esperanza> (raw)
In-Reply-To: <20141106152135.GA17628@phnom.home.cmpxchg.org>

Hi Johannes,

On Thu, Nov 06, 2014 at 10:21:35AM -0500, Johannes Weiner wrote:
> On Fri, Oct 24, 2014 at 02:37:34PM +0400, Vladimir Davydov wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a384339bf718..2cf6b04a4e0c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,6 +339,26 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> >  	return freed;
> >  }
> >  
> > +static unsigned long
> > +run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> > +	     unsigned long nr_pages_scanned, unsigned long lru_pages)
> > +{
> > +	unsigned long freed = 0;
> > +
> > +	if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> > +		shrinkctl->nid = 0;
> > +		return shrink_slab_node(shrinkctl, shrinker,
> > +					nr_pages_scanned, lru_pages);
> > +	}
> > +
> > +	for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> > +		if (node_online(shrinkctl->nid))
> > +			freed += shrink_slab_node(shrinkctl, shrinker,
> > +						  nr_pages_scanned, lru_pages);
> > +	}
> > +	return freed;
> > +}
> 
> The slab shrinking logic accumulates the lru pages, as well as the
> nodes_to_scan mask, when going over the zones, only to go over the
> zones here again using the accumulated node information.  Why not just
> invoke the thing per-zone instead in the first place?  Kswapd already
> does that (although it could probably work with the per-zone lru_pages
> and nr_scanned deltas) and direct reclaim should as well.  It would
> simplify the existing code as well as your series a lot.

100% agree. Yet another argument for invoking shrinkers per-zone is soft
(or low?) memory limit reclaim (when it's fixed/rewritten): the current
code would shrink slab of all memory cgroups even if only those that
exceeded the limit were scanned - unfair.

> 
> > +		/*
> > +		 * For memcg-aware shrinkers iterate over the target memcg
> > +		 * hierarchy and run the shrinker on each kmem-active memcg
> > +		 * found in the hierarchy.
> > +		 */
> > +		shrinkctl->memcg = shrinkctl->target_mem_cgroup;
> > +		do {
> > +			if (!shrinkctl->memcg ||
> > +			    memcg_kmem_is_active(shrinkctl->memcg))
> > +				freed += run_shrinker(shrinkctl, shrinker,
> >  						nr_pages_scanned, lru_pages);
> > -
> > -		}
> > +		} while ((shrinkctl->memcg =
> > +			  mem_cgroup_iter(shrinkctl->target_mem_cgroup,
> > +					  shrinkctl->memcg, NULL)) != NULL);
> 
> More symptoms of the above.  This hierarchy walk is duplicative and
> potentially quite expensive.
> 
> > @@ -2381,6 +2414,7 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  	gfp_t orig_mask;
> >  	struct shrink_control shrink = {
> >  		.gfp_mask = sc->gfp_mask,
> > +		.target_mem_cgroup = sc->target_mem_cgroup,
> >  	};
> >  	enum zone_type requested_highidx = gfp_zone(sc->gfp_mask);
> >  	bool reclaimable = false;
> > @@ -2400,18 +2434,22 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  					gfp_zone(sc->gfp_mask), sc->nodemask) {
> >  		if (!populated_zone(zone))
> >  			continue;
> > +
> > +		if (global_reclaim(sc) &&
> > +		    !cpuset_zone_allowed(zone, GFP_KERNEL | __GFP_HARDWALL))
> > +			continue;
> > +
> > +		lru_pages += global_reclaim(sc) ?
> > +				zone_reclaimable_pages(zone) :
> > +				mem_cgroup_zone_reclaimable_pages(zone,
> > +						sc->target_mem_cgroup);
> > +		node_set(zone_to_nid(zone), shrink.nodes_to_scan);
> 
> And yet another costly hierarchy walk.
> 
> The reclaim code walks zonelists according to a nodemask, and within
> each zone it walks lruvecs according to the memcg hierarchy.  The
> shrinkers are wrong in making up an ad-hoc concept of NUMA nodes that
> otherwise does not exist anywhere in the VM.  Please integrate them
> properly instead of adding more duplication on top.

Will do.

Thanks,
Vladimir

next prev parent reply	other threads:[~2014-11-06 15:42 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-24 10:37 [PATCH -mm v2 0/9] Per memcg slab shrinkers Vladimir Davydov
2014-10-24 10:37 ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 1/9] list_lru: introduce list_lru_shrink_{count,walk} Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 2/9] fs: consolidate {nr,free}_cached_objects args in shrink_control Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 3/9] vmscan: shrink slab on memcg pressure Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-11-06 15:21   ` Johannes Weiner
2014-11-06 15:21     ` Johannes Weiner
2014-11-06 15:42     ` Vladimir Davydov [this message]
2014-11-06 15:42       ` Vladimir Davydov
2014-11-10  4:03     ` Dave Chinner
2014-11-10  4:03       ` Dave Chinner
2014-10-24 10:37 ` [PATCH -mm v2 4/9] memcg: rename some cache id related variables Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 5/9] memcg: add rwsem to sync against memcg_caches arrays relocation Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 6/9] list_lru: get rid of ->active_nodes Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 7/9] list_lru: organize all list_lrus to list Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 8/9] list_lru: introduce per-memcg lists Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov
2014-10-24 10:37 ` [PATCH -mm v2 9/9] fs: make shrinker memcg aware Vladimir Davydov
2014-10-24 10:37   ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141106154204.GG4839@esperanza \
    --to=vdavydov@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=glommer@gmail.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.