From: Dave Chinner <david@fromorbit.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: dchinner@redhat.com, hannes@cmpxchg.org, mhocko@suse.cz,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, devel@openvz.org,
	glommer@openvz.org, glommer@gmail.com,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Balbir Singh <bsingharora@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH v13 10/16] vmscan: shrink slab on memcg pressure
Date: Tue, 10 Dec 2013 13:11:52 +1100	[thread overview]
Message-ID: <20131210021152.GZ31386@dastard> (raw)
In-Reply-To: <24314b9f3b299bac988ea3570f71f9e6919bbc4e.1386571280.git.vdavydov@parallels.com>

On Mon, Dec 09, 2013 at 12:05:51PM +0400, Vladimir Davydov wrote:
> This patch makes direct reclaim path shrink slab not only on global
> memory pressure, but also when we reach the user memory limit of a
> memcg. To achieve that, it makes shrink_slab() walk over the memcg
> hierarchy and run shrinkers marked as memcg-aware on the target memcg
> and all its descendants. The memcg to scan is passed in a shrink_control
> structure; memcg-unaware shrinkers are still called only on global
> memory pressure with memcg=NULL. It is up to the shrinker how to
> organize the objects it is responsible for to achieve per-memcg reclaim.
> 
> The idea lying behind the patch as well as the initial implementation
> belong to Glauber Costa.
...
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -311,6 +311,58 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
>  	return freed;
>  }
>  
> +static unsigned long
> +run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> +	     unsigned long nr_pages_scanned, unsigned long lru_pages)
> +{
> +	unsigned long freed = 0;
> +
> +	/*
> +	 * If we don't have a target mem cgroup, we scan them all. Otherwise
> +	 * we will limit our scan to shrinkers marked as memcg aware.
> +	 */
> +	if (!(shrinker->flags & SHRINKER_MEMCG_AWARE) &&
> +	    shrinkctl->target_mem_cgroup != NULL)
> +		return 0;
> +	/*
> +	 * In a hierarchical chain, it might be that not all memcgs are kmem
> +	 * active. kmemcg design mandates that when one memcg is active, its
> +	 * children will be active as well. But it is perfectly possible that
> +	 * its parent is not.
> +	 *
> +	 * We also need to make sure we scan at least once, for the global
> +	 * case. So if we don't have a target memcg, we proceed normally and
> +	 * expect to break in the next round.
> +	 */
> +	shrinkctl->memcg = shrinkctl->target_mem_cgroup;
> +	do {
> +		if (shrinkctl->memcg && !memcg_kmem_is_active(shrinkctl->memcg))
> +			goto next;
> +
> +		if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> +			shrinkctl->nid = 0;
> +			freed += shrink_slab_node(shrinkctl, shrinker,
> +					nr_pages_scanned, lru_pages);
> +			goto next;
> +		}
> +
> +		for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> +			if (node_online(shrinkctl->nid))
> +				freed += shrink_slab_node(shrinkctl, shrinker,
> +						nr_pages_scanned, lru_pages);
> +
> +		}
> +next:
> +		if (!(shrinker->flags & SHRINKER_MEMCG_AWARE))
> +			break;
> +		shrinkctl->memcg = mem_cgroup_iter(shrinkctl->target_mem_cgroup,
> +						   shrinkctl->memcg, NULL);
> +	} while (shrinkctl->memcg);
> +
> +	return freed;
> +}

Ok, I think we need to improve the abstraction here, because I find
this quite messy, and it's hard to follow the differences in code
flow between memcg and non-memcg shrinker invocations.

> +
>  /*
>   * Call the shrink functions to age shrinkable caches
>   *
> @@ -352,20 +404,10 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl,
>  	}
>  
>  	list_for_each_entry(shrinker, &shrinker_list, list) {
> -		if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> -			shrinkctl->nid = 0;
> -			freed += shrink_slab_node(shrinkctl, shrinker,
> -					nr_pages_scanned, lru_pages);
> -			continue;
> -		}
> -
> -		for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> -			if (node_online(shrinkctl->nid))
> -				freed += shrink_slab_node(shrinkctl, shrinker,
> -						nr_pages_scanned, lru_pages);
> -
> -		}

This removed code is what the "run_shrinker()" helper function
should contain - not the entire memcg loop.

> +		freed += run_shrinker(shrinkctl, shrinker,
> +				      nr_pages_scanned, lru_pages);
>  	}

i.e. the shrinker execution control loop becomes much clearer if
we separate the memcg and non-memcg shrinker execution from the
node awareness of the shrinker like so:

	list_for_each_entry(shrinker, &shrinker_list, list) {

		/*
		 * If we aren't doing targeted memcg shrinking, then run
		 * the shrinker with a global context and move on.
		 */
		if (!shrinkctl->target_mem_cgroup) {
			freed += run_shrinker(shrinkctl, shrinker,
					      nr_pages_scanned, lru_pages);
			continue;
		}

		if (!(shrinker->flags & SHRINKER_MEMCG_AWARE))
			continue;

		/*
		 * memcg shrinking: Iterate the target memcg hierarchy
		 * and run the shrinker on each memcg context that
		 * is found in the hierarchy.
		 */
		shrinkctl->memcg = shrinkctl->target_mem_cgroup;
		do {
			if (!memcg_kmem_is_active(shrinkctl->memcg))
				continue;

			freed += run_shrinker(shrinkctl, shrinker,
					      nr_pages_scanned, lru_pages);
		} while ((shrinkctl->memcg =
				mem_cgroup_iter(shrinkctl->target_mem_cgroup,
						shrinkctl->memcg, NULL)));
	}

That makes the code much easier to read and clearly demonstrates the
differences between non-memcg and memcg shrinking contexts, and
separates them cleanly from the shrinker implementation.  IMO,
that's much nicer than trying to handle all contexts in the one
do-while loop.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


