linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: hannes@cmpxchg.org, mhocko@suse.cz, dchinner@redhat.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, devel@openvz.org,
	glommer@openvz.org, Al Viro <viro@zeniv.linux.org.uk>,
	Balbir Singh <bsingharora@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH v12 12/18] fs: make icache, dcache shrinkers memcg-aware
Date: Tue, 3 Dec 2013 22:45:57 +1100	[thread overview]
Message-ID: <20131203114557.GS10988@dastard> (raw)
In-Reply-To: <8e7582ad42f35cd9a9ea274bd203e2423b944b62.1385974612.git.vdavydov@parallels.com>

On Mon, Dec 02, 2013 at 03:19:47PM +0400, Vladimir Davydov wrote:
> Using the per-memcg LRU infrastructure introduced by previous patches,
> this patch makes dcache and icache shrinkers memcg-aware. To achieve
> that, it converts s_dentry_lru and s_inode_lru from list_lru to
> memcg_list_lru and restricts the reclaim to per-memcg parts of the lists
> in case of memcg pressure.
> 
> Other FS objects are currently ignored and only reclaimed on global
> pressure, because their shrinkers are heavily FS-specific and can't be
> converted to be memcg-aware so easily. However, we can pass on target
> memcg to the FS layer and let it decide if per-memcg objects should be
> reclaimed.

And now you have a big problem, because that means filesystems like
XFS won't reclaim inodes during memcg reclaim.

That is, for XFS, prune_icache_lru() does not free any memory. All
it does is remove all the VFS references to the struct xfs_inode,
which is then reclaimed via the sb->s_op->free_cached_objects()
method.

IOWs, what you've done is broken.

> Note that with this patch applied we lose global LRU order, but it does

We don't have global LRU order today for the filesystem caches.
We have per superblock, per-node LRU reclaim order.

> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -343,18 +343,24 @@ static void dentry_unlink_inode(struct dentry * dentry)
>  #define D_FLAG_VERIFY(dentry,x) WARN_ON_ONCE(((dentry)->d_flags & (DCACHE_LRU_LIST | DCACHE_SHRINK_LIST)) != (x))
>  static void d_lru_add(struct dentry *dentry)
>  {
> +	struct list_lru *lru =
> +		mem_cgroup_kmem_list_lru(&dentry->d_sb->s_dentry_lru, dentry);
> +
>  	D_FLAG_VERIFY(dentry, 0);
>  	dentry->d_flags |= DCACHE_LRU_LIST;
>  	this_cpu_inc(nr_dentry_unused);
> -	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +	WARN_ON_ONCE(!list_lru_add(lru, &dentry->d_lru));
>  }

This is what I mean about pushing memcg cruft into places where it
is not necessary. This can be done entirely behind list_lru_add(),
without the caller having to care.

> @@ -970,9 +976,9 @@ dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
>  }
>  
>  /**
> - * prune_dcache_sb - shrink the dcache
> - * @sb: superblock
> - * @nr_to_scan : number of entries to try to free
> + * prune_dcache_lru - shrink the dcache
> + * @lru: dentry lru list
> + * @nr_to_scan: number of entries to try to free
>   * @nid: which node to scan for freeable entities
>   *
>   * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
> @@ -982,14 +988,13 @@ dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
>   * This function may fail to free any resources if all the dentries are in
>   * use.
>   */
> -long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
> -		     int nid)
> +long prune_dcache_lru(struct list_lru *lru, unsigned long nr_to_scan, int nid)
>  {
>  	LIST_HEAD(dispose);
>  	long freed;
>  
> -	freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
> -				       &dispose, &nr_to_scan);
> +	freed = list_lru_walk_node(lru, nid, dentry_lru_isolate,
> +				   &dispose, &nr_to_scan);
>  	shrink_dentry_list(&dispose);
>  	return freed;
>  }

And here, you pass an LRU when what we really need to pass is the
struct shrink_control that contains nr_to_scan, nid, and the memcg
that pruning is targetting.

Because of the tight integration of the LRUs and shrinkers, it makes
sense to pass the shrink control all the way into the list. i.e:

	freed = list_lru_scan(&sb->s_dentry_lru, sc, dentry_lru_isolate,
			      &dispose);

And again, that hides everything to do with memcg based LRUs and
reclaim from the callers. It's clean, simple and hard to get wrong.

> @@ -1029,7 +1034,7 @@ void shrink_dcache_sb(struct super_block *sb)
>  	do {
>  		LIST_HEAD(dispose);
>  
> -		freed = list_lru_walk(&sb->s_dentry_lru,
> +		freed = memcg_list_lru_walk_all(&sb->s_dentry_lru,
>  			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
>  

list_lru_walk() is, by definition, supposed to walk every single
object on the LRU. With memcg awareness, it should be walking all
the memcg lists, too.

> diff --git a/fs/super.c b/fs/super.c
> index cece164..b198da4 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -57,6 +57,8 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>  				      struct shrink_control *sc)
>  {
>  	struct super_block *sb;
> +	struct list_lru *inode_lru;
> +	struct list_lru *dentry_lru;
>  	long	fs_objects = 0;
>  	long	total_objects;
>  	long	freed = 0;
> @@ -75,11 +77,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>  	if (!grab_super_passive(sb))
>  		return SHRINK_STOP;
>  
> -	if (sb->s_op->nr_cached_objects)
> +	if (sb->s_op->nr_cached_objects && !sc->memcg)
>  		fs_objects = sb->s_op->nr_cached_objects(sb, sc->nid);
>  
> -	inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
> -	dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
> +	inode_lru = mem_cgroup_list_lru(&sb->s_inode_lru, sc->memcg);
> +	dentry_lru = mem_cgroup_list_lru(&sb->s_dentry_lru, sc->memcg);
> +
> +	inodes = list_lru_count_node(inode_lru, sc->nid);
> +	dentries = list_lru_count_node(dentry_lru, sc->nid);
>  	total_objects = dentries + inodes + fs_objects + 1;

Again: list_lru_count_sc(&sb->s_inode_lru, sc).

And push the scan control down into ->nr_cached_objects, too.

>  	/* proportion the scan between the caches */
> @@ -90,8 +95,8 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>  	 * prune the dcache first as the icache is pinned by it, then
>  	 * prune the icache, followed by the filesystem specific caches
>  	 */
> -	freed = prune_dcache_sb(sb, dentries, sc->nid);
> -	freed += prune_icache_sb(sb, inodes, sc->nid);
> +	freed = prune_dcache_lru(dentry_lru, dentries, sc->nid);
> +	freed += prune_icache_lru(inode_lru, inodes, sc->nid);

	sc->nr_to_scan = dentries;
	freed += prune_dcache_sb(sb, sc);
	sc->nr_to_scan = inodes;
	freed += prune_icache_sb(sb, sc);
	if (fs_objects) {
		sc->nr_to_scan = mult_frac(sc->nr_to_scan, fs_objects,
					   total_objects);
		freed += sb->s_op->free_cached_objects(sb, sc);
	}

So much simpler, so much nicer. And nothing memcg related in
sight....

> @@ -225,7 +232,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
>  	s->s_shrink.scan_objects = super_cache_scan;
>  	s->s_shrink.count_objects = super_cache_count;
>  	s->s_shrink.batch = 1024;
> -	s->s_shrink.flags = SHRINKER_NUMA_AWARE;
> +	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;

That's basically the only real logic change that should be
necessary to configure memcg LRUs and shrinkers for any user of the
list_lru infrastructure....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-12-03 11:46 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-02 11:19 [PATCH v12 00/18] kmemcg shrinkers Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 01/18] memcg: make cache index determination more robust Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 02/18] memcg: consolidate callers of memcg_cache_id Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 03/18] memcg: move initialization to memcg creation Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 04/18] memcg: move several kmemcg functions upper Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 05/18] fs: do not use destroy_super() in alloc_super() fail path Vladimir Davydov
2013-12-03  9:00   ` Dave Chinner
2013-12-03  9:23     ` Vladimir Davydov
2013-12-03 13:37       ` Al Viro
2013-12-03 13:48         ` Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 06/18] vmscan: rename shrink_slab() args to make it more generic Vladimir Davydov
2013-12-03  9:33   ` Dave Chinner
2013-12-03  9:44     ` Vladimir Davydov
2013-12-03 10:04       ` Dave Chinner
2013-12-02 11:19 ` [PATCH v12 07/18] vmscan: move call to shrink_slab() to shrink_zones() Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 08/18] vmscan: do_try_to_free_pages(): remove shrink_control argument Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 09/18] vmscan: shrink slab on memcg pressure Vladimir Davydov
2013-12-03 10:48   ` Dave Chinner
2013-12-03 12:15     ` Vladimir Davydov
2013-12-04  4:51       ` Dave Chinner
2013-12-04  6:31         ` Vladimir Davydov
2013-12-05  5:01           ` Dave Chinner
2013-12-05  6:57             ` Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 10/18] memcg,list_lru: add per-memcg LRU list infrastructure Vladimir Davydov
2013-12-03 11:18   ` Dave Chinner
2013-12-03 12:29     ` Vladimir Davydov
2013-12-05 21:19       ` Dave Chinner
2013-12-02 11:19 ` [PATCH v12 11/18] memcg,list_lru: add function walking over all lists of a per-memcg LRU Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 12/18] fs: make icache, dcache shrinkers memcg-aware Vladimir Davydov
2013-12-03 11:45   ` Dave Chinner [this message]
2013-12-03 12:34     ` Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 13/18] memcg: per-memcg kmem shrinking Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 14/18] vmscan: take at least one pass with shrinkers Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 15/18] memcg: allow kmem limit to be resized down Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 16/18] vmpressure: in-kernel notifications Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 17/18] memcg: reap dead memcgs upon global memory pressure Vladimir Davydov
2013-12-02 11:19 ` [PATCH v12 18/18] memcg: flush memcg items upon memcg destruction Vladimir Davydov
2013-12-02 11:22 ` [PATCH v12 00/18] kmemcg shrinkers Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131203114557.GS10988@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=dchinner@redhat.com \
    --cc=devel@openvz.org \
    --cc=glommer@openvz.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=vdavydov@parallels.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).