Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Ying Han <yinghan@google.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim
Date: Fri, 13 Jan 2012 16:50:01 +0100	[thread overview]
Message-ID: <20120113155001.GB1653@cmpxchg.org> (raw)
In-Reply-To: <20120113120406.GC17060@tiehlicka.suse.cz>

On Fri, Jan 13, 2012 at 01:04:06PM +0100, Michal Hocko wrote:
> On Tue 10-01-12 16:02:52, Johannes Weiner wrote:
> > Right now, memcg soft limits are implemented by having a sorted tree
> > of memcgs that are in excess of their limits.  Under global memory
> > pressure, kswapd first reclaims from the biggest excessor and then
> > proceeds to do regular global reclaim.  The result of this is that
> > pages are reclaimed from all memcgs, but more scanning happens against
> > those above their soft limit.
> > 
> > With global reclaim doing memcg-aware hierarchical reclaim by default,
> > this is a lot easier to implement: everytime a memcg is reclaimed
> > from, scan more aggressively (per tradition with a priority of 0) if
> > it's above its soft limit.  With the same end result of scanning
> > everybody, but soft limit excessors a bit more.
> > 
> > Advantages:
> > 
> >   o smoother reclaim: soft limit reclaim is a separate stage before
> >     global reclaim, whose result is not communicated down the line and
> >     so overreclaim of the groups in excess is very likely.  After this
> >     patch, soft limit reclaim is fully integrated into regular reclaim
> >     and each memcg is considered exactly once per cycle.
> > 
> >   o true hierarchy support: soft limits are only considered when
> >     kswapd does global reclaim, but after this patch, targetted
> >     reclaim of a memcg will mind the soft limit settings of its child
> >     groups.
> 
> Yes it makes sense. At first I was thinking that soft limit should be
> considered only under global mem. pressure (at least documentation says
> so) but now it makes sense.
> We can push on over-soft limit groups more because they told us they
> could sacrifice something...  Anyway documentation needs an update as
> well.

You are right, I'll look into it.

> But we have to be little bit careful here. I am still quite confuses how
> we should handle hierarchies vs. subtrees. See bellow.

> > @@ -1318,6 +1123,36 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> >  	return margin >> PAGE_SHIFT;
> >  }
> >  
> > +/**
> > + * mem_cgroup_over_softlimit
> > + * @root: hierarchy root
> > + * @memcg: child of @root to test
> > + *
> > + * Returns %true if @memcg exceeds its own soft limit or contributes
> > + * to the soft limit excess of one of its parents up to and including
> > + * @root.
> > + */
> > +bool mem_cgroup_over_softlimit(struct mem_cgroup *root,
> > +			       struct mem_cgroup *memcg)
> > +{
> > +	if (mem_cgroup_disabled())
> > +		return false;
> > +
> > +	if (!root)
> > +		root = root_mem_cgroup;
> > +
> > +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> > +		/* root_mem_cgroup does not have a soft limit */
> > +		if (memcg == root_mem_cgroup)
> > +			break;
> > +		if (res_counter_soft_limit_excess(&memcg->res))
> > +			return true;
> > +		if (memcg == root)
> > +			break;
> > +	}
> > +	return false;
> > +}
> 
> Well, this might be little bit tricky. We do not check whether memcg and
> root are in a hierarchy (in terms of use_hierarchy) relation. 
> 
> If we are under global reclaim then we iterate over all memcgs and so
> there is no guarantee that there is a hierarchical relation between the
> given memcg and its parent. While, on the other hand, if we are doing
> memcg reclaim then we have this guarantee.
> 
> Why should we punish a group (subtree) which is perfectly under its soft
> limit just because some other subtree contributes to the common parent's
> usage and makes it over its limit?
> Should we check memcg->use_hierarchy here?

We do, actually.  parent_mem_cgroup() checks the res_counter parent,
which is only set when ->use_hierarchy is also set.  The loop should
never walk upwards outside of a hierarchy.

And yes, if you have this:

        A
       / \
      B   C

and configured a soft limit for A, you asked for both B and C to be
responsible when this limit is exceeded, that's not new behaviour.

> Does it even makes sense to setup soft limit on a parent group without
> hierarchies?
> Well I have to admit that hierarchies makes me headache.

There is no parent without a hierarchy.  It is insofar pretty
confusing that you can actually create a directory hierarchy that does
not reflect a memcg hierarchy:

	# pwd
	/sys/fs/cgroup/memory/foo/bar
	# cat memory.usage_in_bytes 
	450560
	# cat ../memory.usage_in_bytes 
	0

there is no accounting/limiting/whatever parent-child relationship
between foo and bar.

> > @@ -2121,8 +2121,16 @@ static void shrink_zone(int priority, struct zone *zone,
> >  			.mem_cgroup = memcg,
> >  			.zone = zone,
> >  		};
> > +		int epriority = priority;
> > +		/*
> > +		 * Put more pressure on hierarchies that exceed their
> > +		 * soft limit, to push them back harder than their
> > +		 * well-behaving siblings.
> > +		 */
> > +		if (mem_cgroup_over_softlimit(root, memcg))
> > +			epriority = 0;
> 
> This sounds too aggressive to me. Shouldn't we just double the pressure
> or something like that?

That's the historical value.  When I tried priority - 1, it was not
aggressive enough.

> Previously we always had nr_to_reclaim == SWAP_CLUSTER_MAX when we did
> memcg reclaim but this is not the case now. For the kswapd we have
> nr_to_reclaim == ULONG_MAX so we will not break out of the reclaim early
> and we have to scan a lot.
> Direct reclaim (shrink or hard limit) shouldn't be affected here.

It took me a while: we had SWAP_CLUSTER_MAX in _soft limit reclaim_,
which means that even with priority 0 we would bail after reclaiming
SWAP_CLUSTER_MAX from each lru of a zone.  But it's now happening with
kswapd's own scan_control, so the overreclaim protection is gone.

That is indeed a change in behaviour I haven't noticed, good catch!

I will look into it.

WARNING: multiple messages have this Message-ID (diff)

From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Ying Han <yinghan@google.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim
Date: Fri, 13 Jan 2012 16:50:01 +0100	[thread overview]
Message-ID: <20120113155001.GB1653@cmpxchg.org> (raw)
In-Reply-To: <20120113120406.GC17060@tiehlicka.suse.cz>

On Fri, Jan 13, 2012 at 01:04:06PM +0100, Michal Hocko wrote:
> On Tue 10-01-12 16:02:52, Johannes Weiner wrote:
> > Right now, memcg soft limits are implemented by having a sorted tree
> > of memcgs that are in excess of their limits.  Under global memory
> > pressure, kswapd first reclaims from the biggest excessor and then
> > proceeds to do regular global reclaim.  The result of this is that
> > pages are reclaimed from all memcgs, but more scanning happens against
> > those above their soft limit.
> > 
> > With global reclaim doing memcg-aware hierarchical reclaim by default,
> > this is a lot easier to implement: everytime a memcg is reclaimed
> > from, scan more aggressively (per tradition with a priority of 0) if
> > it's above its soft limit.  With the same end result of scanning
> > everybody, but soft limit excessors a bit more.
> > 
> > Advantages:
> > 
> >   o smoother reclaim: soft limit reclaim is a separate stage before
> >     global reclaim, whose result is not communicated down the line and
> >     so overreclaim of the groups in excess is very likely.  After this
> >     patch, soft limit reclaim is fully integrated into regular reclaim
> >     and each memcg is considered exactly once per cycle.
> > 
> >   o true hierarchy support: soft limits are only considered when
> >     kswapd does global reclaim, but after this patch, targetted
> >     reclaim of a memcg will mind the soft limit settings of its child
> >     groups.
> 
> Yes it makes sense. At first I was thinking that soft limit should be
> considered only under global mem. pressure (at least documentation says
> so) but now it makes sense.
> We can push on over-soft limit groups more because they told us they
> could sacrifice something...  Anyway documentation needs an update as
> well.

You are right, I'll look into it.

> But we have to be little bit careful here. I am still quite confuses how
> we should handle hierarchies vs. subtrees. See bellow.

> > @@ -1318,6 +1123,36 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> >  	return margin >> PAGE_SHIFT;
> >  }
> >  
> > +/**
> > + * mem_cgroup_over_softlimit
> > + * @root: hierarchy root
> > + * @memcg: child of @root to test
> > + *
> > + * Returns %true if @memcg exceeds its own soft limit or contributes
> > + * to the soft limit excess of one of its parents up to and including
> > + * @root.
> > + */
> > +bool mem_cgroup_over_softlimit(struct mem_cgroup *root,
> > +			       struct mem_cgroup *memcg)
> > +{
> > +	if (mem_cgroup_disabled())
> > +		return false;
> > +
> > +	if (!root)
> > +		root = root_mem_cgroup;
> > +
> > +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> > +		/* root_mem_cgroup does not have a soft limit */
> > +		if (memcg == root_mem_cgroup)
> > +			break;
> > +		if (res_counter_soft_limit_excess(&memcg->res))
> > +			return true;
> > +		if (memcg == root)
> > +			break;
> > +	}
> > +	return false;
> > +}
> 
> Well, this might be little bit tricky. We do not check whether memcg and
> root are in a hierarchy (in terms of use_hierarchy) relation. 
> 
> If we are under global reclaim then we iterate over all memcgs and so
> there is no guarantee that there is a hierarchical relation between the
> given memcg and its parent. While, on the other hand, if we are doing
> memcg reclaim then we have this guarantee.
> 
> Why should we punish a group (subtree) which is perfectly under its soft
> limit just because some other subtree contributes to the common parent's
> usage and makes it over its limit?
> Should we check memcg->use_hierarchy here?

We do, actually.  parent_mem_cgroup() checks the res_counter parent,
which is only set when ->use_hierarchy is also set.  The loop should
never walk upwards outside of a hierarchy.

And yes, if you have this:

        A
       / \
      B   C

and configured a soft limit for A, you asked for both B and C to be
responsible when this limit is exceeded, that's not new behaviour.

> Does it even makes sense to setup soft limit on a parent group without
> hierarchies?
> Well I have to admit that hierarchies makes me headache.

There is no parent without a hierarchy.  It is insofar pretty
confusing that you can actually create a directory hierarchy that does
not reflect a memcg hierarchy:

	# pwd
	/sys/fs/cgroup/memory/foo/bar
	# cat memory.usage_in_bytes 
	450560
	# cat ../memory.usage_in_bytes 
	0

there is no accounting/limiting/whatever parent-child relationship
between foo and bar.

> > @@ -2121,8 +2121,16 @@ static void shrink_zone(int priority, struct zone *zone,
> >  			.mem_cgroup = memcg,
> >  			.zone = zone,
> >  		};
> > +		int epriority = priority;
> > +		/*
> > +		 * Put more pressure on hierarchies that exceed their
> > +		 * soft limit, to push them back harder than their
> > +		 * well-behaving siblings.
> > +		 */
> > +		if (mem_cgroup_over_softlimit(root, memcg))
> > +			epriority = 0;
> 
> This sounds too aggressive to me. Shouldn't we just double the pressure
> or something like that?

That's the historical value.  When I tried priority - 1, it was not
aggressive enough.

> Previously we always had nr_to_reclaim == SWAP_CLUSTER_MAX when we did
> memcg reclaim but this is not the case now. For the kswapd we have
> nr_to_reclaim == ULONG_MAX so we will not break out of the reclaim early
> and we have to scan a lot.
> Direct reclaim (shrink or hard limit) shouldn't be affected here.

It took me a while: we had SWAP_CLUSTER_MAX in _soft limit reclaim_,
which means that even with priority 0 we would bail after reclaiming
SWAP_CLUSTER_MAX from each lru of a zone.  But it's now happening with
kswapd's own scan_control, so the overreclaim protection is gone.

That is indeed a change in behaviour I haven't noticed, good catch!

I will look into it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2012-01-13 15:50 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-10 15:02 [patch 0/2] mm: memcg reclaim integration followups Johannes Weiner
2012-01-10 15:02 ` Johannes Weiner
2012-01-10 15:02 ` [patch 1/2] mm: memcg: per-memcg reclaim statistics Johannes Weiner
2012-01-10 15:02   ` Johannes Weiner
     [not found]   ` <1326207772-16762-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-10 23:54     ` Ying Han
2012-01-10 23:54       ` Ying Han
2012-01-10 23:54       ` Ying Han
     [not found]       ` <CALWz4izbTw4+7zbfiED9Lx=6RwiqxE11g5-fNRHTh=mcP=vQ2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-11  0:30         ` Johannes Weiner
2012-01-11  0:30           ` Johannes Weiner
2012-01-11  0:30           ` Johannes Weiner
2012-01-11 22:33           ` Ying Han
2012-01-11 22:33             ` Ying Han
     [not found]             ` <CALWz4iy4hw9jQ++w4oiZG_hih-x9iieuEmnRBfxYKriAKSoOgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-12  9:17               ` Johannes Weiner
2012-01-12  9:17                 ` Johannes Weiner
2012-01-12  9:17                 ` Johannes Weiner
2012-01-10 15:02 ` [patch 2/2] mm: memcg: hierarchical soft limit reclaim Johannes Weiner
2012-01-10 15:02   ` Johannes Weiner
2012-01-11 21:42   ` Ying Han
2012-01-11 21:42     ` Ying Han
     [not found]     ` <CALWz4izwNBN_qcSsqg-qYw-Esc9vBL3=4cv3Wsg1jf6001_fWQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-12  8:59       ` Johannes Weiner
2012-01-12  8:59         ` Johannes Weiner
2012-01-12  8:59         ` Johannes Weiner
     [not found]         ` <20120112085904.GG24386-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-13 21:31           ` Ying Han
2012-01-13 21:31             ` Ying Han
2012-01-13 21:31             ` Ying Han
     [not found]             ` <CALWz4iz3sQX+pCr19rE3_SwV+pRFhDJ7Lq-uJuYBq6u3mRU3AQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-13 22:44               ` Johannes Weiner
2012-01-13 22:44                 ` Johannes Weiner
2012-01-13 22:44                 ` Johannes Weiner
     [not found]                 ` <20120113224424.GC1653-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-17 14:22                   ` Sha
2012-01-17 14:22                     ` Sha
2012-01-17 14:22                     ` Sha
     [not found]                     ` <4F158418.2090509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-01-17 14:53                       ` Johannes Weiner
2012-01-17 14:53                         ` Johannes Weiner
2012-01-17 14:53                         ` Johannes Weiner
     [not found]                         ` <20120117145348.GA3144-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-17 20:25                           ` Ying Han
2012-01-17 20:25                             ` Ying Han
2012-01-17 20:25                             ` Ying Han
     [not found]                             ` <CALWz4iwYpkP6Dfz+3NFXQK9ToaKdm8WCsbBmHRLVwRjVp0wjOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-17 21:56                               ` Johannes Weiner
2012-01-17 21:56                                 ` Johannes Weiner
2012-01-17 21:56                                 ` Johannes Weiner
2012-01-17 23:39                                 ` Ying Han
2012-01-17 23:39                                   ` Ying Han
2012-01-18  7:17                         ` Sha
     [not found]                           ` <CAFj3OHWY2Biw54gaGeH5fkxzgOhxn7NAibeYT_Jmga-_ypNSRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-01-18  9:25                             ` Johannes Weiner
2012-01-18  9:25                               ` Johannes Weiner
2012-01-18  9:25                               ` Johannes Weiner
2012-01-18 11:25                               ` Sha
2012-01-18 11:25                                 ` Sha
2012-01-18 15:27                                 ` Michal Hocko
2012-01-18 15:27                                   ` Michal Hocko
2012-01-19  6:38                                   ` Sha
2012-01-19  6:38                                     ` Sha
     [not found]   ` <1326207772-16762-3-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-12  1:54     ` KAMEZAWA Hiroyuki
2012-01-12  1:54       ` KAMEZAWA Hiroyuki
2012-01-12  1:54       ` KAMEZAWA Hiroyuki
2012-01-13 12:16       ` Johannes Weiner
2012-01-13 12:16         ` Johannes Weiner
     [not found]         ` <20120113121645.GA1653-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-18  5:26           ` KAMEZAWA Hiroyuki
2012-01-18  5:26             ` KAMEZAWA Hiroyuki
2012-01-18  5:26             ` KAMEZAWA Hiroyuki
2012-01-13 12:04     ` Michal Hocko
2012-01-13 12:04       ` Michal Hocko
2012-01-13 12:04       ` Michal Hocko
2012-01-13 15:50       ` Johannes Weiner [this message]
2012-01-13 15:50         ` Johannes Weiner
     [not found]         ` <20120113155001.GB1653-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-13 16:34           ` Michal Hocko
2012-01-13 16:34             ` Michal Hocko
2012-01-13 16:34             ` Michal Hocko
     [not found]             ` <20120113163423.GG17060-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org>
2012-01-13 21:45               ` Ying Han
2012-01-13 21:45                 ` Ying Han
2012-01-13 21:45                 ` Ying Han
2012-01-18  9:45                 ` Johannes Weiner
2012-01-18  9:45                   ` Johannes Weiner
2012-01-18  9:45                   ` Johannes Weiner
2012-01-18 20:38                   ` Ying Han
2012-01-18 20:38                     ` Ying Han

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120113155001.GB1653@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.