Re: [PATCH v5] Soft limit rework

From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ying Han <yinghan@google.com>,
	Hugh Dickins <hughd@google.com>,
	Michel Lespinasse <walken@google.com>,
	Greg Thelen <gthelen@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Tejun Heo <tj@kernel.org>, Balbir Singh <bsingharora@gmail.com>,
	Glauber Costa <glommer@gmail.com>
Subject: Re: [PATCH v5] Soft limit rework
Date: Fri, 13 Sep 2013 16:49:53 +0200
Message-ID: <20130913144953.GA23857@dhcp22.suse.cz>
In-Reply-To: <20130906192311.GE856@cmpxchg.org>

On Fri 06-09-13 15:23:11, Johannes Weiner wrote:
> On Wed, Sep 04, 2013 at 06:38:23PM +0200, Michal Hocko wrote:
[...]
> > To handle overcommit situations more gracefully. As the documentation
> > states:
> > "
> > 7. Soft limits
> > 
> > Soft limits allow for greater sharing of memory. The idea behind soft limits
> > is to allow control groups to use as much of the memory as needed, provided
> > 
> > a. There is no memory contention
> > b. They do not exceed their hard limit
> > 
> > When the system detects memory contention or low memory, control groups
> > are pushed back to their soft limits. If the soft limit of each control
> > group is very high, they are pushed back as much as possible to make
> > sure that one control group does not starve the others of memory.
> > 
> > Please note that soft limits is a best-effort feature; it comes with
> > no guarantees, but it does its best to make sure that when memory is
> > heavily contended for, memory is allocated based on the soft limit
> > hints/setup. Currently soft limit based reclaim is set up such that
> > it gets invoked from balance_pgdat (kswapd).
> > "
> > 
> > Except for the last sentence, the same holds for the integrated
> > implementation as well. With the patchset we are also doing soft
> > reclaim for targeted reclaim, which was simply not possible previously
> > because of limitations of the data structures. Doing soft reclaim from
> > targeted reclaim makes a lot of sense to me: whether the memory
> > pressure is global or hierarchical makes no difference to the fact that
> > some groups are set up to sacrifice their memory to help relieve the
> > pressure.
> 
> The issue I have with this is that the semantics of the soft limit are
> so backwards that we should strive to get this stuff right
> conceptually before integrating this better into the VM.
> 
> We have a big user that asks for guarantees, which are comparable but
> the inverse of this.  Instead of specifying what is optional
> in one group, you specify what is essential in the other group.  And
> the default is to guarantee nothing instead of everything like soft
> limits are currently defined.
> 
> We even tried to invert the default soft limit setting in the past,
> which went nowhere because we can't do these subtle semantic changes
> on an existing interface.
> 
> I would really like to deprecate soft limits and introduce something
> new that has the proper semantics we want from the get-go.  Its
> implementation could very much look like your code, so we can easily
> reuse that.  But the interface and its semantics should come first.

I am open to discussing such a change; I just do not see any reason to
have a crippled soft reclaim implementation in the meantime,
especially when it doesn't look like such a new interface will be easy
to agree on.
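
To make the two semantics discussed above concrete, here is a toy model
of which groups become reclaim candidates under each scheme. It is only
a sketch: the function names, the dictionary layout, and the "guarantee"
key are hypothetical illustrations, not kernel code or an existing
interface. The one thing taken from the thread is the defaults: an
unset soft limit protects everything, while an unset guarantee protects
nothing.

```python
import math

# Toy model only: every name here is hypothetical, not a kernel interface.
UNLIMITED = math.inf


def soft_limit_targets(groups):
    """Groups whose usage exceeds their soft limit.

    An unset soft limit is treated as unlimited, so an unconfigured
    group is never a preferred reclaim target.
    """
    return [name for name, g in groups.items()
            if g["usage"] > g.get("soft_limit", UNLIMITED)]


def guarantee_targets(groups):
    """Groups whose usage exceeds their guarantee.

    An unset guarantee is treated as zero, so an unconfigured group is
    always reclaimable.
    """
    return [name for name, g in groups.items()
            if g["usage"] > g.get("guarantee", 0)]


# Usage in arbitrary units (say, MB).
groups = {
    "batch": {"usage": 800, "soft_limit": 200},
    "db": {"usage": 600, "guarantee": 500},
    "misc": {"usage": 100},  # nothing configured
}

print(soft_limit_targets(groups))
print(guarantee_targets(groups))
```

The unconfigured "misc" group is exempt under the soft limit semantics
but reclaimable under the guarantee semantics, which is the inversion
of defaults being argued about.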

[...]
> > > You have not shown that prio-0 scans are a problem. 
> > 
> > OK, I thought this was self evident but let me be more specific.
> > 
> > Scanning the world is almost always a problem. We are no longer doing
> > proportional anon/file reclaim (swappiness is ignored). This is wrong
> > from at least two points of view. Firstly, it makes the reclaim decisions
> > differ a lot between groups that are under the soft limit and those
> > that are over. Secondly, and more importantly, this might lead to
> > premature swapping, especially when there is a lot of IO going on.
> > 
> > Global reclaim suffers from the very same problem, and that is why
> > we try to avoid prio-0 reclaim as much as possible and use it
> > only as a last resort.
> 
> I know that and I can see that this should probably be fixed, but
> there is no quantification for this.  We have no per-memcg reclaim
> statistics

Not having statistics is a separate issue. It makes the situation worse
but that is not a new thing. The old implementation is even worse
because the soft reclaim activity is basically hidden from global
reclaim counters. So a lot of pages might get scanned and we will have
no way to find out. That part is inherently fixed by the series because
of the integration.

> and your test cases were not useful in determining what's going on
> reclaim-wise.

I will instrument the kernel for the next round of tests, which will
hopefully be more descriptive.
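
For readers following the prio-0 argument above, a rough model of how
the scan window grows as the reclaim priority drops may help. This is a
simplified sketch, not the kernel's get_scan_count(): it assumes the
scan window is the LRU size shifted right by the priority, and it
replaces the kernel's anon/file balancing with a plain swappiness
weighting. The function name and the weighting formula are assumptions
made for illustration.

```python
DEF_PRIORITY = 12  # priority at which the kernel starts a reclaim pass


def scan_targets(anon, file, priority, swappiness=60):
    """Return (anon_to_scan, file_to_scan) in pages for one LRU pair.

    Toy model: the window is lru_size >> priority. Above priority 0 the
    window is split by swappiness (a simplification); at priority 0 both
    lists are scanned in full and swappiness no longer matters.
    """
    anon_scan = anon >> priority
    file_scan = file >> priority
    if priority == 0:
        # "Scan the world": no proportional anon/file balancing.
        return anon_scan, file_scan
    anon_weight = swappiness
    file_weight = 200 - swappiness
    total = anon_weight + file_weight
    return (anon_scan * anon_weight // total,
            file_scan * file_weight // total)


# One million pages on each list.
pages = 1 << 20
print(scan_targets(pages, pages, DEF_PRIORITY))
print(scan_targets(pages, pages, 0))
```

In this model the default-priority pass looks at a few hundred pages
and favours file pages, whereas the prio-0 pass covers both lists
entirely, which is where the premature swapping concern comes from.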

[...]
> > That simple call from kswapd is not that simple at all in fact. It hides
> > a lot of memcg specific code which is far from being trivial. Even worse
> > that memcg specific code gets back to the reclaim code with different
> > reclaim parameters than those used from the context it has been called
> > from.
> 
> It does not matter to understanding generic reclaim code, though, and
> acts more like the shrinkers.  We send it off to get memory and it
> comes back with results.

The shrinker interface is just too bad. It might work for dentries and
inodes but it failed in many other subsystems where it ended up in
do-something mode. Soft reclaim is yet another example where we are
doing an artificial scan-the-world reclaim to hammer somebody. Fairness
is basically impossible to guarantee and there are corner cases which
are just waiting to explode.

[...]
> Soft limit is about balancing reclaim pressure and I already pointed
> out that your control group has so much limit slack that you can't
> tell if the main group is performing better because of reclaim
> aggressiveness (good) or because the memory is just taken from your
> control group (bad).
> 
> Please either say why I'm wrong or stop asserting points that have
> been refuted.

I will work on improving my testing setup. I will come back with results
early next week hopefully.

[...] 
-- 
Michal Hocko
SUSE Labs
