From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
Balbir Singh
<bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
KAMEZAWA Hiroyuki
<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
Michel Lespinasse
<walken-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: memcg: softlimit on internal nodes
Date: Wed, 24 Apr 2013 17:45:31 -0400
Message-ID: <20130424214531.GA18686@cmpxchg.org>
In-Reply-To: <20130422183020.GF12543-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On Mon, Apr 22, 2013 at 11:30:20AM -0700, Tejun Heo wrote:
> Hey,
>
> On Mon, Apr 22, 2013 at 06:20:12PM +0200, Michal Hocko wrote:
> > Although the default limit is correct it is impractical for use
> > because it doesn't allow for "I behave do not reclaim me if you can"
> > cases. And we can implement such a behavior really easily with backward
> > compatibility and new interfaces (aka reuse the soft limit for that).
>
> Okay, now we're back to square one and I'm reinstating all the mean
> things I said in this thread. :P No wonder everyone is so confused
> about this. Michal, you can't overload two controls which exert
> pressure on the opposite direction onto a single knob and define a
> sane hierarchical behavior for it. You're making it a point control
> rather than range one. Maybe you can define some twisted rules
> serving certain specific use case, but it's gonna be confusing /
> broken for different use cases.

Historically, the soft limit meant prioritizing certain memcgs over
others: memcgs over their soft limit should experience relatively more
reclaim pressure than the ones below it.

Now, if we go and say you are only reclaimed when you exceed your soft
limit, we would retain the prioritization aspect. Groups in excess of
their soft limits would still experience relatively more reclaim
pressure than their well-behaved peers. But it would have the nice
side effect of acting more or less like a guarantee as well.
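As a minimal sketch of that rule (not kernel code), the selection could look like the following. The `Memcg` class and its `usage` and `soft_limit` fields are hypothetical names chosen for illustration, and what happens when no group is over its soft limit is deliberately left open.

```python
from dataclasses import dataclass

G = 1 << 30  # one GiB in bytes


@dataclass
class Memcg:
    name: str
    usage: int       # bytes currently charged (hypothetical field)
    soft_limit: int  # soft limit in bytes (hypothetical field)


def reclaim_candidates(groups):
    """Return only the groups whose usage exceeds their soft limit."""
    return [g for g in groups if g.usage > g.soft_limit]


groups = [
    Memcg("a", usage=6 * G, soft_limit=4 * G),  # over its soft limit
    Memcg("b", usage=3 * G, soft_limit=4 * G),  # well-behaved
]
over = [g.name for g in reclaim_candidates(groups)]
```

Here only "a" is a reclaim candidate; "b" stays untouched for as long as it remains below its soft limit, which is what makes the knob behave roughly like a guarantee.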

I don't think this approach is as unreasonable as you make it out to
be, but it does make things more complicated. It could be argued that
we should add a separate guarantee knob because two simple knobs might
be better than a complicated one.

The question is whether this solves Google's problem, though.

Currently, when a memcg is selected for a certain type of reclaim, it
and all its children are treated as one single leaf entity in the
overall hierarchy: when a parent node hits its hard limit, we assume
equal fault of every member in the hierarchy for that situation and,
consequently, we reclaim all of them equally. We do the same thing
for the soft limit: if the parent, whose memory consumption is defined
as the sum of memory consumed by all members of the hierarchy,
breaches the soft limit, then all members are reclaimed equally because
no single member is more at fault than the others. I would expect that
if we added a guarantee knob, it would also mean that no individual
memcg can be treated as being within its guaranteed memory if the
hierarchy as a whole is in excess of its guarantee.
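The "one single leaf entity" behavior described above can be sketched as follows. This is an illustrative model only: the `Memcg` class, `local_usage`, `members()` and `soft_reclaim_targets()` are hypothetical names, and sizes are plain integers in GiB.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Memcg:
    name: str
    local_usage: int  # GiB charged directly to this group (hypothetical)
    soft_limit: int   # GiB; applies to the usage of the whole subtree
    children: List["Memcg"] = field(default_factory=list)

    def members(self) -> Iterator["Memcg"]:
        """Yield this group and every descendant."""
        yield self
        for child in self.children:
            yield from child.members()

    def hierarchy_usage(self) -> int:
        """Usage of a hierarchy root is the sum over all its members."""
        return sum(m.local_usage for m in self.members())


def soft_reclaim_targets(root: Memcg) -> List[str]:
    """If the hierarchy breaches its soft limit, every member is a target."""
    if root.hierarchy_usage() > root.soft_limit:
        return [m.name for m in root.members()]
    return []


c1 = Memcg("c1", local_usage=7, soft_limit=8)
c2 = Memcg("c2", local_usage=2, soft_limit=8)
parent = Memcg("parent", local_usage=0, soft_limit=8, children=[c1, c2])
targets = soft_reclaim_targets(parent)
```

In this example both children are individually below their own soft limits, yet both end up as reclaim targets because the combined usage of the hierarchy (9G) exceeds the parent's soft limit (8G).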

The root of the hierarchy represents the whole hierarchy. Its memory
usage is the combined memory usage of all members. The limit set to
the hierarchy root applies to the combined memory usage of the
hierarchy. Breaching that limit has consequences for the hierarchy as
a whole. Be it soft limit or guarantee.

This is how hierarchies have always worked and it allows limits to be
layered and apply depending on the source of pressure:

        root           (physical memory = 32G)
       /    \
      A      B         (hard limit = 25G, guarantee = 16G)
     / \    / \
    A1 A2  /   B2      (guarantee = 10G)
          /
         B1            (guarantee = 15G)

Remember that hard limits are usually overcommitted, so you allow B to
use more than its fair share of memory when A does not need it, but you
want to keep it capped to keep latency reasonable when A ramps up.

As long as B is hitting its own hard limit, you value B1's and B2's
guarantees in the context of pressure local to the hierarchy; in the
context of B having 25G worth of memory; in the context of B1
competing with B2 over the memory allowed by B.

However, as soon as global reclaim kicks in, the context changes and
the priorities shift. Now, B does not have 25G anymore but only 16G
*in its competition with A*. We absolutely do not want to respect the
guarantees made to B1 and B2. Not only can they not be met anyway,
but they are utterly meaningless at this point. They were set with
25G in mind.
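The arithmetic behind that argument can be checked with a small sketch using the numbers from the tree above. The constant and function names are hypothetical, and the model only asks whether the children's guarantees fit into what B can count on in a given reclaim context.

```python
# Sizes in GiB, taken from the example tree.
B_HARD_LIMIT = 25   # what B may use when the pressure is local to B
B_GUARANTEE = 16    # what B can count on under global pressure
CHILD_GUARANTEES = {"B1": 15, "B2": 10}


def guarantees_fit(budget, guarantees):
    """True if all child guarantees can be honored within the budget."""
    return sum(guarantees.values()) <= budget


# Pressure local to B: B1 and B2 compete over the 25G allowed by B.
fits_local = guarantees_fit(B_HARD_LIMIT, CHILD_GUARANTEES)

# Global pressure: B competes with A and is only assured 16G.
fits_global = guarantees_fit(B_GUARANTEE, CHILD_GUARANTEES)
```

The child guarantees add up to exactly 25G, so they fit when B hits its own hard limit but cannot both be met once B is squeezed toward its own 16G guarantee.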

[ It may be conceivable that you want different guarantees for B1 and
  B2 depending on where the pressure comes from. One setting for when
  the 25G limit applies, one setting when the 32G physical memory
  limit applies. Basically, every group would need a vector of
  guarantee settings with one setting per ancestor.

  That being said, I absolutely disagree with the idea of trying to
  adhere to individual memcg guarantees in the first reclaim cycle,
  regardless of context, and then just ignore them on the second pass.
  It's a horrible way to guess which context the admin had in mind. ]

Now, there is of course the other scenario in which the current
hierarchical limit application can get in your way: when you give
intermediate nodes their own memory. Because then you may see the
need to apply certain limits to that hierarchy root's local memory
only instead of all memory in the hierarchy. But once we open that
door, you might expect this to be an option for every limit, where
even the hard limit of a hierarchy root only applies to that group's
local memory instead of the whole hierarchy. I certainly do not want
to apply hierarchy semantics for some limits and not for others. But
Google has basically asked for hierarchical hard limits and local soft
limits / guarantees.

In summary, we are now looking at both local and hierarchical limits
times number of ancestors PER MEMCG to support all those use cases
properly.
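To get a feel for how quickly that multiplies, here is a rough counting sketch over the example tree. It assumes one setting per (scope, ancestor) pair for a single limit type; the `PARENT` map and the helper names are hypothetical and exist only for this illustration.

```python
# Parent of each group in the example tree (root has no parent).
PARENT = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "A2": "A",
    "B1": "B", "B2": "B",
}

SCOPES = ("local", "hierarchical")


def ancestors(name):
    """Return the chain of ancestors, nearest first."""
    chain = []
    node = PARENT[name]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def settings_needed(name):
    """One setting per scope and per ancestor that can be a pressure source."""
    return [(scope, anc) for scope in SCOPES for anc in ancestors(name)]


b1_settings = settings_needed("B1")
```

Under these assumptions B1 alone would need four values for one kind of limit (local and hierarchical, each for pressure from B and from root), and the count grows with every additional level of nesting.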

So I'm asking what I already asked a year ago: are you guys sure you
cannot change your cgroup tree layout and that we have to solve it by
adding new limit semantics?!