From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Greg Thelen <gthelen@google.com>,
Michel Lespinasse <walken@google.com>, Tejun Heo <tj@kernel.org>,
Hugh Dickins <hughd@google.com>,
Roman Gushchin <klamm@yandex-team.ru>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH v2 0/4] memcg: Low-limit reclaim
Date: Thu, 5 Jun 2014 18:09:04 +0200 [thread overview]
Message-ID: <20140605160904.GC15939@dhcp22.suse.cz> (raw)
In-Reply-To: <20140605154328.GX2878@cmpxchg.org>
On Thu 05-06-14 11:43:28, Johannes Weiner wrote:
> On Thu, Jun 05, 2014 at 04:32:35PM +0200, Michal Hocko wrote:
> > On Wed 04-06-14 11:44:08, Johannes Weiner wrote:
> > > On Wed, Jun 04, 2014 at 04:46:58PM +0200, Michal Hocko wrote:
> > > > On Tue 03-06-14 10:22:49, Johannes Weiner wrote:
> > > > > On Tue, Jun 03, 2014 at 01:07:43PM +0200, Michal Hocko wrote:
> > > > [...]
> > > > > > If we consider that memcg and its limits are not zone aware while the
> > > > > > page allocator and reclaim are zone oriented then I can see a problem
> > > > > > of unexpected reclaim failure although there is no over commit on the
> > > > > > low_limit globally. And we do not have in-kernel effective measures to
> > > > > > mitigate this inherent problem. At least not now and I am afraid it is
> > > > > > a long route to have something that would work reasonably well in such
> > > > > > cases.
> > > > >
> > > > > Which "inherent problem"?
> > > >
> > > > zone unawareness of the limit vs. allocation/reclaim which are zone
> > > > oriented.
> > >
> > > This is a quote from another subthread where you haven't responded:
> > >
> > > ---
> > >
> > > > > > > But who actually cares if an individual zone can be reclaimed?
> > > > > > >
> > > > > > > Userspace allocations can fall back to any other zone. Unless there
> > > > > > > are hard bindings, but hopefully nobody binds a memcg to a node that
> > > > > > > is smaller than that memcg's guarantee.
> > > > > >
> > > > > > The protected group might spill over to another group and eat it when
> > > > > > another group would be simply pushed out from the node it is bound to.
> > > > >
> > > > > I don't really understand the point you're trying to make.
> > > >
> > > > I was just trying to show a case where individual zone matters. To make
> > > > it more specific consider 2 groups A (with low-limit 60% RAM) and B
> > > > (say with low-limit 10% RAM) and bound to a node X (25% of RAM). Now
> > > > having 70% of RAM reserved for guarantee makes some sense, right? B is
> > > > not over-committing the node it is bound to. Yet the A's allocations
> > > > might make pressure on X regardless that the whole system is still doing
> > > > good. This can lead to a situation where X gets depleted and nothing
> > > > would be reclaimable leading to an OOM condition.
> > >
> > > Once you assume control of memory *placement* in the system like this,
> > > you can not also pretend to be clueless and have unreclaimable memory
> > > of this magnitude spread around into nodes used by other bound tasks.
> >
> > You are still assuming that the administrator controls the placement.
> > The load running in your memcg might be a black box for admin. E.g. a
> > container which pays $$ to get a priority and not get reclaimed if that
> > is possible. Admin can make sure that the cumulative low_limits for
> > containers are sane but he doesn't have any control over what the loads
> > inside are doing and potential OOM when one tries to DOS the other is
> > definitely not welcome.
>
> This is completely backwards, though: if you pay for guaranteed
I didn't say anything about guarantees, though. You do not even need
anything as strong as a guarantee. You are paying for prioritization.
> memory, you don't want to get reclaimed just because some other task
> that might not even have guarantees starts allocating with a
> restricted node mask. This breaks isolation.
If the other task doesn't have any limit set, then its pages would get
reclaimed. This wouldn't be an everybody-within-low-limit situation.
> For one, this can be used maliciously by intentionally binding a
> low-priority task to a node with guaranteed memory and starting to
> allocate. Even with a small hard limit, you can just plow through
> files to push guaranteed cache of the other group out of memory.
>
> But even if it's not malicious, in such a scenario I'd still prefer
> OOMing the task with the more restrictive node mask over reclaiming
> guaranteed memory.
Why?
> Then, on top of that, we can use direct migration to mitigate OOMs in
> these scenarios (should we sufficiently care about them), but I'd much
> prefer OOMs over breaking isolation and the possible priority
> inversion that is inherent in the fallback on NUMA setups.
Could you be more specific about what you mean by priority inversion?
> > > If we were to actively support such configurations, we should be doing
> > > direct NUMA balancing and migrate these pages out of node X when B
> > > needs to allocate.
> >
> > Migration is certainly a way how to reduce the risk. It is a question
> > whether this is something to be done by the kernel implicitly or by
> > administrator.
>
> As long as the kernel is responsible for *any* placement - i.e. unless
> you bind everything - it might as well be the kernel that fixes it up.
>
> > > That would fix the problem for all unevictable
> > > memory, not just memcg guarantees, and would prefer node-offloading
> > > over swapping in cases where swap is available.
> >
> > That would certainly lower the risk. But there still might be unmovable
> > memory sitting on the node so this will never be 100%.
>
> Yes, and as per above, I think in most cases it's actually preferable
> to kill the bound task (or direct-migrate) over violating guarantees
> of another task.
>
> > > > > > So to me it sounds more responsible to promise only as much as we can
> > > > > > handle. I think that fallback mode is not crippling the semantic of
> > > > > > the knob as it triggers only for limit overcommit or strange corner
> > > > > > cases. We have agreed that we do not care about the first one and
> > > > > > handling the later one by potentially fatal action doesn't sounds very
> > > > > > user friendly to me.
> > > > >
> > > > > It *absolutely* cripples the semantics. Think about the security use
> > > > > cases of mlock for example, where certain memory may never hit the
> > > > > platter. This wouldn't be possible with your watered down guarantees.
> > > >
> > > > Is this really a use case? It sounds like a weak one to me. Because
> > > > any sudden memory consumption above the limit can reclaim your
> > > > to-protect-page it will hit the platter and you cannot do anything about
> > > > this. So yeah, this is not mlock.
> > >
> > > You are right, that is a weak usecase.
> > >
> > > It doesn't change the fact that it does severely weaken the semantics
> > > and turns it into another best-effort mechanism that the user can't
> > > count on. This sucks. It sucked with soft limits and it will suck
> > > again. The irony is that Greg even pointed out you should be doing
> > > soft limits if you want this sort of behavior.
> >
> > The question is whether we really _need_ hard guarantees. I came with
> > the low_limit as a replacement for soft_limit, which really sucks. But it
> > sucks not because you cannot count on it; it sucks because of its
> > inverted semantics - and because of the implementation, of course. I have
> > tried to fix it and that route was a no-go.
>
> We need hard guarantees for actual isolation. Otherwise you can't
> charge for guarantees.
You can still charge for prioritization, which on its own is a valid use
case. You seem to be fixated on hard guarantees and overlook that there
is a class of use cases which do not need such behavior.
Please note that setting up a hard guarantee is a really non-trivial
task, especially if any downtime of the service you want to protect is a
big deal. I wouldn't be surprised if the risk was big enough that using
the limit would be a no-go, even though it could bring a performance
improvement.
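To make the node-binding scenario quoted earlier in the thread concrete, a setup along these lines could reproduce it. This is only a sketch: it assumes a 100G machine whose node 1 holds 25G, cgroup v1 memcg and cpuset hierarchies mounted under /sys/fs/cgroup, and the memory.low_limit_in_bytes knob introduced by this series; the group names A and B are hypothetical.

```shell
# Sketch of the A/B scenario: 100G machine, node 1 = 25G.
# Assumes memcg and cpuset hierarchies at /sys/fs/cgroup/{memory,cpuset}.

mkdir /sys/fs/cgroup/memory/A /sys/fs/cgroup/memory/B

# A is protected up to 60% of RAM and is free to allocate on any node.
echo $((60 * 1024 * 1024 * 1024)) \
    > /sys/fs/cgroup/memory/A/memory.low_limit_in_bytes

# B is protected up to 10% of RAM...
echo $((10 * 1024 * 1024 * 1024)) \
    > /sys/fs/cgroup/memory/B/memory.low_limit_in_bytes

# ...but B's tasks are bound to node 1 (25% of RAM) via cpuset.
mkdir /sys/fs/cgroup/cpuset/B
echo 1 > /sys/fs/cgroup/cpuset/B/cpuset.mems

# Cumulative low limits (70G) fit comfortably into RAM (100G), yet if
# enough of A's protected pages land on node 1, nothing on the only node
# B may use is reclaimable, and B's allocations can only OOM.
```

The point of the sketch is just the arithmetic: the limits are sane globally but can still be over-committed on a single node.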
> > I think the hard guarantee makes some sense when we allow to overcommit
> > the limit. Somebody might really want to setup lowlimit == hardlimit
> > because reclaim would be more harmful than restart of the application.
> > I would however expect that this would be more of an exception rather
> > than regular use. Most users I can think of will set low_limit to an
> > effective working set size to be isolated from other loads and ephemeral
> > reclaim will not hurt them. An OOM, on the other hand, would be really
> > harmful.
>
> It's not an either-or because OOM would happen to one group, and
> guaranteed memory reclaim would happen to another.
I do not follow.
--
Michal Hocko
SUSE Labs