From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Greg Thelen <gthelen@google.com>,
Michel Lespinasse <walken@google.com>, Tejun Heo <tj@kernel.org>,
Hugh Dickins <hughd@google.com>,
Roman Gushchin <klamm@yandex-team.ru>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH v2 0/4] memcg: Low-limit reclaim
Date: Thu, 5 Jun 2014 18:09:04 +0200 [thread overview]
Message-ID: <20140605160904.GC15939@dhcp22.suse.cz> (raw)
In-Reply-To: <20140605154328.GX2878@cmpxchg.org>
On Thu 05-06-14 11:43:28, Johannes Weiner wrote:
> On Thu, Jun 05, 2014 at 04:32:35PM +0200, Michal Hocko wrote:
> > On Wed 04-06-14 11:44:08, Johannes Weiner wrote:
> > > On Wed, Jun 04, 2014 at 04:46:58PM +0200, Michal Hocko wrote:
> > > > On Tue 03-06-14 10:22:49, Johannes Weiner wrote:
> > > > > On Tue, Jun 03, 2014 at 01:07:43PM +0200, Michal Hocko wrote:
> > > > [...]
> > > > > > If we consider that memcg and its limits are not zone aware while the
> > > > > > page allocator and reclaim are zone oriented then I can see a problem
> > > > > > of unexpected reclaim failure although there is no over commit on the
> > > > > > low_limit globally. And we do not have in-kernel effective measures to
> > > > > > mitigate this inherent problem. At least not now and I am afraid it is
> > > > > > a long route to have something that would work reasonably well in such
> > > > > > cases.
> > > > >
> > > > > Which "inherent problem"?
> > > >
> > > > zone unawareness of the limit vs. allocation/reclaim which are zone
> > > > oriented.
> > >
> > > This is a quote from another subthread where you haven't responded:
> > >
> > > ---
> > >
> > > > > > > But who actually cares if an individual zone can be reclaimed?
> > > > > > >
> > > > > > > Userspace allocations can fall back to any other zone. Unless there
> > > > > > > are hard bindings, but hopefully nobody binds a memcg to a node that
> > > > > > > is smaller than that memcg's guarantee.
> > > > > >
> > > > > > The protected group might spill over to another group and eat it when
> > > > > > another group would be simply pushed out from the node it is bound to.
> > > > >
> > > > > I don't really understand the point you're trying to make.
> > > >
> > > > I was just trying to show a case where individual zone matters. To make
> > > > it more specific consider 2 groups A (with low-limit 60% RAM) and B
> > > > (say with low-limit 10% RAM) and bound to a node X (25% of RAM). Now
> > > > having 70% of RAM reserved for guarantee makes some sense, right? B is
> > > > not over-committing the node it is bound to. Yet the A's allocations
> > > > might make pressure on X regardless that the whole system is still doing
> > > > good. This can lead to a situation where X gets depleted and nothing
> > > > would be reclaimable leading to an OOM condition.
> > >
> > > Once you assume control of memory *placement* in the system like this,
> > > you can not also pretend to be clueless and have unreclaimable memory
> > > of this magnitude spread around into nodes used by other bound tasks.
> >
> > You are still assuming that the administrator controls the placement.
> > The load running in your memcg might be a black box for admin. E.g. a
> > container which pays $$ to get a priority and not get reclaimed if that
> > is possible. Admin can make sure that the cumulative low_limits for
> > containers are sane but he doesn't have any control over what the loads
> > inside are doing and potential OOM when one tries to DOS the other is
> > definitely not welcome.
>
> This is completely backwards, though: if you pay for guaranteed
I didn't say anything about guarantees, though. You do not even need
anything as strong as a guarantee. You are paying for prioritization.
> memory, you don't want to get reclaimed just because some other task
> that might not even have guarantees starts allocating with a
> restricted node mask. This breaks isolation.
If the other task doesn't have any limit set, then its pages would get
reclaimed. This wouldn't be an everybody-within-low-limit situation.
> For one, this can be used maliciously by intentionally binding a
> low-priority task to a node with guaranteed memory and starting to
> allocate. Even with a small hard limit, you can just plow through
> files to push guaranteed cache of the other group out of memory.
>
> But even if it's not malicious, in such a scenario I'd still prefer
> OOMing the task with the more restrictive node mask over reclaiming
> guaranteed memory.
Why?
> Then, on top of that, we can use direct migration to mitigate OOMs in
> these scenarios (should we sufficiently care about them), but I'd much
> prefer OOMs over breaking isolation and the possible priority
> inversion that is inherent in the fallback on NUMA setups.
Could you be more specific about what you mean by priority inversion?
> > > If we were to actively support such configurations, we should be doing
> > > direct NUMA balancing and migrate these pages out of node X when B
> > > needs to allocate.
> >
> > Migration is certainly a way how to reduce the risk. It is a question
> > whether this is something to be done by the kernel implicitly or by
> > administrator.
>
> As long as the kernel is responsible for *any* placement - i.e. unless
> you bind everything - it might as well be the kernel that fixes it up.
>
> > > That would fix the problem for all unevictable
> > > memory, not just memcg guarantees, and would prefer node-offloading
> > > over swapping in cases where swap is available.
> >
> > That would certainly lower the risk. But there still might be unmovable
> > memory sitting on the node so this will never be 100%.
>
> Yes, and as per above, I think in most cases it's actually preferable
> to kill the bound task (or direct-migrate) over violating guarantees
> of another task.
>
> > > > > > So to me it sounds more responsible to promise only as much as we can
> > > > > > handle. I think that fallback mode is not crippling the semantic of
> > > > > > the knob as it triggers only for limit overcommit or strange corner
> > > > > > cases. We have agreed that we do not care about the first one and
> > > > > > handling the later one by potentially fatal action doesn't sounds very
> > > > > > user friendly to me.
> > > > >
> > > > > It *absolutely* cripples the semantics. Think about the security use
> > > > > cases of mlock for example, where certain memory may never hit the
> > > > > platter. This wouldn't be possible with your watered down guarantees.
> > > >
> > > > Is this really a use case? It sounds like a weak one to me. Because
> > > > any sudden memory consumption above the limit can reclaim your
> > > > to-protect-page it will hit the platter and you cannot do anything about
> > > > this. So yeah, this is not mlock.
> > >
> > > You are right, that is a weak usecase.
> > >
> > > It doesn't change the fact that it does severely weaken the semantics
> > > and turns it into another best-effort mechanism that the user can't
> > > count on. This sucks. It sucked with soft limits and it will suck
> > > again. The irony is that Greg even pointed out you should be doing
> > > soft limits if you want this sort of behavior.
> >
> > The question is whether we really _need_ hard guarantees. I came with
> > the low_limit as a replacement for soft_limit, which really sucks. But it
> > sucks not because you cannot count on it; it sucks because of its
> > inverted semantics - and because of the implementation, of course. I have
> > tried to fix it and that route was a no-go.
>
> We need hard guarantees for actual isolation. Otherwise you can't
> charge for guarantees.
You can still charge for prioritization, which on its own is a valid use
case. You seem to be fixated on hard guarantees and overlook that there
is a class of use cases which do not need such behavior.
Please note that setting up a hard guarantee is a really non-trivial
task, especially if any downtime of the service you want to protect is a
big deal. I wouldn't be surprised if the risk was big enough that using
the limit would be a no-go, even though it could bring a performance
improvement.
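To make the node-binding scenario quoted earlier in the thread concrete, a setup along these lines could reproduce it. This is only a sketch: it assumes a 100G machine whose node 1 holds 25G, cgroup v1 memcg and cpuset hierarchies mounted under /sys/fs/cgroup, and the memory.low_limit_in_bytes knob introduced by this series; the group names A and B are hypothetical.

```shell
# Sketch of the A/B scenario: 100G machine, node 1 = 25G.
# Assumes memcg and cpuset hierarchies at /sys/fs/cgroup/{memory,cpuset}.

mkdir /sys/fs/cgroup/memory/A /sys/fs/cgroup/memory/B

# A is protected up to 60% of RAM and is free to allocate on any node.
echo $((60 * 1024 * 1024 * 1024)) \
    > /sys/fs/cgroup/memory/A/memory.low_limit_in_bytes

# B is protected up to 10% of RAM...
echo $((10 * 1024 * 1024 * 1024)) \
    > /sys/fs/cgroup/memory/B/memory.low_limit_in_bytes

# ...but B's tasks are bound to node 1 (25% of RAM) via cpuset.
mkdir /sys/fs/cgroup/cpuset/B
echo 1 > /sys/fs/cgroup/cpuset/B/cpuset.mems

# Cumulative low limits (70G) fit comfortably into RAM (100G), yet if
# enough of A's protected pages land on node 1, nothing on the only node
# B may use is reclaimable, and B's allocations can only OOM.
```

The point of the sketch is just the arithmetic: the limits are sane globally but can still be over-committed on a single node.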
> > I think the hard guarantee makes some sense when we allow to overcommit
> > the limit. Somebody might really want to setup lowlimit == hardlimit
> > because reclaim would be more harmful than restart of the application.
> > I would however expect that this would be more of an exception rather
> > than regular use. Most users I can think of will set low_limit to an
> > effective working set size to be isolated from other loads and ephemeral
> > reclaim will not hurt them. An OOM, on the other hand, would be really
> > harmful.
>
> It's not an either-or because OOM would happen to one group, and
> guaranteed memory reclaim would happen to another.
I do not follow.
--
Michal Hocko
SUSE Labs