linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.
Date: Mon, 20 Aug 2018 07:54:17 +0200
Message-ID: <20180820055417.GA29735@dhcp22.suse.cz>
In-Reply-To: <49a73f8a-a472-a464-f5bf-ebd7994ce2d3@i-love.sakura.ne.jp>

On Sun 19-08-18 23:23:41, Tetsuo Handa wrote:
> On 2018/08/14 20:33, Michal Hocko wrote:
> > On Sat 11-08-18 12:12:52, Tetsuo Handa wrote:
> >> On 2018/08/10 20:16, Michal Hocko wrote:
> >>>> How do you decide whether oom_reaper() was not able to reclaim much?
> >>>
> >>> Just a rule of thumb. If it freed at least a few kBs then we should be good
> >>> to set MMF_OOM_SKIP.
> >>
> >> I don't think so. We are talking about situations where MMF_OOM_SKIP is set
> >> before enough memory to prevent the OOM killer from selecting the next OOM
> >> victim has been reclaimed.
> > 
> > There is no such thing as enough memory to prevent a new victim selection.
> > Just think of a streaming source of allocations without any end. There is
> > simply no way to tell that we have freed enough. We have to guess and
> > tune based on reasonable workloads.
> 
> I'm not talking about the "allocation without any end" case.
> We already inserted fatal_signal_pending(current) checks (except vmalloc()
> where tsk_is_oom_victim(current) would be used instead).
> 
> What we are talking about is a situation where we could avoid selecting the next
> OOM victim if we waited for some more time after MMF_OOM_SKIP was set.

And that "some more time" is undefined without a crystal ball. And we are
desperately short of those.
 
> >> Apart from the fact that the former is "sequential processing" and "the OOM
> >> reaper pays the cost for reclaiming" while the latter is "parallel (or
> >> round-robin) processing" and "the allocating thread pays the cost for
> >> reclaiming", both are timeout-based back-offs with a capped number of retry
> >> attempts.
> > 
> > And it is exactly the "who pays the price" concern, which I've already tried
> > to explain, that bothers me.
> 
> Are you aware that we can fall into a situation where nobody can pay the price
> for reclaiming memory?

I fail to see how this is related to direct vs. kthread oom reaping
though. Unless the kthread is starved by other means, it can always
jump in and handle the situation.
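
For reference, the allocator-side trylock path quoted further down can be
modelled in user space roughly as follows. This is only a sketch of the
control flow: model_oom_lock, model_victim and model_alloc_may_oom() are
hypothetical names, a pthread mutex stands in for oom_lock, nanosleep()
with an assumed 1 ms tick stands in for schedule_timeout_uninterruptible(1),
and nothing about scheduling or starvation is modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's oom_lock; this is not kernel code. */
static pthread_mutex_t model_oom_lock = PTHREAD_MUTEX_INITIALIZER;
/* Dummy object so that the uncontended path can return non-NULL. */
static int model_victim;

/*
 * Model of the allocator-side path: try the lock once. On contention,
 * report progress on the assumption that the lock holder is making it,
 * sleep briefly, and return NULL so that the caller retries.
 */
static void *model_alloc_may_oom(int *did_some_progress)
{
        /* Assumed tick length of 1 ms; the real value depends on HZ. */
        struct timespec one_tick = { .tv_sec = 0, .tv_nsec = 1000000 };

        *did_some_progress = 0;
        if (pthread_mutex_trylock(&model_oom_lock) != 0) {
                /* Somebody else holds the lock. */
                *did_some_progress = 1;
                nanosleep(&one_tick, NULL);
                return NULL;
        }
        /* Lock held: the real code would invoke the OOM killer here. */
        *did_some_progress = 1;
        pthread_mutex_unlock(&model_oom_lock);
        return &model_victim;
}
```

With the lock already held, a call returns NULL with *did_some_progress set
to 1, so the caller retries without having reclaimed anything itself. Once
the lock is free, the same call takes the uncontended path.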

> > I really do not see how making the code more complex by ensuring that
> > allocators share a fair part of the direct oom reaping will make the
> > situation any easier.
> 
> You are completely ignoring/misunderstanding the background of
> commit 9bfe5ded054b8e28 ("mm, oom: remove sleep from under oom_lock").
> 
> That patch was applied in order to mitigate a lockup problem caused by the fact
> that allocators can deprive the OOM reaper of all CPU resources for making progress
> due to a very, very broken assumption at
> 
>         /*
>          * Acquire the oom lock.  If that fails, somebody else is
>          * making progress for us.
>          */
>         if (!mutex_trylock(&oom_lock)) {
>                 *did_some_progress = 1;
>                 schedule_timeout_uninterruptible(1);
>                 return NULL;
>         }
> 
> on the allocator side.
> 
> Direct OOM reaping is a method for ensuring that allocators spend _some_ CPU
> resources on making progress. I already showed how to prevent allocators from
> trying to reclaim all (e.g. multiple TB) memory at once because you were
> worried about it.
> 
> >                       Really there are basically two issues we really
> > should be after. Improve the oom reaper to tear down a wider range of
> > memory (namely mlock) and improve the cooperation with the exit path
> > to handle free_pgtables more gracefully, because it is true that some
> > processes might really consume a lot of memory in page tables without
> > mapping a lot of anonymous memory. Neither of the two is addressed by
> > your proposal. So if you want to help then try to think about the two
> > issues.
> 
> Your "improvement" is to tear down a wider range of memory, whereas
> my "improvement" is to ensure that CPU resources are spent on reclaiming memory, and
> David's "improvement" is to mitigate unnecessary killing of additional processes.
> Therefore, your "Neither of the two is addressed by your proposal." is pointless.

OK, then we really have to agree to disagree.

-- 
Michal Hocko
SUSE Labs
