linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH] mm/page_alloc: Wait for oom_lock before retrying.
Date: Thu, 8 Dec 2016 14:27:15 +0100	[thread overview]
Message-ID: <20161208132714.GA26530@dhcp22.suse.cz> (raw)
In-Reply-To: <201612080029.IBD55588.OSOFOtHVMLQFFJ@I-love.SAKURA.ne.jp>

On Thu 08-12-16 00:29:26, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 06-12-16 19:33:59, Tetsuo Handa wrote:
> > > If the OOM killer is invoked when many threads are looping inside the
> > > page allocator, it is possible that the OOM killer is preempted by other
> > > threads.
> > 
> > Hmm, the only way I can see this would happen is when the task which
> > actually manages to take the lock is not invoking the OOM killer for
> > whatever reason. Is this what happens in your case? Are you able to
> > trigger this reliably?
> 
> Regarding http://I-love.SAKURA.ne.jp/tmp/serial-20161206.txt.xz ,
> somebody called oom_kill_process() and reached
> 
>   pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",
> 
> line but did not reach
> 
>   pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
> 
> line within tolerable delay.

I would be really interested in that. This can happen only if
find_lock_task_mm fails. This would mean that either we are selecting a
child without mm or the selected victim has no mm anymore. Both cases
should be ephemeral because oom_badness will rule those tasks on the
next round. So the primary question here is why no other task has hit
out_of_memory. Have you tried to instrument the kernel and see whether
GFP_NOFS contexts simply preempted any other attempt to get there?
I would find it quite unlikely but not impossible. If that is the case
we should really think how to move forward. One way is to make the oom
path fully synchronous as suggested below. Other is to tweak GFP_NOFS
some more and do not take the lock while we are evaluating that. This
sounds quite messy though.

[...]

> > So, why don't you simply s@mutex_trylock@mutex_lock_killable@ then?
> > The trylock is simply an optimistic heuristic to retry while the memory
> > is being freed. Making this part sync might help for the case you are
> > seeing.
> 
> May I? Something like below? With patch below, the OOM killer can send
> SIGKILL smoothly and printk() can report smoothly (the frequency of
> "** XXX printk messages dropped **" messages is significantly reduced).

Well, this has to be properly evaluated. The fact that
__oom_reap_task_mm requires the oom_lock makes it more complicated. We
definitely do not want to starve it. On the other hand the oom
invocation path shouldn't stall for too long and even when we have
hundreds of tasks blocked on the lock and blocking the oom reaper then
the reaper should run _eventually_. It might take some time but this a
glacial slow path so it should be acceptable.

That being said, this should be OK. But please make sure to mention all
these details in the changelog. Also make sure to document the actual
failure mode as mentioned above.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2c6d5f6..ee0105b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3075,7 +3075,7 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
>  	 * Acquire the oom lock.  If that fails, somebody else is
>  	 * making progress for us.
>  	 */
> -	if (!mutex_trylock(&oom_lock)) {
> +	if (mutex_lock_killable(&oom_lock)) {
>  		*did_some_progress = 1;
>  		schedule_timeout_uninterruptible(1);
>  		return NULL;

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-12-08 13:27 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-06 10:33 [PATCH] mm/page_alloc: Wait for oom_lock before retrying Tetsuo Handa
2016-12-07  8:15 ` Michal Hocko
2016-12-07 15:29   ` Tetsuo Handa
2016-12-08  8:20     ` Vlastimil Babka
2016-12-08 11:00       ` Tetsuo Handa
2016-12-08 13:32         ` Michal Hocko
2016-12-08 16:18         ` Sergey Senozhatsky
2016-12-08 13:27     ` Michal Hocko [this message]
2016-12-09 14:23       ` Tetsuo Handa
2016-12-09 14:46         ` Michal Hocko
2016-12-10 11:24           ` Tetsuo Handa
2016-12-12  9:07             ` Michal Hocko
2016-12-12 11:49               ` Petr Mladek
2016-12-12 13:00                 ` Michal Hocko
2016-12-12 14:05                   ` Tetsuo Handa
2016-12-13  1:06                 ` Sergey Senozhatsky
2016-12-12 12:12               ` Tetsuo Handa
2016-12-12 12:55                 ` Michal Hocko
2016-12-12 13:19                   ` Michal Hocko
2016-12-13 12:06                     ` Tetsuo Handa
2016-12-13 17:06                       ` Michal Hocko
2016-12-14 11:37                         ` Tetsuo Handa
2016-12-14 12:42                           ` Michal Hocko
2016-12-14 16:36                             ` Tetsuo Handa
2016-12-14 18:18                               ` Michal Hocko
2016-12-15 10:21                                 ` Tetsuo Handa
2016-12-19 11:25                                   ` Tetsuo Handa
2016-12-19 12:27                                     ` Sergey Senozhatsky
2016-12-20 15:39                                       ` Sergey Senozhatsky
2016-12-22 10:27                                         ` Tetsuo Handa
2016-12-22 10:53                                           ` Petr Mladek
2016-12-22 13:40                                             ` Sergey Senozhatsky
2016-12-22 13:33                                           ` Tetsuo Handa
2016-12-22 19:24                                             ` Michal Hocko
2016-12-24  6:25                                               ` Tetsuo Handa
2016-12-26 11:49                                                 ` Michal Hocko
2016-12-27 10:39                                                   ` Tetsuo Handa
2016-12-27 10:57                                                     ` Michal Hocko
2016-12-22 13:42                                           ` Sergey Senozhatsky
2016-12-22 14:01                                             ` Tetsuo Handa
2016-12-22 14:09                                               ` Sergey Senozhatsky
2016-12-22 14:30                                                 ` Sergey Senozhatsky
2016-12-26 10:54                                                 ` Tetsuo Handa
2016-12-26 11:34                                                   ` Sergey Senozhatsky
2017-01-12 13:10                                                     ` Petr Mladek
2017-01-13  2:52                                                       ` Sergey Senozhatsky
2017-01-13  3:53                                                         ` Sergey Senozhatsky
2017-01-13 11:15                                                           ` Petr Mladek
2017-01-13 11:14                                                         ` Petr Mladek
2017-01-12 14:18                                                     ` Petr Mladek
2017-01-13  2:28                                                       ` Sergey Senozhatsky
2017-01-13 11:03                                                         ` Petr Mladek
2017-01-13 11:50                                                           ` Sergey Senozhatsky
2017-01-13 12:15                                                             ` Petr Mladek
2016-12-26 11:41                                                   ` Sergey Senozhatsky
2017-01-13 14:03                                                     ` Petr Mladek
2016-12-15  1:11                         ` Sergey Senozhatsky
2016-12-15  6:35                           ` Michal Hocko
2016-12-15 10:16                             ` Petr Mladek
2016-12-14  9:37                       ` Petr Mladek
2016-12-14 10:20                         ` Sergey Senozhatsky
2016-12-14 11:01                           ` Petr Mladek
2016-12-14 12:23                             ` Sergey Senozhatsky
2016-12-14 12:47                               ` Petr Mladek
2016-12-14 10:26                         ` Michal Hocko
2016-12-15  7:34                           ` Sergey Senozhatsky
2016-12-14 11:37                         ` Tetsuo Handa
2016-12-14 12:36                           ` Petr Mladek
2016-12-14 12:44                             ` Michal Hocko
2016-12-14 13:36                               ` Tetsuo Handa
2016-12-14 13:52                                 ` Michal Hocko
2016-12-14 12:50                             ` Sergey Senozhatsky
2016-12-12 14:59                   ` Tetsuo Handa
2016-12-12 15:55                     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161208132714.GA26530@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=linux-mm@kvack.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).