All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	hannes@cmpxchg.org, tj@kernel.org, kernel-team@fb.com,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection
Date: Thu, 29 Jun 2017 14:47:48 -0400	[thread overview]
Message-ID: <20170629184748.GB27714@castle> (raw)
In-Reply-To: <201706230652.FDH69263.OtOLFSFMHFOQJV@I-love.SAKURA.ne.jp>

On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Roman Gushchin wrote:
> > > On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > > > Roman Gushchin wrote:
> > > > > --- a/mm/oom_kill.c
> > > > > +++ b/mm/oom_kill.c
> > > > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > > > >  	if (oom_killer_disabled)
> > > > >  		return false;
> > > > >  
> > > > > +	/*
> > > > > +	 * If there are oom victims in flight, we don't need to select
> > > > > +	 * a new victim.
> > > > > +	 */
> > > > > +	if (atomic_read(&oom_victims) > 0)
> > > > > +		return true;
> > > > > +
> > > > >  	if (!is_memcg_oom(oc)) {
> > > > >  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > > >  		if (freed > 0)
> > > > 
> > > > The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout based
> > > > giveup is not permitted, but a multithreaded process might be selected as
> > > > an OOM victim. Not setting TIF_MEMDIE to all threads sharing an OOM victim's
> > > > mm increases possibility of preventing some OOM victim thread from terminating
> > > > (e.g. one of them cannot leave __alloc_pages_slowpath() with mmap_sem held for
> > > > write due to waiting for the TIF_MEMDIE thread to call exit_oom_victim() when
> > > > the TIF_MEMDIE thread is waiting for the thread with mmap_sem held for write).
> > > 
> > > I agree, that CONFIG_MMU=n is a special case, and the proposed approach can't
> > > be used directly. But can you, please, why do you find the first  chunk wrong?
> > 
> > Since you are checking oom_victims before checking task_will_free_mem(current),
> > only one thread can get TIF_MEMDIE. This is where a multithreaded OOM victim without
> > the OOM reaper can get stuck forever.
> 
> Oops, I misinterpreted. This is where a multithreaded OOM victim with or without
> the OOM reaper can get stuck forever. Think about a process with two threads is
> selected by the OOM killer and only one of these two threads can get TIF_MEMDIE.
> 
>   Thread-1                 Thread-2                 The OOM killer           The OOM reaper
> 
>                            Calls down_write(&current->mm->mmap_sem).
>   Enters __alloc_pages_slowpath().
>                            Enters __alloc_pages_slowpath().
>   Takes oom_lock.
>   Calls out_of_memory().
>                                                     Selects Thread-1 as an OOM victim.
>   Gets SIGKILL.            Gets SIGKILL.
>   Gets TIF_MEMDIE.
>   Releases oom_lock.
>   Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
>                                                                              Takes oom_lock.
>                                                                              Will do nothing because down_read_trylock() fails.
>                                                                              Releases oom_lock.
>                                                                              Gives up and sets MMF_OOM_SKIP after one second.
>                            Takes oom_lock.
>                            Calls out_of_memory().
>                            Will not check MMF_OOM_SKIP because Thread-1 still has TIF_MEMDIE. // <= get stuck waiting for Thread-1.
>                            Releases oom_lock.
>                            Will not leave __alloc_pages_slowpath() because Thread-2 does not have TIF_MEMDIE.
>                            Will not call up_write(&current->mm->mmap_sem).
>   Reaches do_exit().
>   Calls down_read(&current->mm->mmap_sem) in exit_mm() in do_exit(). // <= get stuck waiting for Thread-2.
>   Will not call up_read(&current->mm->mmap_sem) in exit_mm() in do_exit().
>   Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm() in do_exit().

That's interesting... Does it mean, that we have to give an access to the reserves
to all threads to guarantee the forward progress?

What do you think about Michal's approach? He posted a link in the thread.

Thank you!

Roman

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Roman Gushchin <guro@fb.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <linux-mm@kvack.org>, <mhocko@kernel.org>,
	<vdavydov.dev@gmail.com>, <hannes@cmpxchg.org>, <tj@kernel.org>,
	<kernel-team@fb.com>, <cgroups@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection
Date: Thu, 29 Jun 2017 14:47:48 -0400	[thread overview]
Message-ID: <20170629184748.GB27714@castle> (raw)
In-Reply-To: <201706230652.FDH69263.OtOLFSFMHFOQJV@I-love.SAKURA.ne.jp>

On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Roman Gushchin wrote:
> > > On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > > > Roman Gushchin wrote:
> > > > > --- a/mm/oom_kill.c
> > > > > +++ b/mm/oom_kill.c
> > > > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > > > >  	if (oom_killer_disabled)
> > > > >  		return false;
> > > > >  
> > > > > +	/*
> > > > > +	 * If there are oom victims in flight, we don't need to select
> > > > > +	 * a new victim.
> > > > > +	 */
> > > > > +	if (atomic_read(&oom_victims) > 0)
> > > > > +		return true;
> > > > > +
> > > > >  	if (!is_memcg_oom(oc)) {
> > > > >  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > > >  		if (freed > 0)
> > > > 
> > > > The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout based
> > > > giveup is not permitted, but a multithreaded process might be selected as
> > > > an OOM victim. Not setting TIF_MEMDIE to all threads sharing an OOM victim's
> > > > mm increases possibility of preventing some OOM victim thread from terminating
> > > > (e.g. one of them cannot leave __alloc_pages_slowpath() with mmap_sem held for
> > > > write due to waiting for the TIF_MEMDIE thread to call exit_oom_victim() when
> > > > the TIF_MEMDIE thread is waiting for the thread with mmap_sem held for write).
> > > 
> > > I agree, that CONFIG_MMU=n is a special case, and the proposed approach can't
> > > be used directly. But can you, please, why do you find the first  chunk wrong?
> > 
> > Since you are checking oom_victims before checking task_will_free_mem(current),
> > only one thread can get TIF_MEMDIE. This is where a multithreaded OOM victim without
> > the OOM reaper can get stuck forever.
> 
> Oops, I misinterpreted. This is where a multithreaded OOM victim with or without
> the OOM reaper can get stuck forever. Think about a process with two threads is
> selected by the OOM killer and only one of these two threads can get TIF_MEMDIE.
> 
>   Thread-1                 Thread-2                 The OOM killer           The OOM reaper
> 
>                            Calls down_write(&current->mm->mmap_sem).
>   Enters __alloc_pages_slowpath().
>                            Enters __alloc_pages_slowpath().
>   Takes oom_lock.
>   Calls out_of_memory().
>                                                     Selects Thread-1 as an OOM victim.
>   Gets SIGKILL.            Gets SIGKILL.
>   Gets TIF_MEMDIE.
>   Releases oom_lock.
>   Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
>                                                                              Takes oom_lock.
>                                                                              Will do nothing because down_read_trylock() fails.
>                                                                              Releases oom_lock.
>                                                                              Gives up and sets MMF_OOM_SKIP after one second.
>                            Takes oom_lock.
>                            Calls out_of_memory().
>                            Will not check MMF_OOM_SKIP because Thread-1 still has TIF_MEMDIE. // <= get stuck waiting for Thread-1.
>                            Releases oom_lock.
>                            Will not leave __alloc_pages_slowpath() because Thread-2 does not have TIF_MEMDIE.
>                            Will not call up_write(&current->mm->mmap_sem).
>   Reaches do_exit().
>   Calls down_read(&current->mm->mmap_sem) in exit_mm() in do_exit(). // <= get stuck waiting for Thread-2.
>   Will not call up_read(&current->mm->mmap_sem) in exit_mm() in do_exit().
>   Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm() in do_exit().

That's interesting... Does it mean, that we have to give an access to the reserves
to all threads to guarantee the forward progress?

What do you think about Michal's approach? He posted a link in the thread.

Thank you!

Roman

  reply	other threads:[~2017-06-29 18:47 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-21 21:19 [v3 0/6] cgroup-aware OOM killer Roman Gushchin
2017-06-21 21:19 ` [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin
     [not found]   ` <201706220040.v5M0eSnK074332@www262.sakura.ne.jp>
     [not found]     ` <201706220040.v5M0eSnK074332-etx+eQDEXHD7nzcFbJAaVXf5DAMn2ifp@public.gmane.org>
2017-06-22 16:58       ` Roman Gushchin
2017-06-22 16:58         ` Roman Gushchin
2017-06-22 16:58         ` Roman Gushchin
2017-06-22 20:37         ` Tetsuo Handa
     [not found]           ` <201706230537.IDB21366.SQHJVFOOFOMFLt-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>
2017-06-22 21:52             ` Tetsuo Handa
2017-06-22 21:52               ` Tetsuo Handa
2017-06-29 18:47               ` Roman Gushchin [this message]
2017-06-29 18:47                 ` Roman Gushchin
2017-06-29 20:13                 ` Tetsuo Handa
2017-06-29 20:13                   ` Tetsuo Handa
     [not found]   ` <1498079956-24467-2-git-send-email-guro-b10kYP2dOMg@public.gmane.org>
2017-06-29  9:04     ` Michal Hocko
2017-06-29  9:04       ` Michal Hocko
2017-06-29  9:04       ` Michal Hocko
2017-06-21 21:19 ` [v3 2/6] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin
2017-07-10 23:05   ` David Rientjes
2017-07-10 23:05     ` David Rientjes
2017-07-11 12:51     ` Roman Gushchin
2017-07-11 12:51       ` Roman Gushchin
2017-07-11 12:51       ` Roman Gushchin
2017-07-11 20:56       ` David Rientjes
2017-07-11 20:56         ` David Rientjes
     [not found]         ` <alpine.DEB.2.10.1707111342190.60183-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>
2017-07-12 12:11           ` Roman Gushchin
2017-07-12 12:11             ` Roman Gushchin
2017-07-12 12:11             ` Roman Gushchin
2017-07-12 20:26             ` David Rientjes
2017-07-12 20:26               ` David Rientjes
2017-06-21 21:19 ` [v3 3/6] mm, oom: cgroup-aware OOM killer debug info Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin
2017-06-21 21:19 ` [v3 4/6] mm, oom: introduce oom_score_adj for memory cgroups Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin
2017-06-21 21:19 ` [v3 5/6] mm, oom: don't mark all oom victims tasks with TIF_MEMDIE Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin
2017-06-29  8:53   ` Michal Hocko
2017-06-29  8:53     ` Michal Hocko
2017-06-29 18:45     ` Roman Gushchin
2017-06-29 18:45       ` Roman Gushchin
2017-06-30  8:25       ` Michal Hocko
2017-06-30  8:25         ` Michal Hocko
2017-06-21 21:19 ` [v3 6/6] mm,oom,docs: describe the cgroup-aware OOM killer Roman Gushchin
2017-06-21 21:19   ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170629184748.GB27714@castle \
    --to=guro@fb.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.