All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
Date: Wed, 26 Jul 2017 18:39:28 +0200	[thread overview]
Message-ID: <20170726163928.GB29716@redhat.com> (raw)
In-Reply-To: <20170726054533.GA960@dhcp22.suse.cz>

On Wed, Jul 26, 2017 at 07:45:33AM +0200, Michal Hocko wrote:
> Yes, exit_aio is the only blocking call I know of currently. But I would
> like this to be as robust as possible and so I do not want to rely on
> the current implementation. This can change in future and I can
> guarantee that nobody will think about the oom path when adding
> something to the final __mmput path.

I think ksm_exit may block too waiting for allocations, the generic
idea is those calls before exit_mmap can cause a problem yes.

> > exit_mmap would have no issue, if there was enough time in the
> > lifetime CPU to allocate the memory, sure the memory will also be
> > freed in finite amount of time by exit_mmap.
> 
> I am not sure I understand. Say that any call prior to unmap_vmas blocks
> on a lock which is held by another call path which cannot proceed with
> the allocation...

What I meant was, if three was no prior call to exit_mmap->unmap_vmas.

> I really do not want to rely on any timing. This just too fragile. Once
> we have killed a task then we shouldn't pick another victim until it
> passed exit_mmap or the oom_reaper did its job. Otherwise we just risk
> false positives while we have already disrupted the workload.

On smaller systems lack or parallelism in OOM killing surely isn't a
problem.

> This will work more or less the same to what we have currently.
> 
> [victim]		[oom reaper]				[oom killer]
> do_exit			__oom_reap_task_mm
>   mmput
>     __mmput
> 			  mmget_not_zero
> 			    test_and_set_bit(MMF_OOM_SKIP)
> 			    					oom_evaluate_task
> 								   # select next victim 
> 			  # reap the mm
>       unmap_vmas
>
> so we can select a next victim while the current one is still not
> completely torn down.

How does oom_evaluate_task possibly run at the same time of
test_and_set_bit in __oom_reap_task_mm considering both are running
under the oom_lock? It's hard to see how what you describe above could
materialize as second and third column cannot run in parallel because
of the oom_lock.

I don't think there was any issue, but then you pointed out the
locking on signal->oom_mm that is protected by the task_lock vs
current->mm NULL check, so I can replace in my patch the
test_and_set_bit with set_bit on one side and the oom_mm task_lock
protected locking on the other side. This way I can put back a set_bit
in the __mmput fast path (instead of test_and_set_bit) and it's even
more efficient. With such a change, I'll also stop depending on the
oom_lock to prevent second and third column to run in parallel.

I still didn't remove the oom_lock outright that seems orthogonal
change unrelated to this issue but now you could remove it as far as
the above is concerned.

> I hope 3f70dc38cec2 ("mm: make sure that kthreads will not refault oom
> reaped memory") will clarify this code. If not please start a new thread
> so that we do not conflate different things together.

I'll look into that, thanks.
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
Date: Wed, 26 Jul 2017 18:39:28 +0200	[thread overview]
Message-ID: <20170726163928.GB29716@redhat.com> (raw)
In-Reply-To: <20170726054533.GA960@dhcp22.suse.cz>

On Wed, Jul 26, 2017 at 07:45:33AM +0200, Michal Hocko wrote:
> Yes, exit_aio is the only blocking call I know of currently. But I would
> like this to be as robust as possible and so I do not want to rely on
> the current implementation. This can change in future and I can
> guarantee that nobody will think about the oom path when adding
> something to the final __mmput path.

I think ksm_exit may block too waiting for allocations, the generic
idea is those calls before exit_mmap can cause a problem yes.

> > exit_mmap would have no issue, if there was enough time in the
> > lifetime CPU to allocate the memory, sure the memory will also be
> > freed in finite amount of time by exit_mmap.
> 
> I am not sure I understand. Say that any call prior to unmap_vmas blocks
> on a lock which is held by another call path which cannot proceed with
> the allocation...

What I meant was, if three was no prior call to exit_mmap->unmap_vmas.

> I really do not want to rely on any timing. This just too fragile. Once
> we have killed a task then we shouldn't pick another victim until it
> passed exit_mmap or the oom_reaper did its job. Otherwise we just risk
> false positives while we have already disrupted the workload.

On smaller systems lack or parallelism in OOM killing surely isn't a
problem.

> This will work more or less the same to what we have currently.
> 
> [victim]		[oom reaper]				[oom killer]
> do_exit			__oom_reap_task_mm
>   mmput
>     __mmput
> 			  mmget_not_zero
> 			    test_and_set_bit(MMF_OOM_SKIP)
> 			    					oom_evaluate_task
> 								   # select next victim 
> 			  # reap the mm
>       unmap_vmas
>
> so we can select a next victim while the current one is still not
> completely torn down.

How does oom_evaluate_task possibly run at the same time of
test_and_set_bit in __oom_reap_task_mm considering both are running
under the oom_lock? It's hard to see how what you describe above could
materialize as second and third column cannot run in parallel because
of the oom_lock.

I don't think there was any issue, but then you pointed out the
locking on signal->oom_mm that is protected by the task_lock vs
current->mm NULL check, so I can replace in my patch the
test_and_set_bit with set_bit on one side and the oom_mm task_lock
protected locking on the other side. This way I can put back a set_bit
in the __mmput fast path (instead of test_and_set_bit) and it's even
more efficient. With such a change, I'll also stop depending on the
oom_lock to prevent second and third column to run in parallel.

I still didn't remove the oom_lock outright that seems orthogonal
change unrelated to this issue but now you could remove it as far as
the above is concerned.

> I hope 3f70dc38cec2 ("mm: make sure that kthreads will not refault oom
> reaped memory") will clarify this code. If not please start a new thread
> so that we do not conflate different things together.

I'll look into that, thanks.
Andrea

  reply	other threads:[~2017-07-26 16:39 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24  7:23 [PATCH] mm, oom: allow oom reaper to race with exit_mmap Michal Hocko
2017-07-24  7:23 ` Michal Hocko
2017-07-24 14:00 ` Kirill A. Shutemov
2017-07-24 14:00   ` Kirill A. Shutemov
2017-07-24 14:15   ` Michal Hocko
2017-07-24 14:15     ` Michal Hocko
2017-07-24 14:51     ` Kirill A. Shutemov
2017-07-24 14:51       ` Kirill A. Shutemov
2017-07-24 16:11       ` Michal Hocko
2017-07-24 16:11         ` Michal Hocko
2017-07-25 14:17         ` Kirill A. Shutemov
2017-07-25 14:17           ` Kirill A. Shutemov
2017-07-25 14:26           ` Michal Hocko
2017-07-25 14:26             ` Michal Hocko
2017-07-25 15:07             ` Kirill A. Shutemov
2017-07-25 15:07               ` Kirill A. Shutemov
2017-07-25 15:15               ` Michal Hocko
2017-07-25 15:15                 ` Michal Hocko
2017-07-25 14:26         ` Michal Hocko
2017-07-25 15:17           ` Kirill A. Shutemov
2017-07-25 15:17             ` Kirill A. Shutemov
2017-07-25 15:23             ` Michal Hocko
2017-07-25 15:23               ` Michal Hocko
2017-07-25 15:31               ` Kirill A. Shutemov
2017-07-25 15:31                 ` Kirill A. Shutemov
2017-07-25 16:04                 ` Michal Hocko
2017-07-25 16:04                   ` Michal Hocko
2017-07-25 19:19                   ` Andrea Arcangeli
2017-07-25 19:19                     ` Andrea Arcangeli
2017-07-26  5:45                     ` Michal Hocko
2017-07-26  5:45                       ` Michal Hocko
2017-07-26 16:29                       ` Andrea Arcangeli
2017-07-26 16:29                         ` Andrea Arcangeli
2017-07-26 16:43                         ` Andrea Arcangeli
2017-07-26 16:43                           ` Andrea Arcangeli
2017-07-27  6:50                         ` Michal Hocko
2017-07-27  6:50                           ` Michal Hocko
2017-07-27 14:55                           ` Andrea Arcangeli
2017-07-27 14:55                             ` Andrea Arcangeli
2017-07-28  6:23                             ` Michal Hocko
2017-07-28  6:23                               ` Michal Hocko
2017-07-28  1:58                         ` [PATCH 1/1] mm: oom: let oom_reap_task and exit_mmap to run kbuild test robot
2017-08-15  0:20                         ` [PATCH] mm, oom: allow oom reaper to race with exit_mmap David Rientjes
2017-08-15  0:20                           ` David Rientjes
2017-07-24 15:27 ` Michal Hocko
2017-07-24 15:27   ` Michal Hocko
2017-07-24 16:42 ` kbuild test robot
2017-07-24 18:12   ` Michal Hocko
2017-07-24 18:12     ` Michal Hocko
2017-07-25 15:26 ` Andrea Arcangeli
2017-07-25 15:26   ` Andrea Arcangeli
2017-07-25 15:45   ` Michal Hocko
2017-07-25 15:45     ` Michal Hocko
2017-07-25 18:26     ` Andrea Arcangeli
2017-07-25 18:26       ` Andrea Arcangeli
2017-07-26  5:45       ` Michal Hocko
2017-07-26  5:45         ` Michal Hocko
2017-07-26 16:39         ` Andrea Arcangeli [this message]
2017-07-26 16:39           ` Andrea Arcangeli
2017-07-27  6:32           ` Michal Hocko
2017-07-27  6:32             ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2017-07-27  8:29 Manish Jaggi
2017-07-27  9:24 ` Michal Hocko
2017-08-10  8:16 Michal Hocko
2017-08-10  8:16 ` Michal Hocko
2017-08-10 18:05 ` Andrea Arcangeli
2017-08-10 18:05   ` Andrea Arcangeli
2017-08-10 18:51   ` Michal Hocko
2017-08-10 18:51     ` Michal Hocko
2017-08-10 20:36     ` Michal Hocko
2017-08-10 20:36       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170726163928.GB29716@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.