From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
Date: Tue, 25 Jul 2017 17:26:39 +0200 [thread overview]
Message-ID: <20170725152639.GP29716@redhat.com> (raw)
In-Reply-To: <20170724072332.31903-1-mhocko@kernel.org>
On Mon, Jul 24, 2017 at 09:23:32AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> David has noticed that the oom killer might kill additional tasks while
> the exiting oom victim hasn't terminated yet because the oom_reaper marks
> the curent victim MMF_OOM_SKIP too early when mm->mm_users dropped down
> to 0. The race is as follows
>
> oom_reap_task do_exit
> exit_mm
> __oom_reap_task_mm
> mmput
> __mmput
> mmget_not_zero # fails
> exit_mmap # frees memory
> set_bit(MMF_OOM_SKIP)
>
> The victim is still visible to the OOM killer until it is unhashed.
I think this is a very minor problem, in the worst case you get a
false positive oom kill, and it requires a race condition for it to
happen. I wouldn't add mmap_sem in exit_mmap just for this considering
the mmget_not_zero is already enough to leave exit_mmap alone.
Could you first clarify these points then I'll understand better what
the above is about:
1) if exit_mmap runs for a long time with terabytes of RAM with
mmap_sem held for writing like your patch does, wouldn't then
oom_reap_task_mm fail the same way after a few tries on
down_read_trylock? Despite your patch got applied? Isn't that
simply moving the failure that leads to set_bit(MMF_OOM_SKIP) from
mmget_not_zero to down_read_trylock?
2) why isn't __oom_reap_task_mm returning different retvals in case
mmget_not_zero fails? What is the point to schedule_timeout
and retry MAX_OOM_REAP_RETRIES times if mmget_not_zero caused it to
return null as it can't do anything about such task anymore? Why
are we scheduling those RETRIES times if mm_users is 0?
3) if exit_mmap is freeing lots of memory already, why should there be
another OOM immediately? I thought oom reaper only was needed when
the task on the right column couldn't reach the final mmput to set
mm_users to 0. Why exactly is a problem that MMF_OOM_SKIP gets set
on the mm, if exit_mmap is already guaranteed to be running? Why
isn't the oom reaper happy to just stop in such case and wait it to
complete? exit_mmap doesn't even take the mmap_sem and it's running
in R state, how would it block in a way that requires the OOM
reaper to free memory from another process to complete?
4) how is it safe to overwrite a VM_FAULT_RETRY that returns without
mmap_sem and then the arch code will release the mmap_sem despite
it was already released by handle_mm_fault? Anonymous memory faults
aren't common to return VM_FAULT_RETRY but an userfault
can. Shouldn't there be a block that prevents overwriting if
VM_FAULT_RETRY is set below? (not only VM_FAULT_ERROR)
if (unlikely((current->flags & PF_KTHREAD) && !(ret & VM_FAULT_ERROR)
&& test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
ret = VM_FAULT_SIGBUS;
Thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
Date: Tue, 25 Jul 2017 17:26:39 +0200 [thread overview]
Message-ID: <20170725152639.GP29716@redhat.com> (raw)
In-Reply-To: <20170724072332.31903-1-mhocko@kernel.org>
On Mon, Jul 24, 2017 at 09:23:32AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> David has noticed that the oom killer might kill additional tasks while
> the exiting oom victim hasn't terminated yet because the oom_reaper marks
> the curent victim MMF_OOM_SKIP too early when mm->mm_users dropped down
> to 0. The race is as follows
>
> oom_reap_task do_exit
> exit_mm
> __oom_reap_task_mm
> mmput
> __mmput
> mmget_not_zero # fails
> exit_mmap # frees memory
> set_bit(MMF_OOM_SKIP)
>
> The victim is still visible to the OOM killer until it is unhashed.
I think this is a very minor problem, in the worst case you get a
false positive oom kill, and it requires a race condition for it to
happen. I wouldn't add mmap_sem in exit_mmap just for this considering
the mmget_not_zero is already enough to leave exit_mmap alone.
Could you first clarify these points then I'll understand better what
the above is about:
1) if exit_mmap runs for a long time with terabytes of RAM with
mmap_sem held for writing like your patch does, wouldn't then
oom_reap_task_mm fail the same way after a few tries on
down_read_trylock? Despite your patch got applied? Isn't that
simply moving the failure that leads to set_bit(MMF_OOM_SKIP) from
mmget_not_zero to down_read_trylock?
2) why isn't __oom_reap_task_mm returning different retvals in case
mmget_not_zero fails? What is the point to schedule_timeout
and retry MAX_OOM_REAP_RETRIES times if mmget_not_zero caused it to
return null as it can't do anything about such task anymore? Why
are we scheduling those RETRIES times if mm_users is 0?
3) if exit_mmap is freeing lots of memory already, why should there be
another OOM immediately? I thought oom reaper only was needed when
the task on the right column couldn't reach the final mmput to set
mm_users to 0. Why exactly is a problem that MMF_OOM_SKIP gets set
on the mm, if exit_mmap is already guaranteed to be running? Why
isn't the oom reaper happy to just stop in such case and wait it to
complete? exit_mmap doesn't even take the mmap_sem and it's running
in R state, how would it block in a way that requires the OOM
reaper to free memory from another process to complete?
4) how is it safe to overwrite a VM_FAULT_RETRY that returns without
mmap_sem and then the arch code will release the mmap_sem despite
it was already released by handle_mm_fault? Anonymous memory faults
aren't common to return VM_FAULT_RETRY but an userfault
can. Shouldn't there be a block that prevents overwriting if
VM_FAULT_RETRY is set below? (not only VM_FAULT_ERROR)
if (unlikely((current->flags & PF_KTHREAD) && !(ret & VM_FAULT_ERROR)
&& test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
ret = VM_FAULT_SIGBUS;
Thanks,
Andrea
next prev parent reply other threads:[~2017-07-25 15:26 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-24 7:23 [PATCH] mm, oom: allow oom reaper to race with exit_mmap Michal Hocko
2017-07-24 7:23 ` Michal Hocko
2017-07-24 14:00 ` Kirill A. Shutemov
2017-07-24 14:00 ` Kirill A. Shutemov
2017-07-24 14:15 ` Michal Hocko
2017-07-24 14:15 ` Michal Hocko
2017-07-24 14:51 ` Kirill A. Shutemov
2017-07-24 14:51 ` Kirill A. Shutemov
2017-07-24 16:11 ` Michal Hocko
2017-07-24 16:11 ` Michal Hocko
2017-07-25 14:17 ` Kirill A. Shutemov
2017-07-25 14:17 ` Kirill A. Shutemov
2017-07-25 14:26 ` Michal Hocko
2017-07-25 14:26 ` Michal Hocko
2017-07-25 15:07 ` Kirill A. Shutemov
2017-07-25 15:07 ` Kirill A. Shutemov
2017-07-25 15:15 ` Michal Hocko
2017-07-25 15:15 ` Michal Hocko
2017-07-25 14:26 ` Michal Hocko
2017-07-25 15:17 ` Kirill A. Shutemov
2017-07-25 15:17 ` Kirill A. Shutemov
2017-07-25 15:23 ` Michal Hocko
2017-07-25 15:23 ` Michal Hocko
2017-07-25 15:31 ` Kirill A. Shutemov
2017-07-25 15:31 ` Kirill A. Shutemov
2017-07-25 16:04 ` Michal Hocko
2017-07-25 16:04 ` Michal Hocko
2017-07-25 19:19 ` Andrea Arcangeli
2017-07-25 19:19 ` Andrea Arcangeli
2017-07-26 5:45 ` Michal Hocko
2017-07-26 5:45 ` Michal Hocko
2017-07-26 16:29 ` Andrea Arcangeli
2017-07-26 16:29 ` Andrea Arcangeli
2017-07-26 16:43 ` Andrea Arcangeli
2017-07-26 16:43 ` Andrea Arcangeli
2017-07-27 6:50 ` Michal Hocko
2017-07-27 6:50 ` Michal Hocko
2017-07-27 14:55 ` Andrea Arcangeli
2017-07-27 14:55 ` Andrea Arcangeli
2017-07-28 6:23 ` Michal Hocko
2017-07-28 6:23 ` Michal Hocko
2017-07-28 1:58 ` [PATCH 1/1] mm: oom: let oom_reap_task and exit_mmap to run kbuild test robot
2017-08-15 0:20 ` [PATCH] mm, oom: allow oom reaper to race with exit_mmap David Rientjes
2017-08-15 0:20 ` David Rientjes
2017-07-24 15:27 ` Michal Hocko
2017-07-24 15:27 ` Michal Hocko
2017-07-24 16:42 ` kbuild test robot
2017-07-24 18:12 ` Michal Hocko
2017-07-24 18:12 ` Michal Hocko
2017-07-25 15:26 ` Andrea Arcangeli [this message]
2017-07-25 15:26 ` Andrea Arcangeli
2017-07-25 15:45 ` Michal Hocko
2017-07-25 15:45 ` Michal Hocko
2017-07-25 18:26 ` Andrea Arcangeli
2017-07-25 18:26 ` Andrea Arcangeli
2017-07-26 5:45 ` Michal Hocko
2017-07-26 5:45 ` Michal Hocko
2017-07-26 16:39 ` Andrea Arcangeli
2017-07-26 16:39 ` Andrea Arcangeli
2017-07-27 6:32 ` Michal Hocko
2017-07-27 6:32 ` Michal Hocko
-- strict thread matches above, loose matches on Subject: below --
2017-07-27 8:29 Manish Jaggi
2017-07-27 9:24 ` Michal Hocko
2017-08-10 8:16 Michal Hocko
2017-08-10 8:16 ` Michal Hocko
2017-08-10 18:05 ` Andrea Arcangeli
2017-08-10 18:05 ` Andrea Arcangeli
2017-08-10 18:51 ` Michal Hocko
2017-08-10 18:51 ` Michal Hocko
2017-08-10 20:36 ` Michal Hocko
2017-08-10 20:36 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170725152639.GP29716@redhat.com \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.