From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>,
Vladimir Davydov <vdavydov@parallels.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 6/6] mm, oom: fortify task_will_free_mem
Date: Mon, 30 May 2016 19:35:05 +0200 [thread overview]
Message-ID: <20160530173505.GA25287@redhat.com> (raw)
In-Reply-To: <1464613556-16708-7-git-send-email-mhocko@kernel.org>
On 05/30, Michal Hocko wrote:
>
> task_will_free_mem is rather weak.
I was thinking about the similar change because I noticed that try_oom_reaper()
is very, very wrong.
To the point I think that we need another change for stable which simply removes
spin_lock_irq(sighand->siglock) from try_oom_reaper(). It buys nothing, we can
check signal_group_exit() (which is wrong too ;) lockless, and at the same time
the kernel can crash because we can hit ->siglock == NULL.
So I do think this change is good in general.
I think that task_will_free_mem() should be un-inlined, and __task_will_free_mem()
should go into mm/oom-kill.c... but this is minor.
> -static inline bool task_will_free_mem(struct task_struct *task)
> +static inline bool __task_will_free_mem(struct task_struct *task)
> {
> struct signal_struct *sig = task->signal;
>
> @@ -119,16 +119,69 @@ static inline bool task_will_free_mem(struct task_struct *task)
> if (sig->flags & SIGNAL_GROUP_COREDUMP)
> return false;
>
> - if (!(task->flags & PF_EXITING))
> + if (!(task->flags & PF_EXITING || fatal_signal_pending(task)))
> return false;
>
> /* Make sure that the whole thread group is going down */
> - if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> + if (!thread_group_empty(task) &&
> + !(sig->flags & SIGNAL_GROUP_EXIT || fatal_signal_pending(task)))
> return false;
>
> return true;
> }
Well, let me suggest this again. I think it should do
if (SIGNAL_GROUP_COREDUMP)
return false;
if (SIGNAL_GROUP_EXIT)
return true;
if (thread_group_empty() && PF_EXITING)
return true;
return false;
we do not need fatal_signal_pending(), in this case SIGNAL_GROUP_EXIT should
be set (ignoring some bugs with sub-namespaces which we need to fix anyway).
At the same time, we do not want to return false if PF_EXITING is not set
if SIGNAL_GROUP_EXIT is set.
> +static inline bool task_will_free_mem(struct task_struct *task)
> +{
> + struct mm_struct *mm = NULL;
> + struct task_struct *p;
> + bool ret;
> +
> + /*
> + * If the process has passed exit_mm we have to skip it because
> + * we have lost a link to other tasks sharing this mm, we do not
> + * have anything to reap and the task might then get stuck waiting
> + * for parent as zombie and we do not want it to hold TIF_MEMDIE
> + */
> + p = find_lock_task_mm(task);
> + if (!p)
> + return false;
> +
> + if (!__task_will_free_mem(p)) {
> + task_unlock(p);
> + return false;
> + }
> +
> + mm = p->mm;
> + if (atomic_read(&mm->mm_users) <= 1) {
this is sub-optimal, we should probably take signal->live or ->nr_threads
into account... but OK, we can do this later.
> + rcu_read_lock();
> + for_each_process(p) {
> + ret = __task_will_free_mem(p);
> + if (!ret)
> + break;
> + }
> + rcu_read_unlock();
Yes, I agree very much.
But it seems you forgot to add the process_shares_mm() check into this loop?
and perhaps it also makes sense to add
if (same_thread_group(tsk, p))
continue;
This should not really matter, we know that __task_will_free_mem(p) should return
true. Just to make it more clear.
And. I think this needs smp_rmb() at the end of the loop (assuming we have the
process_shares_mm() check here). We need it to ensure that we read p->mm before
we read next_task(), to avoid the race with exit() + clone(CLONE_VM).
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>,
Vladimir Davydov <vdavydov@parallels.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 6/6] mm, oom: fortify task_will_free_mem
Date: Mon, 30 May 2016 19:35:05 +0200 [thread overview]
Message-ID: <20160530173505.GA25287@redhat.com> (raw)
In-Reply-To: <1464613556-16708-7-git-send-email-mhocko@kernel.org>
On 05/30, Michal Hocko wrote:
>
> task_will_free_mem is rather weak.
I was thinking about the similar change because I noticed that try_oom_reaper()
is very, very wrong.
To the point I think that we need another change for stable which simply removes
spin_lock_irq(sighand->siglock) from try_oom_reaper(). It buys nothing, we can
check signal_group_exit() (which is wrong too ;) lockless, and at the same time
the kernel can crash because we can hit ->siglock == NULL.
So I do think this change is good in general.
I think that task_will_free_mem() should be un-inlined, and __task_will_free_mem()
should go into mm/oom-kill.c... but this is minor.
> -static inline bool task_will_free_mem(struct task_struct *task)
> +static inline bool __task_will_free_mem(struct task_struct *task)
> {
> struct signal_struct *sig = task->signal;
>
> @@ -119,16 +119,69 @@ static inline bool task_will_free_mem(struct task_struct *task)
> if (sig->flags & SIGNAL_GROUP_COREDUMP)
> return false;
>
> - if (!(task->flags & PF_EXITING))
> + if (!(task->flags & PF_EXITING || fatal_signal_pending(task)))
> return false;
>
> /* Make sure that the whole thread group is going down */
> - if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> + if (!thread_group_empty(task) &&
> + !(sig->flags & SIGNAL_GROUP_EXIT || fatal_signal_pending(task)))
> return false;
>
> return true;
> }
Well, let me suggest this again. I think it should do
if (SIGNAL_GROUP_COREDUMP)
return false;
if (SIGNAL_GROUP_EXIT)
return true;
if (thread_group_empty() && PF_EXITING)
return true;
return false;
we do not need fatal_signal_pending(), in this case SIGNAL_GROUP_EXIT should
be set (ignoring some bugs with sub-namespaces which we need to fix anyway).
At the same time, we do not want to return false if PF_EXITING is not set
if SIGNAL_GROUP_EXIT is set.
> +static inline bool task_will_free_mem(struct task_struct *task)
> +{
> + struct mm_struct *mm = NULL;
> + struct task_struct *p;
> + bool ret;
> +
> + /*
> + * If the process has passed exit_mm we have to skip it because
> + * we have lost a link to other tasks sharing this mm, we do not
> + * have anything to reap and the task might then get stuck waiting
> + * for parent as zombie and we do not want it to hold TIF_MEMDIE
> + */
> + p = find_lock_task_mm(task);
> + if (!p)
> + return false;
> +
> + if (!__task_will_free_mem(p)) {
> + task_unlock(p);
> + return false;
> + }
> +
> + mm = p->mm;
> + if (atomic_read(&mm->mm_users) <= 1) {
this is sub-optimal, we should probably take signal->live or ->nr_threads
into account... but OK, we can do this later.
> + rcu_read_lock();
> + for_each_process(p) {
> + ret = __task_will_free_mem(p);
> + if (!ret)
> + break;
> + }
> + rcu_read_unlock();
Yes, I agree very much.
But it seems you forgot to add the process_shares_mm() check into this loop?
and perhaps it also makes sense to add
if (same_thread_group(tsk, p))
continue;
This should not really matter, we know that __task_will_free_mem(p) should return
true. Just to make it more clear.
And. I think this needs smp_rmb() at the end of the loop (assuming we have the
process_shares_mm() check here). We need it to ensure that we read p->mm before
we read next_task(), to avoid the race with exit() + clone(CLONE_VM).
Oleg.
next prev parent reply other threads:[~2016-05-30 17:35 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-30 13:05 [PATCH 0/6 -v2] Handle oom bypass more gracefully Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 13:05 ` [PATCH 1/6] proc, oom: drop bogus task_lock and mm check Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 13:49 ` Vladimir Davydov
2016-05-30 13:49 ` Vladimir Davydov
2016-05-30 17:43 ` Oleg Nesterov
2016-05-30 17:43 ` Oleg Nesterov
2016-05-31 7:32 ` Michal Hocko
2016-05-31 7:32 ` Michal Hocko
2016-05-31 22:53 ` Oleg Nesterov
2016-05-31 22:53 ` Oleg Nesterov
2016-06-01 6:53 ` Michal Hocko
2016-06-01 6:53 ` Michal Hocko
2016-06-01 10:41 ` Tetsuo Handa
2016-06-01 10:41 ` Tetsuo Handa
2016-06-01 10:48 ` Michal Hocko
2016-06-01 10:48 ` Michal Hocko
2016-05-30 13:05 ` [PATCH 2/6] proc, oom_adj: extract oom_score_adj setting into a helper Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 13:05 ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-31 7:41 ` Michal Hocko
2016-05-31 7:41 ` Michal Hocko
2016-05-30 13:05 ` [PATCH 4/6] mm, oom: skip vforked tasks from being selected Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 19:28 ` Oleg Nesterov
2016-05-30 19:28 ` Oleg Nesterov
2016-05-31 7:42 ` Michal Hocko
2016-05-31 7:42 ` Michal Hocko
2016-05-31 21:43 ` Oleg Nesterov
2016-05-31 21:43 ` Oleg Nesterov
2016-06-01 7:09 ` Michal Hocko
2016-06-01 7:09 ` Michal Hocko
2016-06-01 14:12 ` Tetsuo Handa
2016-06-01 14:25 ` Michal Hocko
2016-06-02 10:45 ` Tetsuo Handa
2016-06-02 11:20 ` Michal Hocko
2016-06-02 11:31 ` Tetsuo Handa
2016-06-02 12:55 ` Michal Hocko
2016-05-30 13:05 ` [PATCH 5/6] mm, oom: kill all tasks sharing the mm Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 18:18 ` Oleg Nesterov
2016-05-30 18:18 ` Oleg Nesterov
2016-05-31 7:43 ` Michal Hocko
2016-05-31 7:43 ` Michal Hocko
2016-05-31 21:48 ` Oleg Nesterov
2016-05-31 21:48 ` Oleg Nesterov
2016-05-30 13:05 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-30 13:05 ` Michal Hocko
2016-05-30 17:35 ` Oleg Nesterov [this message]
2016-05-30 17:35 ` Oleg Nesterov
2016-05-31 7:46 ` Michal Hocko
2016-05-31 7:46 ` Michal Hocko
2016-05-31 22:29 ` Oleg Nesterov
2016-05-31 22:29 ` Oleg Nesterov
2016-06-01 7:03 ` Michal Hocko
2016-06-01 7:03 ` Michal Hocko
2016-05-31 15:03 ` Tetsuo Handa
2016-05-31 15:10 ` Michal Hocko
2016-05-31 15:29 ` Tetsuo Handa
2016-06-01 7:25 ` Michal Hocko
2016-06-01 12:04 ` Tetsuo Handa
2016-06-01 12:43 ` Michal Hocko
2016-06-02 14:03 ` [PATCH 7/6] mm, oom: task_will_free_mem should skip oom_reaped tasks Michal Hocko
2016-06-02 14:03 ` Michal Hocko
2016-06-02 15:24 ` Tetsuo Handa
2016-06-02 15:50 ` Michal Hocko
-- strict thread matches above, loose matches on Subject: below --
2016-05-26 12:40 [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-26 12:40 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-26 12:40 ` Michal Hocko
2016-05-26 14:11 ` Tetsuo Handa
2016-05-26 14:23 ` Michal Hocko
2016-05-26 14:41 ` Tetsuo Handa
2016-05-26 14:41 ` Tetsuo Handa
2016-05-26 14:56 ` Michal Hocko
2016-05-26 14:56 ` Michal Hocko
2016-05-27 11:07 ` Michal Hocko
2016-05-27 11:07 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160530173505.GA25287@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.