From: Oleg Nesterov <oleg@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
anfei <anfei.zhou@gmail.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
nishimura@mxp.nes.nec.co.jp,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] oom: give current access to memory reserves if it has been killed
Date: Thu, 1 Apr 2010 00:50:11 +0200 [thread overview]
Message-ID: <20100331224904.GA4025@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1003311342410.25284@chino.kir.corp.google.com>
On 03/31, David Rientjes wrote:
>
> On Wed, 31 Mar 2010, Oleg Nesterov wrote:
>
> > On 03/30, David Rientjes wrote:
> > >
> > > On Tue, 30 Mar 2010, Oleg Nesterov wrote:
> > >
> > > > Note that __oom_kill_task() does force_sig(SIGKILL) which assumes that
> > > > ->sighand != NULL. This is not true if out_of_memory() is called after
> > > > current has already passed exit_notify().
> > >
> > > We have an even bigger problem if current is in the oom killer at
> > > exit_notify() since it has already detached its ->mm in exit_mm() :)
> >
> > Can't understand... I thought that in theory even kmalloc(1) can trigger
> > oom.
>
> __oom_kill_task() cannot be called on a task without an ->mm.
Why? You ignored this part:
Say, right after exit_mm() we are doing acct_process(), and f_op->write()
needs a page. So, you are saying that in this case __page_cache_alloc()
can never trigger out_of_memory() ?
why this is not possible?
David, I am not arguing, I am asking.
> > > The check for !p->mm was moved in the -mm tree (and the oom killer was
> > > entirely rewritten in that tree, so I encourage you to work off of it
> > > instead
> >
> > OK, but I guess this !p->mm check is still wrong for the same reason.
> > In fact I do not understand why it is needed in select_bad_process()
> > right before oom_badness() which checks ->mm too (and this check is
> > equally wrong).
>
> It prevents kthreads from being killed.
No it doesn't, see use_mm(). See also another email I sent.
> > > so if the oom killer finds an already exiting task,
> > > it will become a no-op since it should eventually free memory and avoids a
> > > needless oom kill.
> >
> > No, afaics, And this reminds that I already complained about this
> > PF_EXITING check.
> >
> > Once again, p is the group leader. It can be dead (no ->mm, PF_EXITING
> > is set) but it can have sub-threads. This means, unless I missed something,
> > any user can trivially disable select_bad_process() forever.
> >
>
> The task is in the process of exiting and will do so if its not current,
> otherwise it will get access to memory reserves since we're obviously oom
> in the exit path. Thus, we'll be freeing that memory soon or recalling
> the oom killer to kill additional tasks once those children have been
> reparented (or one of its children was sacrificed).
Just can't understand.
OK, a bad user does
int sleep_forever(void *)
{
pause();
}
int main(void)
{
pthread_create(sleep_forever);
syscall(__NR_exit);
}
Now, every time select_bad_process() is called it will find this process
and PF_EXITING is true, so it just returns ERR_PTR(-1UL). And note that
this process is not going to exit.
> > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> > Again, this is not right even if we forget about !child->mm check.
> > This list_for_each_entry() can only see the processes forked by the
> > main thread.
> >
>
> That's the intention.
Why? shouldn't oom_badness() return the same result for any thread
in thread group? We should take all childs into account.
> > Likewise, oom_kill_process()->list_for_each_entry() is not right too.
> >
>
> Why?
>
> > Hmm. Why oom_forkbomb_penalty() does thread_group_cputime() under
> > task_lock() ? It seems, ->alloc_lock() is only needed for get_mm_rss().
> >
>
> Right, but we need to ensure that the check for !child->mm || child->mm ==
> tsk->mm fails before adding in get_mm_rss(child->mm). It can race and
> detach its mm prior to the dereference.
Oh, yes sure, I mentioned get_mm_rss() above.
> It would be possible to move the
> thread_group_cputime() out of this critical section,
Yes, this is what I meant.
> but I felt it was
> better to do filter all tasks with child->mm == tsk->mm first before
> unnecessarily finding the cputime for them.
Yes, but we can check child->mm == tsk->mm, call get_mm_counter() and drop
task_lock().
Oleg.
WARNING: multiple messages have this Message-ID (diff)
From: Oleg Nesterov <oleg@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
anfei <anfei.zhou@gmail.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
nishimura@mxp.nes.nec.co.jp,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] oom: give current access to memory reserves if it has been killed
Date: Thu, 1 Apr 2010 00:50:11 +0200 [thread overview]
Message-ID: <20100331224904.GA4025@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1003311342410.25284@chino.kir.corp.google.com>
On 03/31, David Rientjes wrote:
>
> On Wed, 31 Mar 2010, Oleg Nesterov wrote:
>
> > On 03/30, David Rientjes wrote:
> > >
> > > On Tue, 30 Mar 2010, Oleg Nesterov wrote:
> > >
> > > > Note that __oom_kill_task() does force_sig(SIGKILL) which assumes that
> > > > ->sighand != NULL. This is not true if out_of_memory() is called after
> > > > current has already passed exit_notify().
> > >
> > > We have an even bigger problem if current is in the oom killer at
> > > exit_notify() since it has already detached its ->mm in exit_mm() :)
> >
> > Can't understand... I thought that in theory even kmalloc(1) can trigger
> > oom.
>
> __oom_kill_task() cannot be called on a task without an ->mm.
Why? You ignored this part:
Say, right after exit_mm() we are doing acct_process(), and f_op->write()
needs a page. So, you are saying that in this case __page_cache_alloc()
can never trigger out_of_memory() ?
why this is not possible?
David, I am not arguing, I am asking.
> > > The check for !p->mm was moved in the -mm tree (and the oom killer was
> > > entirely rewritten in that tree, so I encourage you to work off of it
> > > instead
> >
> > OK, but I guess this !p->mm check is still wrong for the same reason.
> > In fact I do not understand why it is needed in select_bad_process()
> > right before oom_badness() which checks ->mm too (and this check is
> > equally wrong).
>
> It prevents kthreads from being killed.
No it doesn't, see use_mm(). See also another email I sent.
> > > so if the oom killer finds an already exiting task,
> > > it will become a no-op since it should eventually free memory and avoids a
> > > needless oom kill.
> >
> > No, afaics, And this reminds that I already complained about this
> > PF_EXITING check.
> >
> > Once again, p is the group leader. It can be dead (no ->mm, PF_EXITING
> > is set) but it can have sub-threads. This means, unless I missed something,
> > any user can trivially disable select_bad_process() forever.
> >
>
> The task is in the process of exiting and will do so if its not current,
> otherwise it will get access to memory reserves since we're obviously oom
> in the exit path. Thus, we'll be freeing that memory soon or recalling
> the oom killer to kill additional tasks once those children have been
> reparented (or one of its children was sacrificed).
Just can't understand.
OK, a bad user does
int sleep_forever(void *)
{
pause();
}
int main(void)
{
pthread_create(sleep_forever);
syscall(__NR_exit);
}
Now, every time select_bad_process() is called it will find this process
and PF_EXITING is true, so it just returns ERR_PTR(-1UL). And note that
this process is not going to exit.
> > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> > Again, this is not right even if we forget about !child->mm check.
> > This list_for_each_entry() can only see the processes forked by the
> > main thread.
> >
>
> That's the intention.
Why? shouldn't oom_badness() return the same result for any thread
in thread group? We should take all childs into account.
> > Likewise, oom_kill_process()->list_for_each_entry() is not right too.
> >
>
> Why?
>
> > Hmm. Why oom_forkbomb_penalty() does thread_group_cputime() under
> > task_lock() ? It seems, ->alloc_lock() is only needed for get_mm_rss().
> >
>
> Right, but we need to ensure that the check for !child->mm || child->mm ==
> tsk->mm fails before adding in get_mm_rss(child->mm). It can race and
> detach its mm prior to the dereference.
Oh, yes sure, I mentioned get_mm_rss() above.
> It would be possible to move the
> thread_group_cputime() out of this critical section,
Yes, this is what I meant.
> but I felt it was
> better to do filter all tasks with child->mm == tsk->mm first before
> unnecessarily finding the cputime for them.
Yes, but we can check child->mm == tsk->mm, call get_mm_counter() and drop
task_lock().
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-03-31 22:52 UTC|newest]
Thread overview: 197+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-24 16:25 [PATCH] oom killer: break from infinite loop Anfei Zhou
2010-03-24 16:25 ` Anfei Zhou
2010-03-25 2:51 ` KOSAKI Motohiro
2010-03-25 2:51 ` KOSAKI Motohiro
2010-03-26 22:08 ` Andrew Morton
2010-03-26 22:08 ` Andrew Morton
2010-03-26 22:33 ` Oleg Nesterov
2010-03-26 22:33 ` Oleg Nesterov
2010-03-28 14:55 ` anfei
2010-03-28 14:55 ` anfei
2010-03-28 16:28 ` Oleg Nesterov
2010-03-28 16:28 ` Oleg Nesterov
2010-03-28 21:21 ` David Rientjes
2010-03-28 21:21 ` David Rientjes
2010-03-29 11:21 ` Oleg Nesterov
2010-03-29 11:21 ` Oleg Nesterov
2010-03-29 20:49 ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-03-29 20:49 ` David Rientjes
2010-03-30 15:46 ` Oleg Nesterov
2010-03-30 15:46 ` Oleg Nesterov
2010-03-30 20:26 ` David Rientjes
2010-03-30 20:26 ` David Rientjes
2010-03-31 17:58 ` Oleg Nesterov
2010-03-31 17:58 ` Oleg Nesterov
2010-03-31 20:47 ` Oleg Nesterov
2010-03-31 20:47 ` Oleg Nesterov
2010-04-01 8:35 ` David Rientjes
2010-04-01 8:35 ` David Rientjes
2010-04-01 8:57 ` [patch -mm] oom: hold tasklist_lock when dumping tasks David Rientjes
2010-04-01 14:27 ` Oleg Nesterov
2010-04-01 19:16 ` David Rientjes
2010-04-01 13:59 ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-04-01 14:00 ` Oleg Nesterov
2010-04-01 19:12 ` David Rientjes
2010-04-01 19:12 ` David Rientjes
2010-04-02 11:14 ` Oleg Nesterov
2010-04-02 11:14 ` Oleg Nesterov
2010-04-02 18:30 ` [PATCH -mm 0/4] oom: linux has threads Oleg Nesterov
2010-04-02 18:30 ` Oleg Nesterov
2010-04-02 18:31 ` [PATCH -mm 1/4] oom: select_bad_process: check PF_KTHREAD instead of !mm to skip kthreads Oleg Nesterov
2010-04-02 18:31 ` Oleg Nesterov
2010-04-02 19:05 ` David Rientjes
2010-04-02 19:05 ` David Rientjes
2010-04-02 18:32 ` [PATCH -mm 2/4] oom: select_bad_process: PF_EXITING check should take ->mm into account Oleg Nesterov
2010-04-02 18:32 ` Oleg Nesterov
2010-04-06 11:42 ` anfei
2010-04-06 11:42 ` anfei
2010-04-06 12:18 ` Oleg Nesterov
2010-04-06 12:18 ` Oleg Nesterov
2010-04-06 13:05 ` anfei
2010-04-06 13:05 ` anfei
2010-04-06 13:38 ` Oleg Nesterov
2010-04-06 13:38 ` Oleg Nesterov
2010-04-02 18:32 ` [PATCH -mm 3/4] oom: introduce find_lock_task_mm() to fix !mm false positives Oleg Nesterov
2010-04-02 18:32 ` Oleg Nesterov
2010-04-02 18:33 ` [PATCH -mm 4/4] oom: oom_forkbomb_penalty: move thread_group_cputime() out of task_lock() Oleg Nesterov
2010-04-02 18:33 ` Oleg Nesterov
2010-04-02 19:04 ` David Rientjes
2010-04-02 19:04 ` David Rientjes
2010-04-05 14:23 ` [PATCH -mm] oom: select_bad_process: never choose tasks with badness == 0 Oleg Nesterov
2010-04-05 14:23 ` Oleg Nesterov
2010-04-02 19:02 ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-04-02 19:02 ` David Rientjes
2010-04-02 19:14 ` Oleg Nesterov
2010-04-02 19:14 ` Oleg Nesterov
2010-04-02 19:46 ` David Rientjes
2010-04-02 19:46 ` David Rientjes
2010-04-02 19:54 ` [patch -mm] oom: exclude tasks with badness score of 0 from being selected David Rientjes
2010-04-02 19:54 ` David Rientjes
2010-04-02 21:04 ` Oleg Nesterov
2010-04-02 21:04 ` Oleg Nesterov
2010-04-02 21:22 ` [patch -mm v2] " David Rientjes
2010-04-02 21:22 ` David Rientjes
2010-04-02 20:55 ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-04-02 20:55 ` Oleg Nesterov
2010-03-31 21:07 ` David Rientjes
2010-03-31 21:07 ` David Rientjes
2010-03-31 22:50 ` Oleg Nesterov [this message]
2010-03-31 22:50 ` Oleg Nesterov
2010-03-31 23:30 ` Oleg Nesterov
2010-03-31 23:30 ` Oleg Nesterov
2010-03-31 23:48 ` David Rientjes
2010-03-31 23:48 ` David Rientjes
2010-04-01 14:39 ` Oleg Nesterov
2010-04-01 14:39 ` Oleg Nesterov
2010-04-01 18:58 ` David Rientjes
2010-04-01 18:58 ` David Rientjes
2010-04-01 8:25 ` David Rientjes
2010-04-01 8:25 ` David Rientjes
2010-04-01 15:26 ` Oleg Nesterov
2010-04-01 15:26 ` Oleg Nesterov
2010-04-08 21:08 ` David Rientjes
2010-04-08 21:08 ` David Rientjes
2010-04-09 12:38 ` Oleg Nesterov
2010-04-09 12:38 ` Oleg Nesterov
2010-03-30 16:39 ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call Oleg Nesterov
2010-03-30 16:39 ` Oleg Nesterov
2010-03-30 17:43 ` [PATCH -mm] proc: don't take ->siglock for /proc/pid/oom_adj Oleg Nesterov
2010-03-30 17:43 ` Oleg Nesterov
2010-03-30 20:30 ` David Rientjes
2010-03-30 20:30 ` David Rientjes
2010-03-31 9:17 ` Oleg Nesterov
2010-03-31 9:17 ` Oleg Nesterov
2010-03-31 18:59 ` Oleg Nesterov
2010-03-31 18:59 ` Oleg Nesterov
2010-03-31 21:14 ` David Rientjes
2010-03-31 21:14 ` David Rientjes
2010-03-31 23:00 ` Oleg Nesterov
2010-03-31 23:00 ` Oleg Nesterov
2010-04-01 8:32 ` David Rientjes
2010-04-01 8:32 ` David Rientjes
2010-04-01 15:37 ` Oleg Nesterov
2010-04-01 15:37 ` Oleg Nesterov
2010-04-01 19:04 ` David Rientjes
2010-04-01 19:04 ` David Rientjes
2010-03-30 20:32 ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call David Rientjes
2010-03-30 20:32 ` David Rientjes
2010-03-31 9:16 ` Oleg Nesterov
2010-03-31 9:16 ` Oleg Nesterov
2010-03-31 20:17 ` Oleg Nesterov
2010-03-31 20:17 ` Oleg Nesterov
2010-04-01 7:41 ` David Rientjes
2010-04-01 7:41 ` David Rientjes
2010-04-01 13:13 ` [PATCH 0/1] oom: fix the unsafe usage of badness() in proc_oom_score() Oleg Nesterov
2010-04-01 13:13 ` Oleg Nesterov
2010-04-01 13:13 ` [PATCH 1/1] " Oleg Nesterov
2010-04-01 13:13 ` Oleg Nesterov
2010-04-01 19:03 ` David Rientjes
2010-04-01 19:03 ` David Rientjes
2010-03-29 14:06 ` [PATCH] oom killer: break from infinite loop anfei
2010-03-29 14:06 ` anfei
2010-03-29 20:01 ` David Rientjes
2010-03-29 20:01 ` David Rientjes
2010-03-30 14:29 ` anfei
2010-03-30 14:29 ` anfei
2010-03-30 20:29 ` David Rientjes
2010-03-30 20:29 ` David Rientjes
2010-03-31 0:57 ` KAMEZAWA Hiroyuki
2010-03-31 0:57 ` KAMEZAWA Hiroyuki
2010-03-31 6:07 ` David Rientjes
2010-03-31 6:07 ` David Rientjes
2010-03-31 6:13 ` KAMEZAWA Hiroyuki
2010-03-31 6:13 ` KAMEZAWA Hiroyuki
2010-03-31 6:30 ` Balbir Singh
2010-03-31 6:30 ` Balbir Singh
2010-03-31 6:31 ` KAMEZAWA Hiroyuki
2010-03-31 6:31 ` KAMEZAWA Hiroyuki
2010-03-31 7:04 ` David Rientjes
2010-03-31 7:04 ` David Rientjes
2010-03-31 6:32 ` David Rientjes
2010-03-31 6:32 ` David Rientjes
2010-03-31 7:08 ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found David Rientjes
2010-03-31 7:08 ` KAMEZAWA Hiroyuki
2010-03-31 8:04 ` Balbir Singh
2010-03-31 10:38 ` David Rientjes
2010-04-04 23:28 ` David Rientjes
2010-04-05 21:30 ` Andrew Morton
2010-04-05 22:40 ` David Rientjes
2010-04-05 22:49 ` Andrew Morton
2010-04-05 23:01 ` David Rientjes
2010-04-06 12:08 ` KOSAKI Motohiro
2010-04-06 21:47 ` David Rientjes
2010-04-07 0:20 ` KAMEZAWA Hiroyuki
2010-04-07 13:29 ` KOSAKI Motohiro
2010-04-08 18:05 ` David Rientjes
2010-04-21 19:17 ` Andrew Morton
2010-04-21 22:04 ` David Rientjes
2010-04-22 0:23 ` KAMEZAWA Hiroyuki
2010-04-22 8:34 ` David Rientjes
2010-04-27 22:58 ` [patch -mm] oom: reintroduce and deprecate oom_kill_allocating_task David Rientjes
2010-04-28 0:57 ` KAMEZAWA Hiroyuki
2010-04-22 7:23 ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found Nick Piggin
2010-04-22 7:25 ` KAMEZAWA Hiroyuki
2010-04-22 10:09 ` Nick Piggin
2010-04-22 10:27 ` KAMEZAWA Hiroyuki
2010-04-22 21:11 ` David Rientjes
2010-04-22 10:28 ` David Rientjes
2010-04-22 15:39 ` Nick Piggin
2010-04-22 21:09 ` David Rientjes
2010-05-04 23:55 ` David Rientjes
2010-04-08 17:36 ` David Rientjes
2010-04-02 10:17 ` [PATCH] oom killer: break from infinite loop Mel Gorman
2010-04-02 10:17 ` Mel Gorman
2010-04-04 23:26 ` David Rientjes
2010-04-04 23:26 ` David Rientjes
2010-04-05 10:47 ` Mel Gorman
2010-04-05 10:47 ` Mel Gorman
2010-04-06 22:40 ` David Rientjes
2010-04-06 22:40 ` David Rientjes
2010-03-29 11:31 ` anfei
2010-03-29 11:31 ` anfei
2010-03-29 11:46 ` Oleg Nesterov
2010-03-29 11:46 ` Oleg Nesterov
2010-03-29 12:09 ` anfei
2010-03-29 12:09 ` anfei
2010-03-28 2:46 ` David Rientjes
2010-03-28 2:46 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100331224904.GA4025@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=anfei.zhou@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=nishimura@mxp.nes.nec.co.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.