From: Oleg Nesterov <oleg@redhat.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 2/5] oom: select_bad_process: PF_EXITING check should take ->mm into account
Date: Tue, 1 Jun 2010 22:18:43 +0200
Message-ID: <20100601201843.GA20732@redhat.com>
In-Reply-To: <20100601093951.2430.A69D9226@jp.fujitsu.com>

On 06/01, KOSAKI Motohiro wrote:
>
> > I'd like to add a note... with or without this, we have problems
> > with the coredump. A thread participating in the coredumping
> > (group-leader in this case) can have PF_EXITING && mm, but this doesn't
> > mean it is going to exit soon, and the dumper can use a lot more memory.
>
> Sure. I think coredump should do nothing if OOM occurs.
> So, is merely adding PF_COREDUMP a bad idea? I mean
>
> task-flags		allocator
> ------------------------------------------------
> none			N/A
> TIF_MEMDIE		allow to use emergency memory.
> 			don't call page reclaim.
> PF_COREDUMP		N/A
> TIF_MEMDIE+PF_COREDUMP	disallow to use emergency memory.
> 			don't call page reclaim.
>
> In other words, the coredump path makes allocations fail if the task
> is marked as TIF_MEMDIE.

Perhaps... But where should TIF_MEMDIE go in this case? Let me clarify.

Two threads, group-leader L and its sub-thread T. T dumps the core.
In this case both threads have ->mm != NULL, L has PF_EXITING.

The first problem is, select_bad_process() always returns -1 in this
case (even if the caller is T, this doesn't matter).

The second problem is that we should add TIF_MEMDIE to T, not L.

This is more or less easy. For simplicity, let's suppose we removed
this PF_EXITING check from select_bad_process().

Otoh, if we make do_coredump() interruptible (and we should do this
in any case), then perhaps the TIF_MEMDIE+PF_COREDUMP is not really
needed? Afaics we always send SIGKILL along with TIF_MEMDIE.
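
For reference, the allocator policy in the quoted table can be written
down as a small decision function. This is only a userspace sketch of
the proposal: PF_COREDUMP does not exist, and the enum and function
names below are hypothetical. The "N/A" rows are modelled as "no
special treatment".

```c
#include <stdbool.h>

/* Hypothetical names; this models the quoted table, not kernel code. */
enum model_alloc_policy {
	MODEL_ALLOC_NORMAL,		/* "N/A" rows: no special treatment */
	MODEL_ALLOC_USE_RESERVES,	/* may use emergency memory, no reclaim */
	MODEL_ALLOC_FAIL_FAST,		/* no emergency memory, no reclaim */
};

/*
 * memdie:   the task has TIF_MEMDIE set
 * coredump: the task has the proposed (not existing) PF_COREDUMP set
 */
static enum model_alloc_policy model_pick_alloc_policy(bool memdie,
							bool coredump)
{
	/* Without TIF_MEMDIE the proposed flag changes nothing. */
	if (!memdie)
		return MODEL_ALLOC_NORMAL;

	/* An OOM-killed dumper must not consume the reserves. */
	return coredump ? MODEL_ALLOC_FAIL_FAST : MODEL_ALLOC_USE_RESERVES;
}
```

If do_coredump() becomes interruptible, the MODEL_ALLOC_FAIL_FAST case
may never be needed, since the dumper would see the SIGKILL and stop.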

> > And, as it was already discussed, we only check the group-leader here.
> > But I can't suggest something better.
>
> I guess signal_group_exit() is enough in practical case.

Unlike the SIGNAL_GROUP_EXIT check, signal_group_exit() can also mean
exec. This is probably correct. If we see the task inside de_thread(),
it is going to free its old mm soon.

The problem is that this check doesn't cover the case when a
single-threaded task exits (even if it does sys_exit_group). And it is
not enough to remove the thread_group_empty-case-optimization from
do_group_exit(), because the task can call sys_exit() instead.

But anyway I agree, select_bad_process can probably check

	signal_group_exit() || (PF_EXITING && thread_group_empty())

And in that case it is better to remove the "&& p->mm" part of the
current check.
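
To make the intent concrete, here is a minimal userspace model of that
check. It is a sketch, not a patch: the model_* types and helpers are
simplified stand-ins for task_struct, signal_struct,
signal_group_exit() and thread_group_empty(), and the flag values are
arbitrary.

```c
#include <stdbool.h>

/* Arbitrary values for this model; not the kernel's definitions. */
#define MODEL_PF_EXITING	0x1u
#define MODEL_SIGNAL_GROUP_EXIT	0x2u

/* Simplified stand-in for signal_struct. */
struct model_signal {
	unsigned int flags;
	bool exec_in_progress;	/* models a task inside de_thread() */
	int nr_threads;		/* live threads in the group */
};

/* Simplified stand-in for task_struct. */
struct model_task {
	unsigned int flags;
	struct model_signal *signal;
};

/* True for a group exit, and also for exec, as noted above. */
static bool model_signal_group_exit(const struct model_signal *sig)
{
	return (sig->flags & MODEL_SIGNAL_GROUP_EXIT) || sig->exec_in_progress;
}

/* True when the task is the only thread in its group. */
static bool model_thread_group_empty(const struct model_task *p)
{
	return p->signal->nr_threads == 1;
}

/*
 * The proposed condition. Note there is no "&& p->mm" term: the
 * single-threaded case is covered by PF_EXITING alone.
 */
static bool model_task_is_exiting(const struct model_task *p)
{
	return model_signal_group_exit(p->signal) ||
	       ((p->flags & MODEL_PF_EXITING) && model_thread_group_empty(p));
}
```

With this, one thread of a multi-threaded group having PF_EXITING is
not enough on its own, while a single-threaded task calling sys_exit()
is still recognized.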

Oleg.



