All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Michal Hocko <mhocko@suse.cz>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/1] oom: don't assume that a coredumping thread will exit soon
Date: Wed, 3 Dec 2014 20:50:20 +0100	[thread overview]
Message-ID: <20141203195020.GB11558@redhat.com> (raw)
In-Reply-To: <20141203195001.GA11558@redhat.com>

oom_kill.c assumes that PF_EXITING task should exit and free the memory
soon. This is wrong in many ways and one important case is the coredump.
A task can sleep in exit_mm() "forever" while the coredumping sub-thread
can need more memory.

Change the PF_EXITING checks to take SIGNAL_GROUP_COREDUMP into account,
we add the new trivial helper for that.

Note: this is only the first step, this patch doesn't try to solve other
problems. The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can
participate in coredump after it was already observed in PF_EXITING state,
so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set.
fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
And even the name/usage of the new helper is confusing, an exiting thread
can only free its ->mm if it is the only/last task in thread group.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/oom.h |    6 ++++++
 mm/memcontrol.c     |    2 +-
 mm/oom_kill.c       |    6 +++---
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index e8d6e10..e16b945 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -92,6 +92,12 @@ static inline bool oom_gfp_allowed(gfp_t gfp_mask)
 
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 
+static inline bool task_will_free_mem(struct task_struct *task)
+{
+	return (task->flags & PF_EXITING) &&
+		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
+}
+
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
 extern int sysctl_oom_kill_allocating_task;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d6ac0e3..3d1d49f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1734,7 +1734,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * select it.  The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
 	 */
-	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
+	if (fatal_signal_pending(current) || task_will_free_mem(current)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5340f6b..837ff31 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -281,7 +281,7 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 	if (oom_task_origin(task))
 		return OOM_SCAN_SELECT;
 
-	if (task->flags & PF_EXITING && !force_kill) {
+	if (task_will_free_mem(task) && !force_kill) {
 		/*
 		 * If this task is not being ptraced on exit, then wait for it
 		 * to finish before killing some other task unnecessarily.
@@ -443,7 +443,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * If the task is already exiting, don't alarm the sysadmin or kill
 	 * its children or threads, just set TIF_MEMDIE so it can die quickly
 	 */
-	if (p->flags & PF_EXITING) {
+	if (task_will_free_mem(p)) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
 		put_task_struct(p);
 		return;
@@ -649,7 +649,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * select it.  The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
 	 */
-	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
+	if (fatal_signal_pending(current) || task_will_free_mem(current)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}
-- 
1.5.5.1



  reply	other threads:[~2014-12-03 19:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-27 23:03 [PATCH 0/2] oom && coredump Oleg Nesterov
2014-11-27 23:04 ` [PATCH 1/2] oom: don't assume that a coredumping thread will exit soon Oleg Nesterov
2014-12-02  9:19   ` Michal Hocko
2014-12-02 17:50     ` Oleg Nesterov
2014-12-02 18:31       ` Michal Hocko
2014-12-02 19:16         ` Oleg Nesterov
2014-12-03 13:07           ` Michal Hocko
2014-11-27 23:04 ` [PATCH 2/2] oom: kill the insufficient and no longer needed PT_TRACE_EXIT check Oleg Nesterov
2014-12-02  9:35   ` Michal Hocko
2014-12-02 18:05     ` Oleg Nesterov
2014-12-02 18:23       ` Michal Hocko
2014-11-28 19:28 ` [PATCH 0/2] oom && coredump Oleg Nesterov
2014-12-03 19:50 ` [PATCH v2 0/1] oom: don't assume that a coredumping thread will exit soon Oleg Nesterov
2014-12-03 19:50   ` Oleg Nesterov [this message]
2014-12-03 21:58     ` [PATCH v2 1/1] " Andrew Morton
2014-12-04 14:34     ` Michal Hocko
2014-12-05 23:46     ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141203195020.GB11558@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    --cc=rjw@rjwysocki.net \
    --cc=tj@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.