From: Andrew Morton <akpm@linux-foundation.org>
To: Andrea Arcangeli <andrea@suse.de>
Cc: linux-mm@kvack.org, David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 09 of 24] fallback killing more tasks if tif-memdie doesn't go away
Date: Wed, 12 Sep 2007 05:30:22 -0700 [thread overview]
Message-ID: <20070912053022.b7d152c3.akpm@linux-foundation.org> (raw)
In-Reply-To: <9bf6a66eab3c52327daa.1187786936@v2.random>
On Wed, 22 Aug 2007 14:48:56 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
> # HG changeset patch
> # User Andrea Arcangeli <andrea@suse.de>
> # Date 1187778125 -7200
> # Node ID 9bf6a66eab3c52327daa831ef101d7802bc71791
> # Parent ffdc30241856d7155ceedd4132eef684f7cc7059
> fallback killing more tasks if tif-memdie doesn't go away
>
> Waiting indefinitely for a TIF_MEMDIE task to go away will deadlock. Two
> tasks reading from the same inode at the same time and both going out of
> memory inside a read(largebuffer) syscall, will even deadlock through
> contention over the PG_locked bitflag. The task holding the semaphore
> detects oom but the oom killer decides to kill the task blocked in
> wait_on_page_locked(). The task holding the semaphore will hang inside
> alloc_pages that will never return because it will wait the TIF_MEMDIE
> task to go away, but the TIF_MEMDIE task can't go away until the task
> holding the semaphore is killed in the first place.
hrm, OK, that's not nice
> It's quite unpractical to teach the oom killer the locking dependencies
> across running tasks, so the feasible fix is to develop a logic that
> after waiting a long time for a TIF_MEMDIE tasks goes away, fallbacks
> on killing one more task. This also eliminates the possibility of
> suprious oom killage (i.e. two tasks killed despite only one had to be
> killed). It's not a math guarantee because we can't demonstrate that if
> a TIF_MEMDIE SIGKILLED task didn't mange to complete do_exit within
> 10sec, it never will. But the current probability of suprious oom
> killing is sure much higher than the probability of suprious oom killing
> with this patch applied.
>
> The whole locking is around the tasklist_lock. On one side do_exit reads
> TIF_MEMDIE and clears VM_is_OOM under the lock, on the other side the
> oom killer accesses VM_is_OOM and TIF_MEMDIE under the lock. This is a
> read_lock in the oom killer but it's actually a write lock thanks to the
> OOM_lock semaphore running one oom killer at once (the locking rule is,
> either use write_lock_irq or read_lock+OOM_lock).
>
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -849,6 +849,15 @@ static void exit_notify(struct task_stru
> if (tsk->exit_signal == -1 && likely(!tsk->ptrace))
> state = EXIT_DEAD;
> tsk->exit_state = state;
> +
> + /*
> + * Read TIF_MEMDIE and set VM_is_OOM to 0 atomically inside
> + * the tasklist_lock_lock.
> + */
> + if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) {
> + extern unsigned long VM_is_OOM;
> + clear_bit(0, &VM_is_OOM);
> + }
Please, no externs-in-C, ever.
> write_unlock_irq(&tasklist_lock);
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -29,6 +29,9 @@ int sysctl_panic_on_oom;
> int sysctl_panic_on_oom;
> /* #define DEBUG */
>
> +unsigned long VM_is_OOM;
what's with the studlycaps, btw?
> +static unsigned long last_tif_memdie_jiffies;
> +
> /**
> * badness - calculate a numeric value for how bad this task has been
> * @p: task struct of which task we should calculate
> @@ -226,21 +229,14 @@ static struct task_struct *select_bad_pr
> if (is_init(p))
> continue;
>
> - /*
> - * This task already has access to memory reserves and is
> - * being killed. Don't allow any other task access to the
> - * memory reserve.
> - *
> - * Note: this may have a chance of deadlock if it gets
> - * blocked waiting for another task which itself is waiting
> - * for memory. Is there a better alternative?
> - *
> - * Better not to skip PF_EXITING tasks, since they
> - * don't have access to the PF_MEMALLOC pool until
> - * we select them here first.
> - */
> - if (test_tsk_thread_flag(p, TIF_MEMDIE))
> - return ERR_PTR(-1UL);
> + if (unlikely(test_tsk_thread_flag(p, TIF_MEMDIE))) {
> + /*
> + * Either we already waited long enough,
> + * or exit_mm already run, so we must
> + * try to kill another task.
> + */
> + continue;
> + }
>
> if (p->oomkilladj == OOM_DISABLE)
> continue;
> @@ -277,13 +273,16 @@ static void __oom_kill_task(struct task_
> if (verbose)
> printk(KERN_ERR "Killed process %d (%s)\n", p->pid, p->comm);
>
> + if (!test_and_set_tsk_thread_flag(p, TIF_MEMDIE)) {
> + last_tif_memdie_jiffies = jiffies;
> + set_bit(0, &VM_is_OOM);
> + }
Do we actually need the bitops? Wouldn't a simple old foo=0 suffice here??
> /*
> * We give our sacrificial lamb high priority and access to
> * all the memory it needs. That way it should be able to
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> - set_tsk_thread_flag(p, TIF_MEMDIE);
>
> force_sig(SIGKILL, p);
> }
> @@ -420,6 +419,18 @@ void out_of_memory(struct zonelist *zone
> constraint = constrained_alloc(zonelist, gfp_mask);
> cpuset_lock();
> read_lock(&tasklist_lock);
> +
> + /*
> + * This holds the down(OOM_lock)+read_lock(tasklist_lock), so it's
> + * equivalent to write_lock_irq(tasklist_lock) as far as VM_is_OOM
> + * is concerned.
> + */
> + if (unlikely(test_bit(0, &VM_is_OOM))) {
> + if (time_before(jiffies, last_tif_memdie_jiffies + 10*HZ))
> + goto out;
> + printk("detected probable OOM deadlock, so killing another task\n");
Please include a facility level in all printks
> + last_tif_memdie_jiffies = jiffies;
> + }
>
> switch (constraint) {
> case CONSTRAINT_MEMORY_POLICY:
> @@ -441,10 +452,6 @@ retry:
> * issues we may have.
> */
> p = select_bad_process(&points);
> -
> - if (PTR_ERR(p) == -1UL)
> - goto out;
> /* Found nothing?!?! Either we hang forever, or we panic. */
> if (!p) {
> read_unlock(&tasklist_lock);
Something like this...
include/linux/swap.h | 1 +
kernel/exit.c | 11 +++++------
mm/oom_kill.c | 11 ++++++-----
3 files changed, 12 insertions(+), 11 deletions(-)
diff -puN kernel/exit.c~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix kernel/exit.c
--- a/kernel/exit.c~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix
+++ a/kernel/exit.c
@@ -30,6 +30,7 @@
#include <linux/kthread.h>
#include <linux/mempolicy.h>
#include <linux/taskstats_kern.h>
+#include <linux/swap.h>
#include <linux/delayacct.h>
#include <linux/freezer.h>
#include <linux/cpuset.h>
@@ -851,13 +852,11 @@ static void exit_notify(struct task_stru
tsk->exit_state = state;
/*
- * Read TIF_MEMDIE and set VM_is_OOM to 0 atomically inside
- * the tasklist_lock_lock.
+ * Read TIF_MEMDIE and set vm_is_oom to 0 atomically inside
+ * the tasklist_lock.
*/
- if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) {
- extern unsigned long VM_is_OOM;
- clear_bit(0, &VM_is_OOM);
- }
+ if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
+ clear_bit(0, &vm_is_oom);
write_unlock_irq(&tasklist_lock);
diff -puN mm/oom_kill.c~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix mm/oom_kill.c
--- a/mm/oom_kill.c~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix
+++ a/mm/oom_kill.c
@@ -29,7 +29,7 @@
int sysctl_panic_on_oom;
/* #define DEBUG */
-unsigned long VM_is_OOM;
+unsigned long vm_is_oom;
static unsigned long last_tif_memdie_jiffies;
/**
@@ -268,7 +268,7 @@ static void __oom_kill_task(struct task_
if (!test_and_set_tsk_thread_flag(p, TIF_MEMDIE)) {
last_tif_memdie_jiffies = jiffies;
- set_bit(0, &VM_is_OOM);
+ set_bit(0, &vm_is_oom);
}
/*
* We give our sacrificial lamb high priority and access to
@@ -415,13 +415,14 @@ void out_of_memory(struct zonelist *zone
/*
* This holds the down(OOM_lock)+read_lock(tasklist_lock), so it's
- * equivalent to write_lock_irq(tasklist_lock) as far as VM_is_OOM
+ * equivalent to write_lock_irq(tasklist_lock) as far as vm_is_oom
* is concerned.
*/
- if (unlikely(test_bit(0, &VM_is_OOM))) {
+ if (unlikely(test_bit(0, &
if (time_before(jiffies, last_tif_memdie_jiffies + 10*HZ))
goto out;
- printk("detected probable OOM deadlock, so killing another task\n");
+ printk(KERN_ERR "detected probable OOM deadlock, so killing "
+ "another task\n");
last_tif_memdie_jiffies = jiffies;
}
diff -puN include/linux/swap.h~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix include/linux/swap.h
--- a/include/linux/swap.h~oom-handling-fallback-killing-more-tasks-if-tif-memdie-doesnt-go-away-fix
+++ a/include/linux/swap.h
@@ -209,6 +209,7 @@ static inline int zone_reclaim(struct zo
#endif
extern int kswapd_run(int nid);
+extern unsigned long vm_is_oom;
#ifdef CONFIG_MMU
/* linux/mm/shmem.c */
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-09-12 12:30 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-22 12:48 [PATCH 00 of 24] OOM related fixes Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 01 of 24] remove nr_scan_inactive/active Andrea Arcangeli
2007-09-12 11:44 ` Andrew Morton
2008-01-02 17:50 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 02 of 24] avoid oom deadlock in nfs_create_request Andrea Arcangeli
2007-09-12 23:54 ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 03 of 24] prevent oom deadlocks during read/write operations Andrea Arcangeli
2007-09-12 11:56 ` Andrew Morton
2007-09-12 2:18 ` Nick Piggin
2008-01-03 0:53 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 04 of 24] serialize oom killer Andrea Arcangeli
2007-09-12 12:02 ` Andrew Morton
2007-09-12 12:04 ` Andrew Morton
2007-09-12 12:11 ` Andrea Arcangeli
2008-01-03 0:55 ` Andrea Arcangeli
2007-09-13 0:09 ` Christoph Lameter
2007-09-13 18:32 ` David Rientjes
2007-09-13 18:37 ` Christoph Lameter
2007-09-13 18:46 ` David Rientjes
2007-09-13 18:53 ` Christoph Lameter
2007-09-14 0:36 ` David Rientjes
2007-09-14 2:31 ` Christoph Lameter
2007-09-14 3:33 ` David Rientjes
2007-09-18 16:44 ` David Rientjes
2007-09-18 16:44 ` [patch 1/4] oom: move prototypes to appropriate header file David Rientjes
2007-09-18 16:44 ` [patch 2/4] oom: move constraints to enum David Rientjes
2007-09-18 16:44 ` [patch 3/4] oom: save zonelist pointer for oom killer calls David Rientjes
2007-09-18 16:44 ` [patch 4/4] oom: serialize out of memory calls David Rientjes
2007-09-18 19:54 ` Christoph Lameter
2007-09-18 19:56 ` David Rientjes
2007-09-18 20:01 ` Christoph Lameter
2007-09-18 20:06 ` David Rientjes
2007-09-18 20:23 ` [patch 5/4] oom: rename serialization helper functions David Rientjes
2007-09-18 20:26 ` Christoph Lameter
2007-09-18 20:39 ` [patch 5/4 v2] " David Rientjes
2007-09-18 20:59 ` Christoph Lameter
2007-09-18 19:57 ` [patch 3/4] oom: save zonelist pointer for oom killer calls Christoph Lameter
2007-09-18 20:13 ` David Rientjes
2007-09-18 20:16 ` Christoph Lameter
2007-09-18 20:47 ` [patch 6/4] oom: pass null to kfree if zonelist is not cleared David Rientjes
2007-09-18 21:01 ` Christoph Lameter
2007-09-18 21:13 ` David Rientjes
2007-09-18 21:25 ` Christoph Lameter
2007-09-18 22:16 ` David Rientjes
2007-09-19 17:09 ` Paul Jackson
2007-09-19 18:21 ` David Rientjes
2007-09-18 19:55 ` [patch 2/4] oom: move constraints to enum Christoph Lameter
2007-08-22 12:48 ` [PATCH 05 of 24] avoid selecting already killed tasks Andrea Arcangeli
2007-09-13 0:13 ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 06 of 24] reduce the probability of an OOM livelock Andrea Arcangeli
2007-09-12 12:17 ` Andrew Morton
2008-01-03 1:03 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 07 of 24] balance_pgdat doesn't return the number of pages freed Andrea Arcangeli
2007-09-12 12:18 ` Andrew Morton
2007-09-13 0:26 ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 08 of 24] don't depend on PF_EXITING tasks to go away Andrea Arcangeli
2007-09-12 12:20 ` Andrew Morton
2008-01-03 0:56 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 09 of 24] fallback killing more tasks if tif-memdie doesn't " Andrea Arcangeli
2007-09-12 12:30 ` Andrew Morton [this message]
2007-09-12 12:34 ` Andrew Morton
2008-01-03 1:06 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 10 of 24] stop useless vm trashing while we wait the TIF_MEMDIE task to exit Andrea Arcangeli
2007-09-12 12:42 ` Andrew Morton
2007-09-13 0:36 ` Christoph Lameter
2007-09-21 19:10 ` David Rientjes
2008-01-03 1:08 ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 11 of 24] the oom schedule timeout isn't needed with the VM_is_OOM logic Andrea Arcangeli
2007-09-12 12:44 ` Andrew Morton
2007-08-22 12:48 ` [PATCH 12 of 24] show mem information only when a task is actually being killed Andrea Arcangeli
2007-09-12 12:49 ` Andrew Morton
2007-08-22 12:49 ` [PATCH 13 of 24] simplify oom heuristics Andrea Arcangeli
2007-09-12 12:52 ` Andrew Morton
2007-09-12 13:40 ` Andrea Arcangeli
2007-09-12 20:52 ` Andrew Morton
2007-08-22 12:49 ` [PATCH 14 of 24] oom select should only take rss into account Andrea Arcangeli
2007-09-13 0:43 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 15 of 24] limit reclaim if enough pages have been freed Andrea Arcangeli
2007-09-12 12:57 ` Andrew Morton
2008-01-03 1:12 ` Andrea Arcangeli
2007-09-12 12:58 ` Andrew Morton
2007-09-12 13:38 ` Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 16 of 24] avoid some lock operation in vm fast path Andrea Arcangeli
2007-09-12 12:59 ` Andrew Morton
2007-09-13 0:49 ` Christoph Lameter
2007-09-13 1:16 ` Andrew Morton
2007-09-13 1:33 ` Christoph Lameter
2007-09-13 1:41 ` KAMEZAWA Hiroyuki
2007-09-13 1:44 ` Andrew Morton
2007-08-22 12:49 ` [PATCH 17 of 24] apply the anti deadlock features only to global oom Andrea Arcangeli
2007-09-12 13:02 ` Andrew Morton
2007-09-13 0:53 ` Christoph Lameter
2007-09-13 0:52 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 18 of 24] run panic the same way in both places Andrea Arcangeli
2007-09-13 0:54 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 19 of 24] cacheline align VM_is_OOM to prevent false sharing Andrea Arcangeli
2007-09-12 13:02 ` Andrew Morton
2007-09-12 13:36 ` Andrea Arcangeli
2007-09-13 0:55 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 20 of 24] extract deadlock helper function Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 21 of 24] select process to kill for cpusets Andrea Arcangeli
2007-09-12 13:05 ` Andrew Morton
2007-09-13 0:59 ` Christoph Lameter
2007-09-13 5:13 ` David Rientjes
2007-09-13 17:55 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 22 of 24] extract select helper function Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 23 of 24] serialize for cpusets Andrea Arcangeli
2007-09-12 13:10 ` Andrew Morton
2007-09-12 13:34 ` Andrea Arcangeli
2007-09-12 19:08 ` David Rientjes
2007-09-13 1:02 ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 24 of 24] add oom_kill_asking_task flag Andrea Arcangeli
2007-09-12 13:11 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070912053022.b7d152c3.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=andrea@suse.de \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.