linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch] mm, oom: allow exiting tasks to have access to memory reserves
@ 2012-03-07  2:25 David Rientjes
  2012-03-07  6:39 ` KOSAKI Motohiro
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2012-03-07  2:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro

The tasklist iteration only checks processes and avoids individual
threads so it is possible that threads that are currently exiting may not
appropriately being selected for oom kill.  This can lead to negative
results such as an innocent process being killed in the interim or, in
the worst case, the machine panicking because there is nothing else to kill.

We automatically select PF_EXITING threads during the tasklist iteration,
so this saves time and prevents threads that haven't yet exited (although
their parent has been oom killed) from getting missed.

Note that by doing this we aren't actually oom killing an exiting thread
but rather giving it full access to memory reserves so it may quickly
exit and free its memory.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -568,11 +568,11 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask)
 	struct task_struct *p;
 
 	/*
-	 * If current has a pending SIGKILL, then automatically select it.  The
-	 * goal is to allow it to allocate so that it may quickly exit and free
-	 * its memory.
+	 * If current is exiting (or going to exit), then automatically select
+	 * it.  The goal is to allow it to allocate so that it may quickly exit
+	 * and free its memory.
 	 */
-	if (fatal_signal_pending(current)) {
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}
@@ -723,11 +723,11 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		return;
 
 	/*
-	 * If current has a pending SIGKILL, then automatically select it.  The
-	 * goal is to allow it to allocate so that it may quickly exit and free
-	 * its memory.
+	 * If current is exiting (or going to exit), then automatically select
+	 * it.  The goal is to allow it to allocate so that it may quickly exit
+	 * and free its memory.
 	 */
-	if (fatal_signal_pending(current)) {
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] mm, oom: allow exiting tasks to have access to memory reserves
  2012-03-07  2:25 [patch] mm, oom: allow exiting tasks to have access to memory reserves David Rientjes
@ 2012-03-07  6:39 ` KOSAKI Motohiro
  2012-03-07  7:21   ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2012-03-07  6:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	kosaki.motohiro

(3/6/12 9:25 PM), David Rientjes wrote:
> The tasklist iteration only checks processes and avoids individual
> threads so it is possible that threads that are currently exiting may not
> appropriately being selected for oom kill.  This can lead to negative
> results such as an innocent process being killed in the interim or, in
> the worst case, the machine panicking because there is nothing else to kill.
>
> We automatically select PF_EXITING threads during the tasklist iteration,
> so this saves time and prevents threads that haven't yet exited (although
> their parent has been oom killed) from getting missed.
>
> Note that by doing this we aren't actually oom killing an exiting thread
> but rather giving it full access to memory reserves so it may quickly
> exit and free its memory.
>
> Signed-off-by: David Rientjes<rientjes@google.com>

As far as I remembered, this idea was sometimes NAKed and you don't bring new idea here.
When exiting a process which have plenty threads, this patch allow to eat all of reserve memory
and bring us new serious failure.

Moreover, creating new thread isn't needed root privilege, then this trick can be used by attacker.

- kosaki


> ---
>   mm/oom_kill.c |   16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -568,11 +568,11 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask)
>   	struct task_struct *p;
>
>   	/*
> -	 * If current has a pending SIGKILL, then automatically select it.  The
> -	 * goal is to allow it to allocate so that it may quickly exit and free
> -	 * its memory.
> +	 * If current is exiting (or going to exit), then automatically select
> +	 * it.  The goal is to allow it to allocate so that it may quickly exit
> +	 * and free its memory.
>   	 */
> -	if (fatal_signal_pending(current)) {
> +	if (fatal_signal_pending(current) || (current->flags&  PF_EXITING)) {
>   		set_thread_flag(TIF_MEMDIE);
>   		return;
>   	}
> @@ -723,11 +723,11 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>   		return;
>
>   	/*
> -	 * If current has a pending SIGKILL, then automatically select it.  The
> -	 * goal is to allow it to allocate so that it may quickly exit and free
> -	 * its memory.
> +	 * If current is exiting (or going to exit), then automatically select
> +	 * it.  The goal is to allow it to allocate so that it may quickly exit
> +	 * and free its memory.
>   	 */
> -	if (fatal_signal_pending(current)) {
> +	if (fatal_signal_pending(current) || (current->flags&  PF_EXITING)) {
>   		set_thread_flag(TIF_MEMDIE);
>   		return;
>   	}
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] mm, oom: allow exiting tasks to have access to memory reserves
  2012-03-07  6:39 ` KOSAKI Motohiro
@ 2012-03-07  7:21   ` David Rientjes
  2012-03-07  7:22     ` [patch v2] " David Rientjes
  2012-03-08 20:08     ` [patch] " Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: David Rientjes @ 2012-03-07  7:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Wed, 7 Mar 2012, KOSAKI Motohiro wrote:

> As far as I remembered, this idea was sometimes NAKed and you don't bring new
> idea here.

Nope, all patches I've ever proposed for the oom killer have been merged 
in some form or another.

> When exiting a process which have plenty threads, this patch allow to eat all
> of reserve memory
> and bring us new serious failure.
> 

It closes the risk of livelock if an oom killed thread, thread A, cannot 
exit because it's blocked on another thread, thread B, which cannot exit 
because it requires memory in the exit path and doesn't have access to 
memory reserves.  So this patch makes it more likely that an oom killed 
thread will be able to exit without livelocking.

You do remind me that we can remove this logic from select_bad_process(), 
however, as a cleanup which results in more lines being removed than 
added.  I'll reply with a v2.

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch v2] mm, oom: allow exiting tasks to have access to memory reserves
  2012-03-07  7:21   ` David Rientjes
@ 2012-03-07  7:22     ` David Rientjes
  2012-03-08 20:08     ` [patch] " Andrew Morton
  1 sibling, 0 replies; 6+ messages in thread
From: David Rientjes @ 2012-03-07  7:22 UTC (permalink / raw)
  To: Andrew Morton, KOSAKI Motohiro
  Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro

The tasklist iteration only checks processes and avoids individual
threads so it is possible that threads that are currently exiting may not
appropriately being selected for oom kill.  This can lead to negative
results such as an innocent process being killed in the interim or, in
the worst case, the machine panicking because there is nothing else to kill.

We automatically select PF_EXITING threads during the tasklist iteration in
select_bad_process(), so this saves time and prevents threads that haven't
yet exited (although their parent has been oom killed) from getting missed.
It also allows that code to be removed from select_bad_process().

Note that by doing this we aren't actually oom killing an exiting thread
but rather giving it full access to memory reserves so it may quickly
exit and free its memory.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c |   40 +++++++++++++---------------------------
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -342,26 +342,12 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 
 		if (p->flags & PF_EXITING) {
 			/*
-			 * If p is the current task and is in the process of
-			 * releasing memory, we allow the "kill" to set
-			 * TIF_MEMDIE, which will allow it to gain access to
-			 * memory reserves.  Otherwise, it may stall forever.
-			 *
-			 * The loop isn't broken here, however, in case other
-			 * threads are found to have already been oom killed.
+			 * If this task is not being ptraced on exit, then wait
+			 * for it to finish before killing some other task
+			 * unnecessarily.
 			 */
-			if (p == current) {
-				chosen = p;
-				*ppoints = 1000;
-			} else {
-				/*
-				 * If this task is not being ptraced on exit,
-				 * then wait for it to finish before killing
-				 * some other task unnecessarily.
-				 */
-				if (!(p->group_leader->ptrace & PT_TRACE_EXIT))
-					return ERR_PTR(-1UL);
-			}
+			if (!(p->group_leader->ptrace & PT_TRACE_EXIT))
+				return ERR_PTR(-1UL);
 		}
 
 		points = oom_badness(p, memcg, nodemask, totalpages);
@@ -568,11 +554,11 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask)
 	struct task_struct *p;
 
 	/*
-	 * If current has a pending SIGKILL, then automatically select it.  The
-	 * goal is to allow it to allocate so that it may quickly exit and free
-	 * its memory.
+	 * If current is exiting (or going to exit), then automatically select
+	 * it.  The goal is to allow it to allocate so that it may quickly exit
+	 * and free its memory.
 	 */
-	if (fatal_signal_pending(current)) {
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}
@@ -723,11 +709,11 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		return;
 
 	/*
-	 * If current has a pending SIGKILL, then automatically select it.  The
-	 * goal is to allow it to allocate so that it may quickly exit and free
-	 * its memory.
+	 * If current is exiting (or going to exit), then automatically select
+	 * it.  The goal is to allow it to allocate so that it may quickly exit
+	 * and free its memory.
 	 */
-	if (fatal_signal_pending(current)) {
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] mm, oom: allow exiting tasks to have access to memory reserves
  2012-03-07  7:21   ` David Rientjes
  2012-03-07  7:22     ` [patch v2] " David Rientjes
@ 2012-03-08 20:08     ` Andrew Morton
  2012-03-08 21:59       ` David Rientjes
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2012-03-08 20:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Tue, 6 Mar 2012 23:21:52 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> Nope, all patches I've ever proposed for the oom killer have been merged 
> in some form or another.
> 
> > When exiting a process which have plenty threads, this patch allow to eat all
> > of reserve memory
> > and bring us new serious failure.
> > 
> 
> It closes the risk of livelock if an oom killed thread, thread A, cannot 
> exit because it's blocked on another thread, thread B, which cannot exit 
> because it requires memory in the exit path and doesn't have access to 
> memory reserves.  So this patch makes it more likely that an oom killed 
> thread will be able to exit without livelocking.

But it also "allow to eat all of reserve memory and bring us new
serious failure".  In theory, at least.

And afaict the proposed patch is a theoretical thing as well.  Has
anyone sat down and created tests to demonstrate either problem?  This
patch is either two-steps-forward-and-one-back or it is
one-step-forward-and-two-steps-back.  How are we to determine which of
these it is?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] mm, oom: allow exiting tasks to have access to memory reserves
  2012-03-08 20:08     ` [patch] " Andrew Morton
@ 2012-03-08 21:59       ` David Rientjes
  0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2012-03-08 21:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Thu, 8 Mar 2012, Andrew Morton wrote:

> > It closes the risk of livelock if an oom killed thread, thread A, cannot 
> > exit because it's blocked on another thread, thread B, which cannot exit 
> > because it requires memory in the exit path and doesn't have access to 
> > memory reserves.  So this patch makes it more likely that an oom killed 
> > thread will be able to exit without livelocking.
> 
> But it also "allow to eat all of reserve memory and bring us new
> serious failure".  In theory, at least.
> 

Exactly, "in theory."  We've never seen an issue where a set of threads in 
do_exit() allocated memory at the same time to deplete all memory reserves 
while never freeing the memory so that reclaim consistently fails and all 
threads continue to enter into the oom killer to get access to memory 
reserves.

And, with the way the code is written before this patch, only one thread 
will have access to memory reserves and the oom killer will be a no-op 
until it exits.  There's a much higher liklihood that an oom killed thread 
may not exit because it's blocked on another thread that requires memory.  
That's what this patch addresses.

> And afaict the proposed patch is a theoretical thing as well.  Has
> anyone sat down and created tests to demonstrate either problem?

We've run with this patch internally for a year because an oom killed 
thread can't exit.  We used to address this with an oom killer timeout 
that would kill another thread only after 10s but it was much faster to 
just give access to memory reserves and to let them exit.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-08 21:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-07  2:25 [patch] mm, oom: allow exiting tasks to have access to memory reserves David Rientjes
2012-03-07  6:39 ` KOSAKI Motohiro
2012-03-07  7:21   ` David Rientjes
2012-03-07  7:22     ` [patch v2] " David Rientjes
2012-03-08 20:08     ` [patch] " Andrew Morton
2012-03-08 21:59       ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).