linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
@ 2006-04-13 21:52 Dave Peterson
  2006-04-13 23:24 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-13 21:52 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, riel, akpm

The patch below fixes some mm_struct reference counting bugs in
badness().

Signed-Off-By: David S. Peterson <dsp@llnl.gov>
---

Index: linux-2.6.17-rc1-oom/mm/oom_kill.c
===================================================================
--- linux-2.6.17-rc1-oom.orig/mm/oom_kill.c	2006-04-13 14:25:16.000000000 -0700
+++ linux-2.6.17-rc1-oom/mm/oom_kill.c	2006-04-13 14:31:47.000000000 -0700
@@ -47,14 +47,17 @@ unsigned long badness(struct task_struct
 {
 	unsigned long points, cpu_time, run_time, s;
 	struct list_head *tsk;
+	struct mm_struct *mm, *child_mm;
 
-	if (!p->mm)
+	mm = get_task_mm(p);
+
+	if (mm == NULL)
 		return 0;
 
 	/*
 	 * The memory size of the process is the basis for the badness.
 	 */
-	points = p->mm->total_vm;
+	points = mm->total_vm;
 
 	/*
 	 * Processes which fork a lot of child processes are likely
@@ -67,10 +70,21 @@ unsigned long badness(struct task_struct
 	list_for_each(tsk, &p->children) {
 		struct task_struct *chld;
 		chld = list_entry(tsk, struct task_struct, sibling);
-		if (chld->mm != p->mm && chld->mm)
-			points += chld->mm->total_vm/2 + 1;
+
+		if (chld->mm == mm)
+			continue;
+
+		child_mm = get_task_mm(chld);
+
+		if (child_mm == NULL)
+			continue;
+
+		points += child_mm->total_vm/2 + 1;
+		mmput(child_mm);
 	}
 
+	mmput(mm);
+
 	/*
 	 * CPU time is in tens of seconds and run time is in thousands
          * of seconds. There is no particular reason for this other than

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-13 21:52 [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c Dave Peterson
@ 2006-04-13 23:24 ` Andrew Morton
  2006-04-14  0:44   ` Dave Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-04-13 23:24 UTC (permalink / raw)
  To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel

Dave Peterson <dsp@llnl.gov> wrote:
>
> The patch below fixes some mm_struct reference counting bugs in
> badness().

hm, OK, afaict the code _is_ racy.

But you're now calling mmput() inside read_lock(&tasklist_lock), and
mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas().  So
sterner stuff will be needed.

I'll put a might_sleep() into mmput - it's a bit unexpected.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-13 23:24 ` Andrew Morton
@ 2006-04-14  0:44   ` Dave Peterson
  2006-04-14  7:26     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-14  0:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel

On Thursday 13 April 2006 16:24, Andrew Morton wrote:
> Dave Peterson <dsp@llnl.gov> wrote:
> > The patch below fixes some mm_struct reference counting bugs in
> > badness().
>
> hm, OK, afaict the code _is_ racy.
>
> But you're now calling mmput() inside read_lock(&tasklist_lock), and
> mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas().  So
> sterner stuff will be needed.
>
> I'll put a might_sleep() into mmput - it's a bit unexpected.

Hmm... fixing this looks rather tricky.  If get_task_mm()/mmput() was
only being done on a single mm_struct then I suppose badness() could
do something a bit ugly like passing the reference back to its caller
and letting the caller do the mmput() once tasklist_lock is no longer
held.  However here we are iterating over a bunch of child tasks,
potentially doing a get_task_mm()/mmput() for a number of them.

I have a suggestion for a possible solution.  Currently mmput() is
implemented as follows:

    01 void mmput(struct mm_struct *mm)
    02 {
    03         if (atomic_dec_and_lock(&mm->mm_users, &mmlist_lock)) {
    04                 list_del(&mm->mmlist);
    05                 mmlist_nr--;
    06                 spin_unlock(&mmlist_lock);
    07                 exit_aio(mm);
    08                 exit_mmap(mm);
    09                 put_swap_token(mm);
    10                 mmdrop(mm);
    11         }
    12 }

Suppose we replace lines 07-10 with a little piece of code that adds
the mm_struct to a list.  Then a kernel thread empties the list
(perhaps via the work queue mechanism), doing the stuff in lines
07-10 for each mm_struct.  This would eliminate the possibility of
mmput() sleeping, potentially making things easier for other callers
of mmput() and causing fewer surprises.  Any comments?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14  0:44   ` Dave Peterson
@ 2006-04-14  7:26     ` Andrew Morton
  2006-04-14 19:14       ` Dave Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-04-14  7:26 UTC (permalink / raw)
  To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel

Dave Peterson <dsp@llnl.gov> wrote:
>
> On Thursday 13 April 2006 16:24, Andrew Morton wrote:
> > Dave Peterson <dsp@llnl.gov> wrote:
> > > The patch below fixes some mm_struct reference counting bugs in
> > > badness().
> >
> > hm, OK, afaict the code _is_ racy.
> >
> > But you're now calling mmput() inside read_lock(&tasklist_lock), and
> > mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas().  So
> > sterner stuff will be needed.
> >
> > I'll put a might_sleep() into mmput - it's a bit unexpected.
> 
> Hmm... fixing this looks rather tricky.  If get_task_mm()/mmput() was
> only being done on a single mm_struct then I suppose badness() could
> do something a bit ugly like passing the reference back to its caller
> and letting the caller do the mmput() once tasklist_lock is no longer
> held.  However here we are iterating over a bunch of child tasks,
> potentially doing a get_task_mm()/mmput() for a number of them.
> 
> I have a suggestion for a possible solution.  Currently mmput() is
> implemented as follows:
> 
>     01 void mmput(struct mm_struct *mm)
>     02 {
>     03         if (atomic_dec_and_lock(&mm->mm_users, &mmlist_lock)) {
>     04                 list_del(&mm->mmlist);
>     05                 mmlist_nr--;
>     06                 spin_unlock(&mmlist_lock);
>     07                 exit_aio(mm);
>     08                 exit_mmap(mm);
>     09                 put_swap_token(mm);
>     10                 mmdrop(mm);
>     11         }
>     12 }
> 
> Suppose we replace lines 07-10 with a little piece of code that adds
> the mm_struct to a list.  Then a kernel thread empties the list
> (perhaps via the work queue mechanism), doing the stuff in lines
> 07-10 for each mm_struct.  This would eliminate the possibility of
> mmput() sleeping, potentially making things easier for other callers
> of mmput() and causing fewer surprises.  Any comments?

task_lock() can be used to pin a task's ->mm.  To use task_lock() in
badness() we'd need to either

a) nest task_lock()s.  I don't know if we're doing that anywhere else,
   but the parent->child ordering is a natural one.  or

b) take a ref on the parent's mm_struct, drop the parent's task_lock()
   while we walk the children, then do mmput() on the parent's mm outside
   tasklist_lock.  This is probably better.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14  7:26     ` Andrew Morton
@ 2006-04-14 19:14       ` Dave Peterson
  2006-04-14 19:45         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-14 19:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel

On Friday 14 April 2006 00:26, Andrew Morton wrote:
> task_lock() can be used to pin a task's ->mm.  To use task_lock() in
> badness() we'd need to either
>
> a) nest task_lock()s.  I don't know if we're doing that anywhere else,
>    but the parent->child ordering is a natural one.  or
>
> b) take a ref on the parent's mm_struct, drop the parent's task_lock()
>    while we walk the children, then do mmput() on the parent's mm outside
>    tasklist_lock.  This is probably better.

Looking a bit more closely at the code, I see that
select_bad_process() iterates over all tasks, repeatedly calling
badness().  This would complicate option 'b' since the iteration is
done while holding tasklist_lock.  An alternative to option 'a' that
avoids nesting task_lock()s would be to define a couple of new
functions that might look something like this:

    void mmput_atomic(struct mm_struct *mm)
    {
            if (atomic_dec_and_test(&mm->mm_users)) {
                    add mm to a global list of expired mm_structs
            }
    }

    void mmput_atomic_cleanup(void)
    {
            empty the global list of expired mm_structs and do
            cleanup stuff for each one
    }

Then you could call mmput_atomic() an arbitrary # of times in places
where sleeping is not permitted, as long as mmput_atomic_cleanup() is
later called in a place where sleeping is permissible.  In the case
of the OOM killer code, a call to mmput_atomic_cleanup() could be
added to out_of_memory() in a place where we no longer hold
tasklist_lock.  Let me know if you have a preference for either of
these options, or if you have other suggestions.

Thanks,
Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14 19:14       ` Dave Peterson
@ 2006-04-14 19:45         ` Andrew Morton
  2006-04-14 20:49           ` Dave Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-04-14 19:45 UTC (permalink / raw)
  To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel

Dave Peterson <dsp@llnl.gov> wrote:
>
> On Friday 14 April 2006 00:26, Andrew Morton wrote:
> > task_lock() can be used to pin a task's ->mm.  To use task_lock() in
> > badness() we'd need to either
> >
> > a) nest task_lock()s.  I don't know if we're doing that anywhere else,
> >    but the parent->child ordering is a natural one.  or
> >
> > b) take a ref on the parent's mm_struct, drop the parent's task_lock()
> >    while we walk the children, then do mmput() on the parent's mm outside
> >    tasklist_lock.  This is probably better.
> 
> Looking a bit more closely at the code, I see that
> select_bad_process() iterates over all tasks, repeatedly calling
> badness().  This would complicate option 'b' since the iteration is
> done while holding tasklist_lock.  An alternative to option 'a' that
> avoids nesting task_lock()s would be to define a couple of new
> functions that might look something like this:
> 
>     void mmput_atomic(struct mm_struct *mm)
>     {
>             if (atomic_dec_and_test(&mm->mm_users)) {
>                     add mm to a global list of expired mm_structs
>             }
>     }
> 
>     void mmput_atomic_cleanup(void)
>     {
>             empty the global list of expired mm_structs and do
>             cleanup stuff for each one
>     }

I think that's way too elaborate.

What's wrong with this?


--- 25/mm/oom_kill.c~a	Fri Apr 14 12:37:51 2006
+++ 25-akpm/mm/oom_kill.c	Fri Apr 14 12:44:49 2006
@@ -47,15 +47,25 @@ int sysctl_panic_on_oom;
 unsigned long badness(struct task_struct *p, unsigned long uptime)
 {
 	unsigned long points, cpu_time, run_time, s;
-	struct list_head *tsk;
+	struct mm_struct *mm;
+	struct task_struct *child;
 
-	if (!p->mm)
+	task_lock(p);
+	mm = p->mm;
+	if (!mm) {
+		task_unlock(p);
 		return 0;
+	}
 
 	/*
 	 * The memory size of the process is the basis for the badness.
 	 */
-	points = p->mm->total_vm;
+	points = mm->total_vm;
+
+	/*
+	 * After this unlock we can no longer dereference local variable `mm'
+	 */
+	task_unlock(p);
 
 	/*
 	 * Processes which fork a lot of child processes are likely
@@ -65,11 +75,11 @@ unsigned long badness(struct task_struct
 	 * child is eating the vast majority of memory, adding only half
 	 * to the parents will make the child our kill candidate of choice.
 	 */
-	list_for_each(tsk, &p->children) {
-		struct task_struct *chld;
-		chld = list_entry(tsk, struct task_struct, sibling);
-		if (chld->mm != p->mm && chld->mm)
-			points += chld->mm->total_vm/2 + 1;
+	list_for_each_entry(child, &p->children, sibling) {
+		task_lock(child);
+		if (child->mm != mm && child->mm)
+			points += child->mm->total_vm/2 + 1;
+		task_unlock(child);
 	}
 
 	/*
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14 19:45         ` Andrew Morton
@ 2006-04-14 20:49           ` Dave Peterson
  2006-04-14 21:31             ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-14 20:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel

On Friday 14 April 2006 12:45, Andrew Morton wrote:
> Dave Peterson <dsp@llnl.gov> wrote:
> > On Friday 14 April 2006 00:26, Andrew Morton wrote:
> > > task_lock() can be used to pin a task's ->mm.  To use task_lock() in
> > > badness() we'd need to either
> > >
> > > a) nest task_lock()s.  I don't know if we're doing that anywhere else,
> > >    but the parent->child ordering is a natural one.  or
> > >
> > > b) take a ref on the parent's mm_struct, drop the parent's task_lock()
> > >    while we walk the children, then do mmput() on the parent's mm
> > > outside tasklist_lock.  This is probably better.
> >
> > Looking a bit more closely at the code, I see that
> > select_bad_process() iterates over all tasks, repeatedly calling
> > badness().  This would complicate option 'b' since the iteration is
> > done while holding tasklist_lock.  An alternative to option 'a' that
> > avoids nesting task_lock()s would be to define a couple of new
> > functions that might look something like this:
> >
> >     void mmput_atomic(struct mm_struct *mm)
> >     {
> >             if (atomic_dec_and_test(&mm->mm_users)) {
> >                     add mm to a global list of expired mm_structs
> >             }
> >     }
> >
> >     void mmput_atomic_cleanup(void)
> >     {
> >             empty the global list of expired mm_structs and do
> >             cleanup stuff for each one
> >     }
>
> I think that's way too elaborate.
>
> What's wrong with this?

Yes of course... no need to nest task_lock()s after all.  Looks good
to me.

Another thing I noticed: oom_kill_task() calls mmput() while holding
tasklist_lock.  Here the calls to get_task_mm() and mmput() appear to
be unnecessary.  We shouldn't need to use any kind of locking or
reference counting since oom_kill_task() doesn't dereference into the
mm_struct or require the value of p->mm to stay constant.  I believe
the following (untested) code changes should fix the problem (and
simplify some other parts of the code).  Does this look correct?


diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c
--- linux-2.6.17-rc1/mm/oom_kill.c	2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.17-rc1-fix/mm/oom_kill.c	2006-04-14 13:22:15.000000000 -0700
@@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c
 	force_sig(SIGKILL, p);
 }
 
-static struct mm_struct *oom_kill_task(task_t *p, const char *message)
+static int oom_kill_task(task_t *p, const char *message)
 {
-	struct mm_struct *mm = get_task_mm(p);
+	struct mm_struct *mm;
 	task_t * g, * q;
 
-	if (!mm)
-		return NULL;
-	if (mm == &init_mm) {
-		mmput(mm);
-		return NULL;
-	}
+	mm = p->mm;
+
+	if ((mm == NULL) || (mm == &init_mm))
+		return 1;
 
 	__oom_kill_task(p, message);
 	/*
@@ -266,13 +264,12 @@ static struct mm_struct *oom_kill_task(t
 			__oom_kill_task(q, message);
 	while_each_thread(g, q);
 
-	return mm;
+	return 0;
 }
 
-static struct mm_struct *oom_kill_process(struct task_struct *p,
-				unsigned long points, const char *message)
+static int oom_kill_process(struct task_struct *p, unsigned long points,
+		const char *message)
 {
- 	struct mm_struct *mm;
 	struct task_struct *c;
 	struct list_head *tsk;
 
@@ -283,9 +280,8 @@ static struct mm_struct *oom_kill_proces
 		c = list_entry(tsk, struct task_struct, sibling);
 		if (c->mm == p->mm)
 			continue;
-		mm = oom_kill_task(c, message);
-		if (mm)
-			return mm;
+		if (!oom_kill_task(c, message))
+			return 0;
 	}
 	return oom_kill_task(p, message);
 }
@@ -300,7 +296,6 @@ static struct mm_struct *oom_kill_proces
  */
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
 {
-	struct mm_struct *mm = NULL;
 	task_t *p;
 	unsigned long points = 0;
 
@@ -320,12 +315,12 @@ void out_of_memory(struct zonelist *zone
 	 */
 	switch (constrained_alloc(zonelist, gfp_mask)) {
 	case CONSTRAINT_MEMORY_POLICY:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory (MPOL_BIND)");
 		break;
 
 	case CONSTRAINT_CPUSET:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory in cpuset");
 		break;
 
@@ -347,8 +342,7 @@ retry:
 			panic("Out of memory and no killable processes...\n");
 		}
 
-		mm = oom_kill_process(p, points, "Out of memory");
-		if (!mm)
+		if (oom_kill_process(p, points, "Out of memory"))
 			goto retry;
 
 		break;
@@ -357,8 +351,6 @@ retry:
 out:
 	read_unlock(&tasklist_lock);
 	cpuset_unlock();
-	if (mm)
-		mmput(mm);
 
 	/*
 	 * Give "p" a good chance of killing itself before we

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14 20:49           ` Dave Peterson
@ 2006-04-14 21:31             ` Andrew Morton
  2006-04-14 23:52               ` Dave Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-04-14 21:31 UTC (permalink / raw)
  To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel

Dave Peterson <dsp@llnl.gov> wrote:
>
> Another thing I noticed: oom_kill_task() calls mmput() while holding
> tasklist_lock.

Yes, that'll make my new might_sleep() get upset.

oom_kill_task() doesn't _have_ to run mmput() there - we could propagate
the mm back to the top-level and do the mmput() there.

>  Here the calls to get_task_mm() and mmput() appear to
> be unnecessary.  We shouldn't need to use any kind of locking or
> reference counting since oom_kill_task() doesn't dereference into the
> mm_struct or require the value of p->mm to stay constant.  I believe
> the following (untested) code changes should fix the problem (and
> simplify some other parts of the code).  Does this look correct?

But yes, this looks better.

> 
> diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c
> --- linux-2.6.17-rc1/mm/oom_kill.c	2006-03-19 21:53:29.000000000 -0800
> +++ linux-2.6.17-rc1-fix/mm/oom_kill.c	2006-04-14 13:22:15.000000000 -0700
> @@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c
>  	force_sig(SIGKILL, p);
>  }
>  
> -static struct mm_struct *oom_kill_task(task_t *p, const char *message)
> +static int oom_kill_task(task_t *p, const char *message)
>  {
> -	struct mm_struct *mm = get_task_mm(p);
> +	struct mm_struct *mm;
>  	task_t * g, * q;

Please put a loud comment in here explaining that `mm' may not be dereferenced.

> -	if (!mm)
> -		return NULL;
> -	if (mm == &init_mm) {
> -		mmput(mm);
> -		return NULL;
> -	}
> +	mm = p->mm;
> +
> +	if ((mm == NULL) || (mm == &init_mm))

I think the parenthesisation here is going a bit far ;)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14 21:31             ` Andrew Morton
@ 2006-04-14 23:52               ` Dave Peterson
  2006-04-15  0:00                 ` Dave Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-14 23:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel

The patch below fixes oom_kill_task() so it doesn't call mmput()
(which may sleep) while holding tasklist_lock.

Signed-Off-By: David S. Peterson <dsp@llnl.gov>
---

Below is a revised version of my previous OOM killer patch.  I expect
to be away from my computer until Monday, 4/24.  If you have any
problems merging this into your tree, let me know and I'll fix things
up when I get back.

Dave



diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c
--- linux-2.6.17-rc1/mm/oom_kill.c	2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.17-rc1-fix/mm/oom_kill.c	2006-04-14 13:22:15.000000000 -0700
@@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c
 	force_sig(SIGKILL, p);
 }
 
-static struct mm_struct *oom_kill_task(task_t *p, const char *message)
+static int oom_kill_task(task_t *p, const char *message)
 {
-	struct mm_struct *mm = get_task_mm(p);
+	struct mm_struct *mm;
 	task_t * g, * q;
 
-	if (!mm)
-		return NULL;
-	if (mm == &init_mm) {
-		mmput(mm);
-		return NULL;
-	}
+	mm = p->mm;
+
+	if ((mm == NULL) || (mm == &init_mm))
+		return 1;
 
 	__oom_kill_task(p, message);
 	/*
@@ -266,13 +264,12 @@ static struct mm_struct *oom_kill_task(t
 			__oom_kill_task(q, message);
 	while_each_thread(g, q);
 
-	return mm;
+	return 0;
 }
 
-static struct mm_struct *oom_kill_process(struct task_struct *p,
-				unsigned long points, const char *message)
+static int oom_kill_process(struct task_struct *p, unsigned long points,
+		const char *message)
 {
- 	struct mm_struct *mm;
 	struct task_struct *c;
 	struct list_head *tsk;
 
@@ -283,9 +280,8 @@ static struct mm_struct *oom_kill_proces
 		c = list_entry(tsk, struct task_struct, sibling);
 		if (c->mm == p->mm)
 			continue;
-		mm = oom_kill_task(c, message);
-		if (mm)
-			return mm;
+		if (!oom_kill_task(c, message))
+			return 0;
 	}
 	return oom_kill_task(p, message);
 }
@@ -300,7 +296,6 @@ static struct mm_struct *oom_kill_proces
  */
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
 {
-	struct mm_struct *mm = NULL;
 	task_t *p;
 	unsigned long points = 0;
 
@@ -320,12 +315,12 @@ void out_of_memory(struct zonelist *zone
 	 */
 	switch (constrained_alloc(zonelist, gfp_mask)) {
 	case CONSTRAINT_MEMORY_POLICY:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory (MPOL_BIND)");
 		break;
 
 	case CONSTRAINT_CPUSET:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory in cpuset");
 		break;
 
@@ -347,8 +342,7 @@ retry:
 			panic("Out of memory and no killable processes...\n");
 		}
 
-		mm = oom_kill_process(p, points, "Out of memory");
-		if (!mm)
+		if (oom_kill_process(p, points, "Out of memory"))
 			goto retry;
 
 		break;
@@ -357,8 +351,6 @@ retry:
 out:
 	read_unlock(&tasklist_lock);
 	cpuset_unlock();
-	if (mm)
-		mmput(mm);
 
 	/*
 	 * Give "p" a good chance of killing itself before we

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
  2006-04-14 23:52               ` Dave Peterson
@ 2006-04-15  0:00                 ` Dave Peterson
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Peterson @ 2006-04-15  0:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel

The patch below fixes oom_kill_task() so it doesn't call mmput()
(which may sleep) while holding tasklist_lock.

Signed-Off-By: David S. Peterson <dsp@llnl.gov>
---

On Friday 14 April 2006 16:52, I wrote:
> Below is a revised version of my previous OOM killer patch.  I expect
> to be away from my computer until Monday, 4/24.  If you have any
> problems merging this into your tree, let me know and I'll fix things
> up when I get back.

Arrgghh... I sent the wrong patch by mistake.  Please ignore the previous
one and merge the patch below instead.

Thanks,
Dave



Index: work/mm/oom_kill.c
===================================================================
--- work.orig/mm/oom_kill.c	2006-04-14 16:21:05.000000000 -0700
+++ work/mm/oom_kill.c	2006-04-14 16:34:31.000000000 -0700
@@ -254,17 +254,24 @@ static void __oom_kill_task(task_t *p, c
 	force_sig(SIGKILL, p);
 }
 
-static struct mm_struct *oom_kill_task(task_t *p, const char *message)
+static int oom_kill_task(task_t *p, const char *message)
 {
-	struct mm_struct *mm = get_task_mm(p);
+	struct mm_struct *mm;
 	task_t * g, * q;
 
-	if (!mm)
-		return NULL;
-	if (mm == &init_mm) {
-		mmput(mm);
-		return NULL;
-	}
+	mm = p->mm;
+
+	/* WARNING: mm may not be dereferenced since we did not obtain its
+	 * value from get_task_mm(p).  This is OK since all we need to do is
+	 * compare mm to q->mm below.
+	 *
+	 * Furthermore, even if mm contains a non-NULL value, p->mm may
+	 * change to NULL at any time since we do not hold task_lock(p).
+	 * However, this is of no concern to us.
+	 */
+
+	if (mm == NULL || mm == &init_mm)
+		return 1;
 
 	__oom_kill_task(p, message);
 	/*
@@ -276,13 +283,12 @@ static struct mm_struct *oom_kill_task(t
 			__oom_kill_task(q, message);
 	while_each_thread(g, q);
 
-	return mm;
+	return 0;
 }
 
-static struct mm_struct *oom_kill_process(struct task_struct *p,
-				unsigned long points, const char *message)
+static int oom_kill_process(struct task_struct *p, unsigned long points,
+		const char *message)
 {
- 	struct mm_struct *mm;
 	struct task_struct *c;
 	struct list_head *tsk;
 
@@ -293,9 +299,8 @@ static struct mm_struct *oom_kill_proces
 		c = list_entry(tsk, struct task_struct, sibling);
 		if (c->mm == p->mm)
 			continue;
-		mm = oom_kill_task(c, message);
-		if (mm)
-			return mm;
+		if (!oom_kill_task(c, message))
+			return 0;
 	}
 	return oom_kill_task(p, message);
 }
@@ -310,7 +315,6 @@ static struct mm_struct *oom_kill_proces
  */
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
 {
-	struct mm_struct *mm = NULL;
 	task_t *p;
 	unsigned long points = 0;
 
@@ -330,12 +334,12 @@ void out_of_memory(struct zonelist *zone
 	 */
 	switch (constrained_alloc(zonelist, gfp_mask)) {
 	case CONSTRAINT_MEMORY_POLICY:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory (MPOL_BIND)");
 		break;
 
 	case CONSTRAINT_CPUSET:
-		mm = oom_kill_process(current, points,
+		oom_kill_process(current, points,
 				"No available memory in cpuset");
 		break;
 
@@ -357,8 +361,7 @@ retry:
 			panic("Out of memory and no killable processes...\n");
 		}
 
-		mm = oom_kill_process(p, points, "Out of memory");
-		if (!mm)
+		if (oom_kill_process(p, points, "Out of memory"))
 			goto retry;
 
 		break;
@@ -367,8 +370,6 @@ retry:
 out:
 	read_unlock(&tasklist_lock);
 	cpuset_unlock();
-	if (mm)
-		mmput(mm);
 
 	/*
 	 * Give "p" a good chance of killing itself before we


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-04-15  0:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-13 21:52 [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c Dave Peterson
2006-04-13 23:24 ` Andrew Morton
2006-04-14  0:44   ` Dave Peterson
2006-04-14  7:26     ` Andrew Morton
2006-04-14 19:14       ` Dave Peterson
2006-04-14 19:45         ` Andrew Morton
2006-04-14 20:49           ` Dave Peterson
2006-04-14 21:31             ` Andrew Morton
2006-04-14 23:52               ` Dave Peterson
2006-04-15  0:00                 ` Dave Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).