* [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c
@ 2006-04-13 21:52 Dave Peterson
2006-04-13 23:24 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Dave Peterson @ 2006-04-13 21:52 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, riel, akpm
The patch below fixes some mm_struct reference counting bugs in
badness().
Signed-Off-By: David S. Peterson <dsp@llnl.gov>
---
Index: linux-2.6.17-rc1-oom/mm/oom_kill.c
===================================================================
--- linux-2.6.17-rc1-oom.orig/mm/oom_kill.c 2006-04-13 14:25:16.000000000 -0700
+++ linux-2.6.17-rc1-oom/mm/oom_kill.c 2006-04-13 14:31:47.000000000 -0700
@@ -47,14 +47,17 @@ unsigned long badness(struct task_struct
{
unsigned long points, cpu_time, run_time, s;
struct list_head *tsk;
+ struct mm_struct *mm, *child_mm;
- if (!p->mm)
+ mm = get_task_mm(p);
+
+ if (mm == NULL)
return 0;
/*
* The memory size of the process is the basis for the badness.
*/
- points = p->mm->total_vm;
+ points = mm->total_vm;
/*
* Processes which fork a lot of child processes are likely
@@ -67,10 +70,21 @@ unsigned long badness(struct task_struct
list_for_each(tsk, &p->children) {
struct task_struct *chld;
chld = list_entry(tsk, struct task_struct, sibling);
- if (chld->mm != p->mm && chld->mm)
- points += chld->mm->total_vm/2 + 1;
+
+ if (chld->mm == mm)
+ continue;
+
+ child_mm = get_task_mm(chld);
+
+ if (child_mm == NULL)
+ continue;
+
+ points += child_mm->total_vm/2 + 1;
+ mmput(child_mm);
}
+ mmput(mm);
+
/*
* CPU time is in tens of seconds and run time is in thousands
* of seconds. There is no particular reason for this other than
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-13 21:52 [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c Dave Peterson @ 2006-04-13 23:24 ` Andrew Morton 2006-04-14 0:44 ` Dave Peterson 0 siblings, 1 reply; 10+ messages in thread From: Andrew Morton @ 2006-04-13 23:24 UTC (permalink / raw) To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel Dave Peterson <dsp@llnl.gov> wrote: > > The patch below fixes some mm_struct reference counting bugs in > badness(). hm, OK, afaict the code _is_ racy. But you're now calling mmput() inside read_lock(&tasklist_lock), and mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas(). So sterner stuff will be needed. I'll put a might_sleep() into mmput - it's a bit unexpected. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-13 23:24 ` Andrew Morton @ 2006-04-14 0:44 ` Dave Peterson 2006-04-14 7:26 ` Andrew Morton 0 siblings, 1 reply; 10+ messages in thread From: Dave Peterson @ 2006-04-14 0:44 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel On Thursday 13 April 2006 16:24, Andrew Morton wrote: > Dave Peterson <dsp@llnl.gov> wrote: > > The patch below fixes some mm_struct reference counting bugs in > > badness(). > > hm, OK, afaict the code _is_ racy. > > But you're now calling mmput() inside read_lock(&tasklist_lock), and > mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas(). So > sterner stuff will be needed. > > I'll put a might_sleep() into mmput - it's a bit unexpected. Hmm... fixing this looks rather tricky. If get_task_mm()/mmput() was only being done on a single mm_struct then I suppose badness() could do something a bit ugly like passing the reference back to its caller and letting the caller do the mmput() once tasklist_lock is no longer held. However here we are iterating over a bunch of child tasks, potentially doing a get_task_mm()/mmput() for a number of them. I have a suggestion for a possible solution. Currently mmput() is implemented as follows: 01 void mmput(struct mm_struct *mm) 02 { 03 if (atomic_dec_and_lock(&mm->mm_users, &mmlist_lock)) { 04 list_del(&mm->mmlist); 05 mmlist_nr--; 06 spin_unlock(&mmlist_lock); 07 exit_aio(mm); 08 exit_mmap(mm); 09 put_swap_token(mm); 10 mmdrop(mm); 11 } 12 } Suppose we replace lines 07-10 with a little piece of code that adds the mm_struct to a list. Then a kernel thread empties the list (perhaps via the work queue mechanism), doing the stuff in lines 07-10 for each mm_struct. This would eliminate the possibility of mmput() sleeping, potentially making things easier for other callers of mmput() and causing fewer surprises. Any comments? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 0:44 ` Dave Peterson @ 2006-04-14 7:26 ` Andrew Morton 2006-04-14 19:14 ` Dave Peterson 0 siblings, 1 reply; 10+ messages in thread From: Andrew Morton @ 2006-04-14 7:26 UTC (permalink / raw) To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel Dave Peterson <dsp@llnl.gov> wrote: > > On Thursday 13 April 2006 16:24, Andrew Morton wrote: > > Dave Peterson <dsp@llnl.gov> wrote: > > > The patch below fixes some mm_struct reference counting bugs in > > > badness(). > > > > hm, OK, afaict the code _is_ racy. > > > > But you're now calling mmput() inside read_lock(&tasklist_lock), and > > mmput() can sleep in exit_aio() or in exit_mmap()->unmap_vmas(). So > > sterner stuff will be needed. > > > > I'll put a might_sleep() into mmput - it's a bit unexpected. > > Hmm... fixing this looks rather tricky. If get_task_mm()/mmput() was > only being done on a single mm_struct then I suppose badness() could > do something a bit ugly like passing the reference back to its caller > and letting the caller do the mmput() once tasklist_lock is no longer > held. However here we are iterating over a bunch of child tasks, > potentially doing a get_task_mm()/mmput() for a number of them. > > I have a suggestion for a possible solution. Currently mmput() is > implemented as follows: > > 01 void mmput(struct mm_struct *mm) > 02 { > 03 if (atomic_dec_and_lock(&mm->mm_users, &mmlist_lock)) { > 04 list_del(&mm->mmlist); > 05 mmlist_nr--; > 06 spin_unlock(&mmlist_lock); > 07 exit_aio(mm); > 08 exit_mmap(mm); > 09 put_swap_token(mm); > 10 mmdrop(mm); > 11 } > 12 } > > Suppose we replace lines 07-10 with a little piece of code that adds > the mm_struct to a list. Then a kernel thread empties the list > (perhaps via the work queue mechanism), doing the stuff in lines > 07-10 for each mm_struct. This would eliminate the possibility of > mmput() sleeping, potentially making things easier for other callers > of mmput() and causing fewer surprises. Any comments? task_lock() can be used to pin a task's ->mm. To use task_lock() in badness() we'd need to either a) nest task_lock()s. I don't know if we're doing that anywhere else, but the parent->child ordering is a natural one. or b) take a ref on the parent's mm_struct, drop the parent's task_lock() while we walk the children, then do mmput() on the parent's mm outside tasklist_lock. This is probably better. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 7:26 ` Andrew Morton @ 2006-04-14 19:14 ` Dave Peterson 2006-04-14 19:45 ` Andrew Morton 0 siblings, 1 reply; 10+ messages in thread From: Dave Peterson @ 2006-04-14 19:14 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel On Friday 14 April 2006 00:26, Andrew Morton wrote: > task_lock() can be used to pin a task's ->mm. To use task_lock() in > badness() we'd need to either > > a) nest task_lock()s. I don't know if we're doing that anywhere else, > but the parent->child ordering is a natural one. or > > b) take a ref on the parent's mm_struct, drop the parent's task_lock() > while we walk the children, then do mmput() on the parent's mm outside > tasklist_lock. This is probably better. Looking a bit more closely at the code, I see that select_bad_process() iterates over all tasks, repeatedly calling badness(). This would complicate option 'b' since the iteration is done while holding tasklist_lock. An alternative to option 'a' that avoids nesting task_lock()s would be to define a couple of new functions that might look something like this: void mmput_atomic(struct mm_struct *mm) { if (atomic_dec_and_test(&mm->mm_users)) { add mm to a global list of expired mm_structs } } void mmput_atomic_cleanup(void) { empty the global list of expired mm_structs and do cleanup stuff for each one } Then you could call mmput_atomic() an arbitrary # of times in places where sleeping is not permitted, as long as mmput_atomic_cleanup() is later called in a place where sleeping is permissible. In the case of the OOM killer code, a call to mmput_atomic_cleanup() could be added to out_of_memory() in a place where we no longer hold tasklist_lock. Let me know if you have a preference for either of these options, or if you have other suggestions. Thanks, Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 19:14 ` Dave Peterson @ 2006-04-14 19:45 ` Andrew Morton 2006-04-14 20:49 ` Dave Peterson 0 siblings, 1 reply; 10+ messages in thread From: Andrew Morton @ 2006-04-14 19:45 UTC (permalink / raw) To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel Dave Peterson <dsp@llnl.gov> wrote: > > On Friday 14 April 2006 00:26, Andrew Morton wrote: > > task_lock() can be used to pin a task's ->mm. To use task_lock() in > > badness() we'd need to either > > > > a) nest task_lock()s. I don't know if we're doing that anywhere else, > > but the parent->child ordering is a natural one. or > > > > b) take a ref on the parent's mm_struct, drop the parent's task_lock() > > while we walk the children, then do mmput() on the parent's mm outside > > tasklist_lock. This is probably better. > > Looking a bit more closely at the code, I see that > select_bad_process() iterates over all tasks, repeatedly calling > badness(). This would complicate option 'b' since the iteration is > done while holding tasklist_lock. An alternative to option 'a' that > avoids nesting task_lock()s would be to define a couple of new > functions that might look something like this: > > void mmput_atomic(struct mm_struct *mm) > { > if (atomic_dec_and_test(&mm->mm_users)) { > add mm to a global list of expired mm_structs > } > } > > void mmput_atomic_cleanup(void) > { > empty the global list of expired mm_structs and do > cleanup stuff for each one > } I think that's way too elaborate. What's wrong with this? --- 25/mm/oom_kill.c~a Fri Apr 14 12:37:51 2006 +++ 25-akpm/mm/oom_kill.c Fri Apr 14 12:44:49 2006 @@ -47,15 +47,25 @@ int sysctl_panic_on_oom; unsigned long badness(struct task_struct *p, unsigned long uptime) { unsigned long points, cpu_time, run_time, s; - struct list_head *tsk; + struct mm_struct *mm; + struct task_struct *child; - if (!p->mm) + task_lock(p); + mm = p->mm; + if (!mm) { + task_unlock(p); return 0; + } /* * The memory size of the process is the basis for the badness. */ - points = p->mm->total_vm; + points = mm->total_vm; + + /* + * After this unlock we can no longer dereference local variable `mm' + */ + task_unlock(p); /* * Processes which fork a lot of child processes are likely @@ -65,11 +75,11 @@ unsigned long badness(struct task_struct * child is eating the vast majority of memory, adding only half * to the parents will make the child our kill candidate of choice. */ - list_for_each(tsk, &p->children) { - struct task_struct *chld; - chld = list_entry(tsk, struct task_struct, sibling); - if (chld->mm != p->mm && chld->mm) - points += chld->mm->total_vm/2 + 1; + list_for_each_entry(child, &p->children, sibling) { + task_lock(child); + if (child->mm != mm && child->mm) + points += child->mm->total_vm/2 + 1; + task_unlock(child); } /* _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 19:45 ` Andrew Morton @ 2006-04-14 20:49 ` Dave Peterson 2006-04-14 21:31 ` Andrew Morton 0 siblings, 1 reply; 10+ messages in thread From: Dave Peterson @ 2006-04-14 20:49 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel On Friday 14 April 2006 12:45, Andrew Morton wrote: > Dave Peterson <dsp@llnl.gov> wrote: > > On Friday 14 April 2006 00:26, Andrew Morton wrote: > > > task_lock() can be used to pin a task's ->mm. To use task_lock() in > > > badness() we'd need to either > > > > > > a) nest task_lock()s. I don't know if we're doing that anywhere else, > > > but the parent->child ordering is a natural one. or > > > > > > b) take a ref on the parent's mm_struct, drop the parent's task_lock() > > > while we walk the children, then do mmput() on the parent's mm > > > outside tasklist_lock. This is probably better. > > > > Looking a bit more closely at the code, I see that > > select_bad_process() iterates over all tasks, repeatedly calling > > badness(). This would complicate option 'b' since the iteration is > > done while holding tasklist_lock. An alternative to option 'a' that > > avoids nesting task_lock()s would be to define a couple of new > > functions that might look something like this: > > > > void mmput_atomic(struct mm_struct *mm) > > { > > if (atomic_dec_and_test(&mm->mm_users)) { > > add mm to a global list of expired mm_structs > > } > > } > > > > void mmput_atomic_cleanup(void) > > { > > empty the global list of expired mm_structs and do > > cleanup stuff for each one > > } > > I think that's way too elaborate. > > What's wrong with this? Yes of course... no need to nest task_lock()s after all. Looks good to me. Another thing I noticed: oom_kill_task() calls mmput() while holding tasklist_lock. Here the calls to get_task_mm() and mmput() appear to be unnecessary. We shouldn't need to use any kind of locking or reference counting since oom_kill_task() doesn't dereference into the mm_struct or require the value of p->mm to stay constant. I believe the following (untested) code changes should fix the problem (and simplify some other parts of the code). Does this look correct? diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c --- linux-2.6.17-rc1/mm/oom_kill.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.17-rc1-fix/mm/oom_kill.c 2006-04-14 13:22:15.000000000 -0700 @@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c force_sig(SIGKILL, p); } -static struct mm_struct *oom_kill_task(task_t *p, const char *message) +static int oom_kill_task(task_t *p, const char *message) { - struct mm_struct *mm = get_task_mm(p); + struct mm_struct *mm; task_t * g, * q; - if (!mm) - return NULL; - if (mm == &init_mm) { - mmput(mm); - return NULL; - } + mm = p->mm; + + if ((mm == NULL) || (mm == &init_mm)) + return 1; __oom_kill_task(p, message); /* @@ -266,13 +264,12 @@ static struct mm_struct *oom_kill_task(t __oom_kill_task(q, message); while_each_thread(g, q); - return mm; + return 0; } -static struct mm_struct *oom_kill_process(struct task_struct *p, - unsigned long points, const char *message) +static int oom_kill_process(struct task_struct *p, unsigned long points, + const char *message) { - struct mm_struct *mm; struct task_struct *c; struct list_head *tsk; @@ -283,9 +280,8 @@ static struct mm_struct *oom_kill_proces c = list_entry(tsk, struct task_struct, sibling); if (c->mm == p->mm) continue; - mm = oom_kill_task(c, message); - if (mm) - return mm; + if (!oom_kill_task(c, message)) + return 0; } return oom_kill_task(p, message); } @@ -300,7 +296,6 @@ static struct mm_struct *oom_kill_proces */ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order) { - struct mm_struct *mm = NULL; task_t *p; unsigned long points = 0; @@ -320,12 +315,12 @@ void out_of_memory(struct zonelist *zone */ switch (constrained_alloc(zonelist, gfp_mask)) { case CONSTRAINT_MEMORY_POLICY: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory (MPOL_BIND)"); break; case CONSTRAINT_CPUSET: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory in cpuset"); break; @@ -347,8 +342,7 @@ retry: panic("Out of memory and no killable processes...\n"); } - mm = oom_kill_process(p, points, "Out of memory"); - if (!mm) + if (oom_kill_process(p, points, "Out of memory")) goto retry; break; @@ -357,8 +351,6 @@ retry: out: read_unlock(&tasklist_lock); cpuset_unlock(); - if (mm) - mmput(mm); /* * Give "p" a good chance of killing itself before we -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 20:49 ` Dave Peterson @ 2006-04-14 21:31 ` Andrew Morton 2006-04-14 23:52 ` Dave Peterson 0 siblings, 1 reply; 10+ messages in thread From: Andrew Morton @ 2006-04-14 21:31 UTC (permalink / raw) To: Dave Peterson; +Cc: linux-mm, linux-kernel, riel Dave Peterson <dsp@llnl.gov> wrote: > > Another thing I noticed: oom_kill_task() calls mmput() while holding > tasklist_lock. Yes, that'll make my new might_sleep() get upset. oom_kill_task() doesn't _have_ to run mmput() there - we could propagate the mm back to the top-level and do the mmput() there. > Here the calls to get_task_mm() and mmput() appear to > be unnecessary. We shouldn't need to use any kind of locking or > reference counting since oom_kill_task() doesn't dereference into the > mm_struct or require the value of p->mm to stay constant. I believe > the following (untested) code changes should fix the problem (and > simplify some other parts of the code). Does this look correct? But yes, this looks better. > > diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c > --- linux-2.6.17-rc1/mm/oom_kill.c 2006-03-19 21:53:29.000000000 -0800 > +++ linux-2.6.17-rc1-fix/mm/oom_kill.c 2006-04-14 13:22:15.000000000 -0700 > @@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c > force_sig(SIGKILL, p); > } > > -static struct mm_struct *oom_kill_task(task_t *p, const char *message) > +static int oom_kill_task(task_t *p, const char *message) > { > - struct mm_struct *mm = get_task_mm(p); > + struct mm_struct *mm; > task_t * g, * q; Please put a loud comment in here explaining that `mm' may not be dereferenced. > - if (!mm) > - return NULL; > - if (mm == &init_mm) { > - mmput(mm); > - return NULL; > - } > + mm = p->mm; > + > + if ((mm == NULL) || (mm == &init_mm)) I think the parenthesisation here is going a bit far ;) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 21:31 ` Andrew Morton @ 2006-04-14 23:52 ` Dave Peterson 2006-04-15 0:00 ` Dave Peterson 0 siblings, 1 reply; 10+ messages in thread From: Dave Peterson @ 2006-04-14 23:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel The patch below fixes oom_kill_task() so it doesn't call mmput() (which may sleep) while holding tasklist_lock. Signed-Off-By: David S. Peterson <dsp@llnl.gov> --- Below is a revised version of my previous OOM killer patch. I expect to be away from my computer until Monday, 4/24. If you have any problems merging this into your tree, let me know and I'll fix things up when I get back. Dave diff -urNp -X /home/dsp/dontdiff linux-2.6.17-rc1/mm/oom_kill.c linux-2.6.17-rc1-fix/mm/oom_kill.c --- linux-2.6.17-rc1/mm/oom_kill.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.17-rc1-fix/mm/oom_kill.c 2006-04-14 13:22:15.000000000 -0700 @@ -244,17 +244,15 @@ static void __oom_kill_task(task_t *p, c force_sig(SIGKILL, p); } -static struct mm_struct *oom_kill_task(task_t *p, const char *message) +static int oom_kill_task(task_t *p, const char *message) { - struct mm_struct *mm = get_task_mm(p); + struct mm_struct *mm; task_t * g, * q; - if (!mm) - return NULL; - if (mm == &init_mm) { - mmput(mm); - return NULL; - } + mm = p->mm; + + if ((mm == NULL) || (mm == &init_mm)) + return 1; __oom_kill_task(p, message); /* @@ -266,13 +264,12 @@ static struct mm_struct *oom_kill_task(t __oom_kill_task(q, message); while_each_thread(g, q); - return mm; + return 0; } -static struct mm_struct *oom_kill_process(struct task_struct *p, - unsigned long points, const char *message) +static int oom_kill_process(struct task_struct *p, unsigned long points, + const char *message) { - struct mm_struct *mm; struct task_struct *c; struct list_head *tsk; @@ -283,9 +280,8 @@ static struct mm_struct *oom_kill_proces c = list_entry(tsk, struct task_struct, sibling); if (c->mm == p->mm) continue; - mm = oom_kill_task(c, message); - if (mm) - return mm; + if (!oom_kill_task(c, message)) + return 0; } return oom_kill_task(p, message); } @@ -300,7 +296,6 @@ static struct mm_struct *oom_kill_proces */ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order) { - struct mm_struct *mm = NULL; task_t *p; unsigned long points = 0; @@ -320,12 +315,12 @@ void out_of_memory(struct zonelist *zone */ switch (constrained_alloc(zonelist, gfp_mask)) { case CONSTRAINT_MEMORY_POLICY: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory (MPOL_BIND)"); break; case CONSTRAINT_CPUSET: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory in cpuset"); break; @@ -347,8 +342,7 @@ retry: panic("Out of memory and no killable processes...\n"); } - mm = oom_kill_process(p, points, "Out of memory"); - if (!mm) + if (oom_kill_process(p, points, "Out of memory")) goto retry; break; @@ -357,8 +351,6 @@ retry: out: read_unlock(&tasklist_lock); cpuset_unlock(); - if (mm) - mmput(mm); /* * Give "p" a good chance of killing itself before we -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-14 23:52 ` Dave Peterson @ 2006-04-15 0:00 ` Dave Peterson 0 siblings, 0 replies; 10+ messages in thread From: Dave Peterson @ 2006-04-15 0:00 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm, linux-kernel, riel The patch below fixes oom_kill_task() so it doesn't call mmput() (which may sleep) while holding tasklist_lock. Signed-Off-By: David S. Peterson <dsp@llnl.gov> --- On Friday 14 April 2006 16:52, I wrote: > Below is a revised version of my previous OOM killer patch. I expect > to be away from my computer until Monday, 4/24. If you have any > problems merging this into your tree, let me know and I'll fix things > up when I get back. Arrgghh... I sent the wrong patch by mistake. Please ignore the previous one and merge the patch below instead. Thanks, Dave Index: work/mm/oom_kill.c =================================================================== --- work.orig/mm/oom_kill.c 2006-04-14 16:21:05.000000000 -0700 +++ work/mm/oom_kill.c 2006-04-14 16:34:31.000000000 -0700 @@ -254,17 +254,24 @@ static void __oom_kill_task(task_t *p, c force_sig(SIGKILL, p); } -static struct mm_struct *oom_kill_task(task_t *p, const char *message) +static int oom_kill_task(task_t *p, const char *message) { - struct mm_struct *mm = get_task_mm(p); + struct mm_struct *mm; task_t * g, * q; - if (!mm) - return NULL; - if (mm == &init_mm) { - mmput(mm); - return NULL; - } + mm = p->mm; + + /* WARNING: mm may not be dereferenced since we did not obtain its + * value from get_task_mm(p). This is OK since all we need to do is + * compare mm to q->mm below. + * + * Furthermore, even if mm contains a non-NULL value, p->mm may + * change to NULL at any time since we do not hold task_lock(p). + * However, this is of no concern to us. + */ + + if (mm == NULL || mm == &init_mm) + return 1; __oom_kill_task(p, message); /* @@ -276,13 +283,12 @@ static struct mm_struct *oom_kill_task(t __oom_kill_task(q, message); while_each_thread(g, q); - return mm; + return 0; } -static struct mm_struct *oom_kill_process(struct task_struct *p, - unsigned long points, const char *message) +static int oom_kill_process(struct task_struct *p, unsigned long points, + const char *message) { - struct mm_struct *mm; struct task_struct *c; struct list_head *tsk; @@ -293,9 +299,8 @@ static struct mm_struct *oom_kill_proces c = list_entry(tsk, struct task_struct, sibling); if (c->mm == p->mm) continue; - mm = oom_kill_task(c, message); - if (mm) - return mm; + if (!oom_kill_task(c, message)) + return 0; } return oom_kill_task(p, message); } @@ -310,7 +315,6 @@ static struct mm_struct *oom_kill_proces */ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order) { - struct mm_struct *mm = NULL; task_t *p; unsigned long points = 0; @@ -330,12 +334,12 @@ void out_of_memory(struct zonelist *zone */ switch (constrained_alloc(zonelist, gfp_mask)) { case CONSTRAINT_MEMORY_POLICY: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory (MPOL_BIND)"); break; case CONSTRAINT_CPUSET: - mm = oom_kill_process(current, points, + oom_kill_process(current, points, "No available memory in cpuset"); break; @@ -357,8 +361,7 @@ retry: panic("Out of memory and no killable processes...\n"); } - mm = oom_kill_process(p, points, "Out of memory"); - if (!mm) + if (oom_kill_process(p, points, "Out of memory")) goto retry; break; @@ -367,8 +370,6 @@ retry: out: read_unlock(&tasklist_lock); cpuset_unlock(); - if (mm) - mmput(mm); /* * Give "p" a good chance of killing itself before we -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-04-15 0:00 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-04-13 21:52 [PATCH 2/2] mm: fix mm_struct reference counting bugs in mm/oom_kill.c Dave Peterson 2006-04-13 23:24 ` Andrew Morton 2006-04-14 0:44 ` Dave Peterson 2006-04-14 7:26 ` Andrew Morton 2006-04-14 19:14 ` Dave Peterson 2006-04-14 19:45 ` Andrew Morton 2006-04-14 20:49 ` Dave Peterson 2006-04-14 21:31 ` Andrew Morton 2006-04-14 23:52 ` Dave Peterson 2006-04-15 0:00 ` Dave Peterson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).