* [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed
@ 2010-08-15 4:30 David Rientjes
2010-08-15 4:31 ` [patch 2/2] oom: kill all threads sharing oom killed task's mm David Rientjes
2010-08-15 15:18 ` [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed Oleg Nesterov
0 siblings, 2 replies; 9+ messages in thread
From: David Rientjes @ 2010-08-15 4:30 UTC (permalink / raw)
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Oleg Nesterov, Rik van Riel,
linux-mm
The oom killer's goal is to kill a memory-hogging task so that it may
exit, free its memory, and allow the current context to allocate the
memory that triggered it in the first place. Thus, killing a task is
pointless if other threads sharing its mm cannot be killed because of its
/proc/pid/oom_adj or /proc/pid/oom_score_adj value.
This patch checks all user threads on the system to determine whether
oom_badness(p) should return 0 for p, which means it should not be killed.
If a thread shares p's mm and is unkillable, p is considered to be
unkillable as well.
Kthreads are not considered toward this rule since they only temporarily
assume a task's mm via use_mm().
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 30 +++++++++++++++++++++++-------
1 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -83,6 +83,27 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
#endif /* CONFIG_NUMA */
/*
+ * Determines whether an mm is unfreeable since a user thread attached to
+ * it cannot be killed. Kthreads only temporarily assume a thread's mm,
+ * so they are not considered.
+ *
+ * mm need not be protected by task_lock() since it will not be
+ * dereferened.
+ */
+static bool is_mm_unfreeable(struct mm_struct *mm)
+{
+ struct task_struct *g, *q;
+
+ do_each_thread(g, q) {
+ if (q->mm == mm && !(q->flags & PF_KTHREAD) &&
+ q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
+ return true;
+ } while_each_thread(g, q);
+
+ return false;
+}
+
+/*
* If this is a system OOM (not a memcg OOM) and the task selected to be
* killed is not already running at high (RT) priorities, speed up the
* recovery by boosting the dying task to the lowest FIFO priority.
@@ -160,12 +181,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
p = find_lock_task_mm(p);
if (!p)
return 0;
-
- /*
- * Shortcut check for OOM_SCORE_ADJ_MIN so the entire heuristic doesn't
- * need to be executed for something that cannot be killed.
- */
- if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+ if (is_mm_unfreeable(p->mm)) {
task_unlock(p);
return 0;
}
@@ -675,7 +691,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
read_lock(&tasklist_lock);
if (sysctl_oom_kill_allocating_task &&
!oom_unkillable_task(current, NULL, nodemask) &&
- (current->signal->oom_adj != OOM_DISABLE)) {
+ !is_mm_unfreeable(current->mm)) {
/*
* oom_kill_process() needs tasklist_lock held. If it returns
* non-zero, current could not be killed so we must fallback to
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 2/2] oom: kill all threads sharing oom killed task's mm
2010-08-15 4:30 [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed David Rientjes
@ 2010-08-15 4:31 ` David Rientjes
2010-08-15 15:45 ` Oleg Nesterov
2010-08-15 15:18 ` [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed Oleg Nesterov
1 sibling, 1 reply; 9+ messages in thread
From: David Rientjes @ 2010-08-15 4:31 UTC (permalink / raw)
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Oleg Nesterov, Rik van Riel,
linux-mm
It's necessary to kill all threads that share an oom killed task's mm if
the goal is to lead to future memory freeing.
This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
doesn't kill vfork parent (or child)) since it is obsoleted.
It's now guaranteed that any task passed to oom_kill_task() does not
share an mm with any thread that is unkillable. Thus, we're safe to
issue a SIGKILL to any thread sharing the same mm.
This is especially necessary to solve an mm->mmap_sem livelock issue
whereas an oom killed thread must acquire the lock in the exit path while
another thread is holding it in the page allocator while trying to
allocate memory itself (and will preempt the oom killer since a task was
already killed). Since tasks with pending fatal signals are now granted
access to memory reserves, the thread holding the lock may quickly
allocate and release the lock so that the oom killed task may exit.
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 22 +++++++++++++++++++++-
1 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -416,18 +416,24 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
#define K(x) ((x) << (PAGE_SHIFT-10))
static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
{
+ struct task_struct *g, *q;
+ struct mm_struct *mm;
+
p = find_lock_task_mm(p);
if (!p) {
task_unlock(p);
return 1;
}
+
+ /* mm cannot be safely dereferenced after task_unlock(p) */
+ mm = p->mm;
+
pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
task_pid_nr(p), p->comm, K(p->mm->total_vm),
K(get_mm_counter(p->mm, MM_ANONPAGES)),
K(get_mm_counter(p->mm, MM_FILEPAGES)));
task_unlock(p);
-
set_tsk_thread_flag(p, TIF_MEMDIE);
force_sig(SIGKILL, p);
@@ -438,6 +444,20 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
*/
boost_dying_task_prio(p, mem);
+ /*
+ * Kill all threads sharing p->mm in other thread groups, if any. They
+ * don't get access to memory reserves or a higher scheduler priority,
+ * though, to avoid depletion of all memory or task starvation. This
+ * prevents mm->mmap_sem livelock when an oom killed task cannot exit
+ * because it requires the semaphore and its contended by another
+ * thread trying to allocate memory itself. That thread will now get
+ * access to memory reserves since it has a pending fatal signal.
+ */
+ do_each_thread(g, q) {
+ if (q->mm == mm && !same_thread_group(q, p))
+ force_sig(SIGKILL, q);
+ } while_each_thread(g, q);
+
return 0;
}
#undef K
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/2] oom: kill all threads sharing oom killed task's mm
2010-08-15 4:31 ` [patch 2/2] oom: kill all threads sharing oom killed task's mm David Rientjes
@ 2010-08-15 15:45 ` Oleg Nesterov
2010-08-15 21:28 ` David Rientjes
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-08-15 15:45 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
Again, I do not know how the code looks without the patch, but
On 08/14, David Rientjes wrote:
>
> static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> {
> + struct task_struct *g, *q;
> + struct mm_struct *mm;
> +
> p = find_lock_task_mm(p);
> if (!p) {
> task_unlock(p);
> return 1;
> }
> +
> + /* mm cannot be safely dereferenced after task_unlock(p) */
Yes. But also we can't trust this pointer, see below.
> + mm = p->mm;
> +
> pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> task_pid_nr(p), p->comm, K(p->mm->total_vm),
> K(get_mm_counter(p->mm, MM_ANONPAGES)),
> K(get_mm_counter(p->mm, MM_FILEPAGES)));
> task_unlock(p);
>
> -
> set_tsk_thread_flag(p, TIF_MEMDIE);
> force_sig(SIGKILL, p);
So, we killed this process. It is very possible it was the only user
of this ->mm. exit_mm() can free this mmemory. After that another task
execs, exec_mmap() can allocate the same memory again.
Now,
> @@ -438,6 +444,20 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> */
> boost_dying_task_prio(p, mem);
>
> + /*
> + * Kill all threads sharing p->mm in other thread groups, if any. They
> + * don't get access to memory reserves or a higher scheduler priority,
> + * though, to avoid depletion of all memory or task starvation. This
> + * prevents mm->mmap_sem livelock when an oom killed task cannot exit
> + * because it requires the semaphore and its contended by another
> + * thread trying to allocate memory itself. That thread will now get
> + * access to memory reserves since it has a pending fatal signal.
> + */
> + do_each_thread(g, q) {
> + if (q->mm == mm && !same_thread_group(q, p))
> + force_sig(SIGKILL, q);
> + } while_each_thread(g, q);
We can kill the wrong task. "q->mm == mm" doesn't necessarily mean
we found the task which shares ->mm with p (see above).
This needs atomic_inc(mm_users). And please do not use do_each_thread.
David, I apologize in advance if I won't reply to your futher emails.
I don't have the time for the kernel hacking at all.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/2] oom: kill all threads sharing oom killed task's mm
2010-08-15 15:45 ` Oleg Nesterov
@ 2010-08-15 21:28 ` David Rientjes
2010-08-16 6:00 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: David Rientjes @ 2010-08-15 21:28 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
On Sun, 15 Aug 2010, Oleg Nesterov wrote:
> Again, I do not know how the code looks without the patch, but
>
Why not? This series is based on Linus' tree.
> > static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > {
> > + struct task_struct *g, *q;
> > + struct mm_struct *mm;
> > +
> > p = find_lock_task_mm(p);
> > if (!p) {
> > task_unlock(p);
> > return 1;
> > }
> > +
> > + /* mm cannot be safely dereferenced after task_unlock(p) */
>
> Yes. But also we can't trust this pointer, see below.
>
> > + mm = p->mm;
> > +
> > pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> > task_pid_nr(p), p->comm, K(p->mm->total_vm),
> > K(get_mm_counter(p->mm, MM_ANONPAGES)),
> > K(get_mm_counter(p->mm, MM_FILEPAGES)));
> > task_unlock(p);
> >
> > -
> > set_tsk_thread_flag(p, TIF_MEMDIE);
> > force_sig(SIGKILL, p);
>
> So, we killed this process. It is very possible it was the only user
> of this ->mm. exit_mm() can free this mmemory. After that another task
> execs, exec_mmap() can allocate the same memory again.
>
Right, this was a race in the original code as well before it was removed
in 8c5cd6f3 and existed for years.
> > @@ -438,6 +444,20 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > */
> > boost_dying_task_prio(p, mem);
> >
> > + /*
> > + * Kill all threads sharing p->mm in other thread groups, if any. They
> > + * don't get access to memory reserves or a higher scheduler priority,
> > + * though, to avoid depletion of all memory or task starvation. This
> > + * prevents mm->mmap_sem livelock when an oom killed task cannot exit
> > + * because it requires the semaphore and its contended by another
> > + * thread trying to allocate memory itself. That thread will now get
> > + * access to memory reserves since it has a pending fatal signal.
> > + */
> > + do_each_thread(g, q) {
> > + if (q->mm == mm && !same_thread_group(q, p))
> > + force_sig(SIGKILL, q);
> > + } while_each_thread(g, q);
>
> We can kill the wrong task. "q->mm == mm" doesn't necessarily mean
> we found the task which shares ->mm with p (see above).
>
> This needs atomic_inc(mm_users). And please do not use do_each_thread.
>
Instead of using mm_users to pin the mm, we could simply do this iteration
with for_each_process() before sending the SIGKILL to p.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/2] oom: kill all threads sharing oom killed task's mm
2010-08-15 21:28 ` David Rientjes
@ 2010-08-16 6:00 ` Oleg Nesterov
0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2010-08-16 6:00 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
On 08/15, David Rientjes wrote:
>
> On Sun, 15 Aug 2010, Oleg Nesterov wrote:
>
> > Again, I do not know how the code looks without the patch, but
>
> Why not? This series is based on Linus' tree.
OK, thanks...
> > > + do_each_thread(g, q) {
> > > + if (q->mm == mm && !same_thread_group(q, p))
> > > + force_sig(SIGKILL, q);
> > > + } while_each_thread(g, q);
> >
> > We can kill the wrong task. "q->mm == mm" doesn't necessarily mean
> > we found the task which shares ->mm with p (see above).
> >
> > This needs atomic_inc(mm_users). And please do not use do_each_thread.
>
> Instead of using mm_users to pin the mm, we could simply do this iteration
> with for_each_process() before sending the SIGKILL to p.
Yes, this should work too. (I'd prefer to not take ->siglock under
task->alloc_lock, but currently this is correct and happens anyway).
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed
2010-08-15 4:30 [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed David Rientjes
2010-08-15 4:31 ` [patch 2/2] oom: kill all threads sharing oom killed task's mm David Rientjes
@ 2010-08-15 15:18 ` Oleg Nesterov
2010-08-15 21:23 ` David Rientjes
1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-08-15 15:18 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
Well. I shouldn't try to comment this patch because I do not know
the state of the current code (and I do not understand the changelog).
Still, it looks a bit strange to me.
On 08/14, David Rientjes wrote:
>
> + * Determines whether an mm is unfreeable since a user thread attached to
> + * it cannot be killed. Kthreads only temporarily assume a thread's mm,
> + * so they are not considered.
> + *
> + * mm need not be protected by task_lock() since it will not be
> + * dereferened.
> + */
> +static bool is_mm_unfreeable(struct mm_struct *mm)
> +{
> + struct task_struct *g, *q;
> +
> + do_each_thread(g, q) {
> + if (q->mm == mm && !(q->flags & PF_KTHREAD) &&
> + q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> + return true;
> + } while_each_thread(g, q);
do_each_thread() doesn't look good. All sub-threads have the same ->mm.
for_each_process(p) {
if (p->flags && PF_KTHREAD)
continue;
do {
if (!t->mm)
continue;
if (t->mm != mm)
break;
if (t->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
return true;
} while_each_thread(p, t);
}
return false;
However, even if is_mm_unfreeable() uses for_each_process(),
> @@ -160,12 +181,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> p = find_lock_task_mm(p);
> if (!p)
> return 0;
> -
> - /*
> - * Shortcut check for OOM_SCORE_ADJ_MIN so the entire heuristic doesn't
> - * need to be executed for something that cannot be killed.
> - */
> - if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> + if (is_mm_unfreeable(p->mm)) {
oom_badness() becomes O(n**2), not good.
And, more importantly. This patch makes me think ->oom_score_adj should
be moved from ->signal to ->mm.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed
2010-08-15 15:18 ` [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed Oleg Nesterov
@ 2010-08-15 21:23 ` David Rientjes
2010-08-16 5:52 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: David Rientjes @ 2010-08-15 21:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
On Sun, 15 Aug 2010, Oleg Nesterov wrote:
> Well. I shouldn't try to comment this patch because I do not know
> the state of the current code (and I do not understand the changelog).
> Still, it looks a bit strange to me.
>
You snipped the changelog, so it's unclear what you don't understand about
it. The goal is to detect if a task A shares its mm with any other thread
that cannot be oom killed; if so, we can't free task A's memory when it
exits. It's then pointless to kill task A in the first place since it
will not solve the oom issue.
> > + * Determines whether an mm is unfreeable since a user thread attached to
> > + * it cannot be killed. Kthreads only temporarily assume a thread's mm,
> > + * so they are not considered.
> > + *
> > + * mm need not be protected by task_lock() since it will not be
> > + * dereferened.
> > + */
> > +static bool is_mm_unfreeable(struct mm_struct *mm)
> > +{
> > + struct task_struct *g, *q;
> > +
> > + do_each_thread(g, q) {
> > + if (q->mm == mm && !(q->flags & PF_KTHREAD) &&
> > + q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > + return true;
> > + } while_each_thread(g, q);
>
> do_each_thread() doesn't look good. All sub-threads have the same ->mm.
>
There's no other way to detect threads in other thread groups that share
the same mm since subthreads of a process can have an oom_score_adj that
differ from that process, this includes the possibility of
OOM_SCORE_ADJ_MIN that we're interested in here.
> > @@ -160,12 +181,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> > p = find_lock_task_mm(p);
> > if (!p)
> > return 0;
> > -
> > - /*
> > - * Shortcut check for OOM_SCORE_ADJ_MIN so the entire heuristic doesn't
> > - * need to be executed for something that cannot be killed.
> > - */
> > - if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> > + if (is_mm_unfreeable(p->mm)) {
>
> oom_badness() becomes O(n**2), not good.
>
No, oom_badness() becomes O(n) from O(1); select_bad_process() becomes
slower for eligible tasks.
It would be possible to defer this check to oom_kill_process() if
additional logic were added to its callers to retry if it fails:
- move the check for threads sharing an mm with an OOM_SCORE_ADJ_MIN
task to oom_kill_process() and return zero if found,
- callers of oom_kill_process() following select_bad_process() must loop
and select another process to kill with a badness score less than the
one initially selected (this could race based on variation in that
task's memory usage, but would not infinitely select it), and
- callers of oom_kill_process() directly on task (only
oom_kill_allocating_task) would fallback to using the tasklist scan
via select_bad_process().
What do you think?
> And, more importantly. This patch makes me think ->oom_score_adj should
> be moved from ->signal to ->mm.
>
I did that several months ago but people were unhappy with how a parent's
oom_score_adj value would change if it did a vfork() and the child's
oom_score_adj value was changed prior to execve().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed
2010-08-15 21:23 ` David Rientjes
@ 2010-08-16 5:52 ` Oleg Nesterov
2010-08-16 10:56 ` David Rientjes
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2010-08-16 5:52 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
On 08/15, David Rientjes wrote:
>
> On Sun, 15 Aug 2010, Oleg Nesterov wrote:
>
> > Well. I shouldn't try to comment this patch because I do not know
> > the state of the current code (and I do not understand the changelog).
> > Still, it looks a bit strange to me.
> >
>
> You snipped the changelog, so it's unclear what you don't understand about
> it. The goal is to detect if a task A shares its mm with any other thread
> that cannot be oom killed; if so, we can't free task A's memory when it
> exits. It's then pointless to kill task A in the first place since it
> will not solve the oom issue.
Yes, this part is clear.
> > > +static bool is_mm_unfreeable(struct mm_struct *mm)
> > > +{
> > > + struct task_struct *g, *q;
> > > +
> > > + do_each_thread(g, q) {
> > > + if (q->mm == mm && !(q->flags & PF_KTHREAD) &&
> > > + q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > > + return true;
> > > + } while_each_thread(g, q);
> >
> > do_each_thread() doesn't look good. All sub-threads have the same ->mm.
> >
>
> There's no other way to detect threads in other thread groups that share
> the same mm since subthreads of a process can have an oom_score_adj that
> differ from that process, this includes the possibility of
> OOM_SCORE_ADJ_MIN that we're interested in here.
Yes, you are right. Still, at least you can do
for_each_process(p) {
if (p->mm != mm)
continue;
...
to quickly skip the thread group which doesn't share the same ->mm.
> > > - if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> > > + if (is_mm_unfreeable(p->mm)) {
> >
> > oom_badness() becomes O(n**2), not good.
> >
>
> No, oom_badness() becomes O(n) from O(1); select_bad_process() becomes
> slower for eligible tasks.
I meant, select_bad_process() becomes O(n^2). oom_badness() is O(n), yes.
> It would be possible to defer this check to oom_kill_process() if
> additional logic were added to its callers to retry if it fails:
>
> [...snip...]
>
> What do you think?
Sorry David, I think nothing ;) Please ignore me, I have no time at all.
> > And, more importantly. This patch makes me think ->oom_score_adj should
> > be moved from ->signal to ->mm.
> >
>
> I did that several months ago but people were unhappy with how a parent's
> oom_score_adj value would change if it did a vfork() and the child's
> oom_score_adj value was changed prior to execve().
I see. But this patch in essence moves OOM_SCORE_ADJ_MIN from ->signal
to ->mm (and btw personally I think this makes sense).
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed
2010-08-16 5:52 ` Oleg Nesterov
@ 2010-08-16 10:56 ` David Rientjes
0 siblings, 0 replies; 9+ messages in thread
From: David Rientjes @ 2010-08-16 10:56 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel,
linux-mm
On Mon, 16 Aug 2010, Oleg Nesterov wrote:
> > There's no other way to detect threads in other thread groups that share
> > the same mm since subthreads of a process can have an oom_score_adj that
> > differ from that process, this includes the possibility of
> > OOM_SCORE_ADJ_MIN that we're interested in here.
>
> Yes, you are right. Still, at least you can do
>
> for_each_process(p) {
> if (p->mm != mm)
> continue;
> ...
>
> to quickly skip the thread group which doesn't share the same ->mm.
>
Right, thanks. I'll make that optimization and send out a second version
of this series with the other changes you suggested.
> > > > - if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> > > > + if (is_mm_unfreeable(p->mm)) {
> > >
> > > oom_badness() becomes O(n**2), not good.
> > >
> >
> > No, oom_badness() becomes O(n) from O(1); select_bad_process() becomes
> > slower for eligible tasks.
>
> I meant, select_bad_process() becomes O(n^2). oom_badness() is O(n), yes.
>
I'll follow my own suggestion for deferring this check to
oom_kill_process() since it's certainly an unusual case if the tasks are
sharing memory. It'll require a second entire tasklist scan when it
occurs, but definitely speeds up the common case.
> > > And, more importantly. This patch makes me think ->oom_score_adj should
> > > be moved from ->signal to ->mm.
> > >
> >
> > I did that several months ago but people were unhappy with how a parent's
> > oom_score_adj value would change if it did a vfork() and the child's
> > oom_score_adj value was changed prior to execve().
>
> I see. But this patch in essence moves OOM_SCORE_ADJ_MIN from ->signal
> to ->mm (and btw personally I think this makes sense).
>
Yes, and I still would have liked to embed it in struct mm_struct like I
originally proposed, but I understand how some people didn't care much for
the vfork() inheritance problem. There are applications in the wild such
as job schedulers that are OOM_DISABLE themselves and fork children and
then reset their oom_adj value prior to exec. So they do vfork() ->
change child's oom_adj -> execve(). That currently works since the
child's ->signal isn't shared (and before that, oom_adj was embedded in
struct task_struct) and we can't change that behavior to also change the
parent's oom_adj value at the same time because it shares an ->mm out from
under them.
Thanks for reviewing the patches Oleg!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-08-16 10:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-15 4:30 [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed David Rientjes
2010-08-15 4:31 ` [patch 2/2] oom: kill all threads sharing oom killed task's mm David Rientjes
2010-08-15 15:45 ` Oleg Nesterov
2010-08-15 21:28 ` David Rientjes
2010-08-16 6:00 ` Oleg Nesterov
2010-08-15 15:18 ` [patch 1/2] oom: avoid killing a task if a thread sharing its mm cannot be killed Oleg Nesterov
2010-08-15 21:23 ` David Rientjes
2010-08-16 5:52 ` Oleg Nesterov
2010-08-16 10:56 ` David Rientjes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).