* [PATCH] oom: skip frozen tasks
@ 2011-08-23 8:31 Konstantin Khlebnikov
2011-08-23 9:15 ` KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Konstantin Khlebnikov @ 2011-08-23 8:31 UTC (permalink / raw)
To: linux-mm, Andrew Morton, linux-kernel
Cc: KOSAKI Motohiro, Oleg Nesterov, KAMEZAWA Hiroyuki, David Rientjes
All frozen tasks are unkillable, and if one of them has TIF_MEMDIE
we must kill something else to avoid deadlock. After this patch
select_bad_process() will skip frozen task before checking TIF_MEMDIE.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
mm/oom_kill.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 626303b..931ab20 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -138,6 +138,8 @@ static bool oom_unkillable_task(struct task_struct *p,
return true;
if (p->flags & PF_KTHREAD)
return true;
+ if (p->flags & PF_FROZEN)
+ return true;
/* When mem_cgroup_out_of_memory() and p is not member of the group */
if (mem && !task_in_mem_cgroup(p, mem))
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-23 8:31 [PATCH] oom: skip frozen tasks Konstantin Khlebnikov
@ 2011-08-23 9:15 ` KAMEZAWA Hiroyuki
2011-08-23 13:46 ` Michal Hocko
2011-08-23 20:18 ` David Rientjes
2 siblings, 0 replies; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-23 9:15 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-mm, Andrew Morton, linux-kernel, KOSAKI Motohiro,
Oleg Nesterov, David Rientjes
On Tue, 23 Aug 2011 11:31:01 +0300
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> All frozen tasks are unkillable, and if one of them has TIF_MEMDIE
> we must kill something else to avoid deadlock. After this patch
> select_bad_process() will skip frozen task before checking TIF_MEMDIE.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-23 8:31 [PATCH] oom: skip frozen tasks Konstantin Khlebnikov
2011-08-23 9:15 ` KAMEZAWA Hiroyuki
@ 2011-08-23 13:46 ` Michal Hocko
2011-08-23 20:18 ` David Rientjes
2 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-08-23 13:46 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-mm, Andrew Morton, linux-kernel, KOSAKI Motohiro,
Oleg Nesterov, KAMEZAWA Hiroyuki, David Rientjes
On Tue 23-08-11 11:31:01, Konstantin Khlebnikov wrote:
> All frozen tasks are unkillable, and if one of them has TIF_MEMDIE
> we must kill something else to avoid deadlock.
This is a livelock rather than a deadlock, isn't it? We are picking the
same process all the time and cannot do any progress.
> After this patch select_bad_process() will skip frozen task before
> checking TIF_MEMDIE.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Anyway the patch looks good to me.
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/oom_kill.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 626303b..931ab20 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -138,6 +138,8 @@ static bool oom_unkillable_task(struct task_struct *p,
> return true;
> if (p->flags & PF_KTHREAD)
> return true;
> + if (p->flags & PF_FROZEN)
> + return true;
>
> /* When mem_cgroup_out_of_memory() and p is not member of the group */
> if (mem && !task_in_mem_cgroup(p, mem))
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-23 8:31 [PATCH] oom: skip frozen tasks Konstantin Khlebnikov
2011-08-23 9:15 ` KAMEZAWA Hiroyuki
2011-08-23 13:46 ` Michal Hocko
@ 2011-08-23 20:18 ` David Rientjes
2011-08-24 10:19 ` Michal Hocko
2 siblings, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-08-23 20:18 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-mm, Andrew Morton, linux-kernel, KOSAKI Motohiro,
Oleg Nesterov, KAMEZAWA Hiroyuki
On Tue, 23 Aug 2011, Konstantin Khlebnikov wrote:
> All frozen tasks are unkillable, and if one of them has TIF_MEMDIE
> we must kill something else to avoid deadlock. After this patch
> select_bad_process() will skip frozen task before checking TIF_MEMDIE.
>
The caveat is that if the task in the refrigerator is not OOM_DISABLE and
there are no other eligible tasks (system wide, in the cpuset, or in the
memcg) to kill, then the machine will panic as a result of this when, in
the past, we would simply issue the SIGKILL and keep looping in the page
allocator until it is thawed.
So you may actually be trading a stall waiting for this thread to thaw for
what would now be a panic, and that's not clearly better to me.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-23 20:18 ` David Rientjes
@ 2011-08-24 10:19 ` Michal Hocko
2011-08-24 19:31 ` David Rientjes
0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-08-24 10:19 UTC (permalink / raw)
To: David Rientjes
Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton, linux-kernel,
KOSAKI Motohiro, Oleg Nesterov, KAMEZAWA Hiroyuki
On Tue 23-08-11 13:18:14, David Rientjes wrote:
> On Tue, 23 Aug 2011, Konstantin Khlebnikov wrote:
>
> > All frozen tasks are unkillable, and if one of them has TIF_MEMDIE
> > we must kill something else to avoid deadlock. After this patch
> > select_bad_process() will skip frozen task before checking TIF_MEMDIE.
> >
>
> The caveat is that if the task in the refrigerator is not OOM_DISABLE and
> there are no other eligible tasks (system wide, in the cpuset, or in the
> memcg) to kill, then the machine will panic as a result of this when, in
> the past, we would simply issue the SIGKILL and keep looping in the page
> allocator until it is thawed.
mem_cgroup_out_of_memory doesn't panic if nothing has been selected. We
will loop in the charge&reclaim path until somebody frees some memory.
> So you may actually be trading a stall waiting for this thread to thaw for
> what would now be a panic, and that's not clearly better to me.
When we are in the global OOM condition then you are right, we have a
higher chance to panic. I still find the patch an improvement because
encountering a frozen task and looping over it without any progress
(even though there are other tasks that could be killed) is more
probable than having no killable task at all.
On non-NUMA machines there is even not a big chance that somebody would
be able to thaw a task as the system is already on knees.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-24 10:19 ` Michal Hocko
@ 2011-08-24 19:31 ` David Rientjes
2011-08-25 9:19 ` Michal Hocko
0 siblings, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-08-24 19:31 UTC (permalink / raw)
To: Michal Hocko
Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton, linux-kernel,
KOSAKI Motohiro, Oleg Nesterov, KAMEZAWA Hiroyuki
On Wed, 24 Aug 2011, Michal Hocko wrote:
> When we are in the global OOM condition then you are right, we have a
> higher chance to panic. I still find the patch an improvement because
> encountering a frozen task and looping over it without any progress
> (even though there are other tasks that could be killed) is more
> probable than having no killable task at all.
> On non-NUMA machines there is even not a big chance that somebody would
> be able to thaw a task as the system is already on knees.
>
That's obviously false since we call oom_killer_disable() in
freeze_processes() to disable the oom killer from ever being called in the
first place, so this is something you need to resolve with Rafael before
you cause more machines to panic.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-24 19:31 ` David Rientjes
@ 2011-08-25 9:19 ` Michal Hocko
2011-08-25 15:18 ` Oleg Nesterov
0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-08-25 9:19 UTC (permalink / raw)
To: David Rientjes
Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton, linux-kernel,
KOSAKI Motohiro, Oleg Nesterov, KAMEZAWA Hiroyuki
On Wed 24-08-11 12:31:26, David Rientjes wrote:
> On Wed, 24 Aug 2011, Michal Hocko wrote:
>
> > When we are in the global OOM condition then you are right, we have a
> > higher chance to panic. I still find the patch an improvement because
> > encountering a frozen task and looping over it without any progress
> > (even though there are other tasks that could be killed) is more
> > probable than having no killable task at all.
> > On non-NUMA machines there is even not a big chance that somebody would
> > be able to thaw a task as the system is already on knees.
> >
>
> That's obviously false since we call oom_killer_disable() in
> freeze_processes() to disable the oom killer from ever being called in the
> first place, so this is something you need to resolve with Rafael before
> you cause more machines to panic.
I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
so the patch doesn't make any change.
Other than that you may end up with all tasks frozen by freezer cgroup
(assuming that others, that are killable, would be already killed by
OOM). But in that case who can thaw those processes when we are already
in OOM? If there is no chance to move forward then panic is more
suitable than a livelock IMO.
OK, we might be in OOM on a nodemask (cpuset or mempol) on NUMA and an
allocation on a different nodemask might still succeed and so we can
thaw those processes. This should be addressed.
What if we panicked only if constraint == CONSTRAINT_NONE?
Or am I missing something?
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 9:19 ` Michal Hocko
@ 2011-08-25 15:18 ` Oleg Nesterov
2011-08-25 16:47 ` Michal Hocko
2011-08-25 21:03 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Oleg Nesterov @ 2011-08-25 15:18 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki
On 08/25, Michal Hocko wrote:
>
> On Wed 24-08-11 12:31:26, David Rientjes wrote:
> >
> > That's obviously false since we call oom_killer_disable() in
> > freeze_processes() to disable the oom killer from ever being called in the
> > first place, so this is something you need to resolve with Rafael before
> > you cause more machines to panic.
>
> I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> so the patch doesn't make any change.
Confused... freeze_processes() does try_to_freeze_tasks() before
oom_killer_disable() ?
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 15:18 ` Oleg Nesterov
@ 2011-08-25 16:47 ` Michal Hocko
2011-08-25 21:14 ` David Rientjes
2011-08-25 21:03 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-08-25 16:47 UTC (permalink / raw)
To: Oleg Nesterov
Cc: David Rientjes, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki
On Thu 25-08-11 17:18:18, Oleg Nesterov wrote:
> On 08/25, Michal Hocko wrote:
> >
> > On Wed 24-08-11 12:31:26, David Rientjes wrote:
> > >
> > > That's obviously false since we call oom_killer_disable() in
> > > freeze_processes() to disable the oom killer from ever being called in the
> > > first place, so this is something you need to resolve with Rafael before
> > > you cause more machines to panic.
> >
> > I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> > so the patch doesn't make any change.
>
> Confused... freeze_processes() does try_to_freeze_tasks() before
> oom_killer_disable() ?
Yes you are right, I must have been blind.
Now I see the point. We do not want to panic while we are suspending and
the memory is really low just because all the userspace is already in
the the fridge.
Sorry for confusion.
I still do not follow the oom_killer_disable note from David, though.
>
> Oleg.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 15:18 ` Oleg Nesterov
2011-08-25 16:47 ` Michal Hocko
@ 2011-08-25 21:03 ` Rafael J. Wysocki
1 sibling, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2011-08-25 21:03 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Michal Hocko, David Rientjes, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki
On Thursday, August 25, 2011, Oleg Nesterov wrote:
> On 08/25, Michal Hocko wrote:
> >
> > On Wed 24-08-11 12:31:26, David Rientjes wrote:
> > >
> > > That's obviously false since we call oom_killer_disable() in
> > > freeze_processes() to disable the oom killer from ever being called in the
> > > first place, so this is something you need to resolve with Rafael before
> > > you cause more machines to panic.
> >
> > I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> > so the patch doesn't make any change.
>
> Confused... freeze_processes() does try_to_freeze_tasks() before
> oom_killer_disable() ?
Yes, it does.
Thanks,
Rafael
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 16:47 ` Michal Hocko
@ 2011-08-25 21:14 ` David Rientjes
2011-08-26 7:09 ` Michal Hocko
2011-08-26 7:35 ` Konstantin Khlebnikov
0 siblings, 2 replies; 44+ messages in thread
From: David Rientjes @ 2011-08-25 21:14 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Thu, 25 Aug 2011, Michal Hocko wrote:
> > > > That's obviously false since we call oom_killer_disable() in
> > > > freeze_processes() to disable the oom killer from ever being called in the
> > > > first place, so this is something you need to resolve with Rafael before
> > > > you cause more machines to panic.
> > >
> > > I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> > > so the patch doesn't make any change.
> >
> > Confused... freeze_processes() does try_to_freeze_tasks() before
> > oom_killer_disable() ?
>
> Yes you are right, I must have been blind.
>
> Now I see the point. We do not want to panic while we are suspending and
> the memory is really low just because all the userspace is already in
> the the fridge.
> Sorry for confusion.
>
> I still do not follow the oom_killer_disable note from David, though.
>
oom_killer_disable() was added to that path for a reason when all threads
are frozen: memory allocations still occur in the suspend path in an oom
condition and adding the oom_killer_disable() will cause those
allocations to fail rather than sending pointless SIGKILLs to frozen
threads.
Now consider if the only _eligible_ threads for oom kill (because of
cpusets or mempolicies) are those that are frozen. We certainly do not
want to panic because other cpusets are still getting work done. We'd
either want to add a mem to the cpuset or thaw the processes because the
cpuset is oom.
You can't just selectively skip certain threads when their state can be
temporary without risking a panic. That's why this patch is a
non-starter.
A much better solution would be to lower the badness score that the oom
killer uses for PF_FROZEN threads so that they aren't considered a
priority for kill unless there's nothing else left to kill.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 21:14 ` David Rientjes
@ 2011-08-26 7:09 ` Michal Hocko
2011-08-26 8:56 ` Michal Hocko
2011-08-26 7:35 ` Konstantin Khlebnikov
1 sibling, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 7:09 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Thu 25-08-11 14:14:20, David Rientjes wrote:
> On Thu, 25 Aug 2011, Michal Hocko wrote:
>
> > > > > That's obviously false since we call oom_killer_disable() in
> > > > > freeze_processes() to disable the oom killer from ever being called in the
> > > > > first place, so this is something you need to resolve with Rafael before
> > > > > you cause more machines to panic.
> > > >
> > > > I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> > > > so the patch doesn't make any change.
> > >
> > > Confused... freeze_processes() does try_to_freeze_tasks() before
> > > oom_killer_disable() ?
> >
> > Yes you are right, I must have been blind.
> >
> > Now I see the point. We do not want to panic while we are suspending and
> > the memory is really low just because all the userspace is already in
> > the the fridge.
> > Sorry for confusion.
> >
> > I still do not follow the oom_killer_disable note from David, though.
> >
>
> oom_killer_disable() was added to that path for a reason when all threads
> are frozen: memory allocations still occur in the suspend path in an oom
> condition and adding the oom_killer_disable() will cause those
> allocations to fail rather than sending pointless SIGKILLs to frozen
> threads.
>
> Now consider if the only _eligible_ threads for oom kill (because of
> cpusets or mempolicies) are those that are frozen. We certainly do not
> want to panic because other cpusets are still getting work done. We'd
> either want to add a mem to the cpuset or thaw the processes because the
> cpuset is oom.
Sure.
>
> You can't just selectively skip certain threads when their state can be
> temporary without risking a panic. That's why this patch is a
> non-starter.
>
> A much better solution would be to lower the badness score that the oom
> killer uses for PF_FROZEN threads so that they aren't considered a
> priority for kill unless there's nothing else left to kill.
Yes, sounds better.
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-25 21:14 ` David Rientjes
2011-08-26 7:09 ` Michal Hocko
@ 2011-08-26 7:35 ` Konstantin Khlebnikov
2011-08-26 9:09 ` David Rientjes
1 sibling, 1 reply; 44+ messages in thread
From: Konstantin Khlebnikov @ 2011-08-26 7:35 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
David Rientjes wrote:
> On Thu, 25 Aug 2011, Michal Hocko wrote:
>
>>>>> That's obviously false since we call oom_killer_disable() in
>>>>> freeze_processes() to disable the oom killer from ever being called in the
>>>>> first place, so this is something you need to resolve with Rafael before
>>>>> you cause more machines to panic.
>>>>
>>>> I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
>>>> so the patch doesn't make any change.
>>>
>>> Confused... freeze_processes() does try_to_freeze_tasks() before
>>> oom_killer_disable() ?
>>
>> Yes you are right, I must have been blind.
>>
>> Now I see the point. We do not want to panic while we are suspending and
>> the memory is really low just because all the userspace is already in
>> the the fridge.
>> Sorry for confusion.
>>
>> I still do not follow the oom_killer_disable note from David, though.
>>
>
> oom_killer_disable() was added to that path for a reason when all threads
> are frozen: memory allocations still occur in the suspend path in an oom
> condition and adding the oom_killer_disable() will cause those
> allocations to fail rather than sending pointless SIGKILLs to frozen
> threads.
>
> Now consider if the only _eligible_ threads for oom kill (because of
> cpusets or mempolicies) are those that are frozen. We certainly do not
> want to panic because other cpusets are still getting work done. We'd
> either want to add a mem to the cpuset or thaw the processes because the
> cpuset is oom.
>
> You can't just selectively skip certain threads when their state can be
> temporary without risking a panic. That's why this patch is a
> non-starter.
>
> A much better solution would be to lower the badness score that the oom
> killer uses for PF_FROZEN threads so that they aren't considered a
> priority for kill unless there's nothing else left to kill.
Anyway, oom killer shouldn't loop endlessly if it see TIF_MEMDIE on frozen task,
it must go on and try to kill somebody else. We cannot wait for thawing this task.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 7:09 ` Michal Hocko
@ 2011-08-26 8:56 ` Michal Hocko
2011-08-26 9:21 ` David Rientjes
2011-08-26 10:03 ` Konstantin Khlebnikov
0 siblings, 2 replies; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 8:56 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 09:09:46, Michal Hocko wrote:
> On Thu 25-08-11 14:14:20, David Rientjes wrote:
> > On Thu, 25 Aug 2011, Michal Hocko wrote:
> >
> > > > > > That's obviously false since we call oom_killer_disable() in
> > > > > > freeze_processes() to disable the oom killer from ever being called in the
> > > > > > first place, so this is something you need to resolve with Rafael before
> > > > > > you cause more machines to panic.
> > > > >
> > > > > I didn't mean suspend/resume path (that is protected by oom_killer_disabled)
> > > > > so the patch doesn't make any change.
> > > >
> > > > Confused... freeze_processes() does try_to_freeze_tasks() before
> > > > oom_killer_disable() ?
> > >
> > > Yes you are right, I must have been blind.
> > >
> > > Now I see the point. We do not want to panic while we are suspending and
> > > the memory is really low just because all the userspace is already in
> > > the the fridge.
> > > Sorry for confusion.
> > >
> > > I still do not follow the oom_killer_disable note from David, though.
> > >
> >
> > oom_killer_disable() was added to that path for a reason when all threads
> > are frozen: memory allocations still occur in the suspend path in an oom
> > condition and adding the oom_killer_disable() will cause those
> > allocations to fail rather than sending pointless SIGKILLs to frozen
> > threads.
> >
> > Now consider if the only _eligible_ threads for oom kill (because of
> > cpusets or mempolicies) are those that are frozen. We certainly do not
> > want to panic because other cpusets are still getting work done. We'd
> > either want to add a mem to the cpuset or thaw the processes because the
> > cpuset is oom.
>
> Sure.
>
> >
> > You can't just selectively skip certain threads when their state can be
> > temporary without risking a panic. That's why this patch is a
> > non-starter.
> >
> > A much better solution would be to lower the badness score that the oom
> > killer uses for PF_FROZEN threads so that they aren't considered a
> > priority for kill unless there's nothing else left to kill.
>
> Yes, sounds better.
.. but still not sufficient. We also have to thaw the process
as well. Just a quick hacked up patch (not tested, just for an
illustration).
Would something like this work?
---
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 7:35 ` Konstantin Khlebnikov
@ 2011-08-26 9:09 ` David Rientjes
2011-08-26 9:59 ` Konstantin Khlebnikov
0 siblings, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-08-26 9:09 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Michal Hocko, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri, 26 Aug 2011, Konstantin Khlebnikov wrote:
> > A much better solution would be to lower the badness score that the oom
> > killer uses for PF_FROZEN threads so that they aren't considered a
> > priority for kill unless there's nothing else left to kill.
>
> Anyway, oom killer shouldn't loop endlessly if it see TIF_MEMDIE on frozen
> task,
> it must go on and try to kill somebody else. We cannot wait for thawing this
> task.
>
Did you read my suggestion? I quoted it above again for you. The badness
heuristic would only select those tasks to kill as a last resort in the
hopes they will eventually be thawed and may exit. Panicking the entire
machine for what could be isolated by a cgroup is insanity.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 8:56 ` Michal Hocko
@ 2011-08-26 9:21 ` David Rientjes
2011-08-26 9:53 ` Michal Hocko
2011-08-26 21:03 ` [PATCH] oom: skip frozen tasks Rafael J. Wysocki
2011-08-26 10:03 ` Konstantin Khlebnikov
1 sibling, 2 replies; 44+ messages in thread
From: David Rientjes @ 2011-08-26 9:21 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri, 26 Aug 2011, Michal Hocko wrote:
> Let's give all frozen tasks a bonus (OOM_SCORE_ADJ_MAX/2) so that we do
> not consider them unless really necessary and if we really pick up one
> then thaw its threads before we try to kill it.
>
I don't like arbitrary heuristics like this because they polluted the old
oom killer before it was rewritten and made it much more unpredictable.
The only heuristic it includes right now is a bonus for root tasks so that
when two processes have nearly the same amount of memory usage (within 3%
of available memory), the non-root task is chosen instead.
This bonus is actually saying that a single frozen task can use up to 50%
more of the machine's capacity in a system-wide oom condition than the
task that will now be killed instead. That seems excessive.
I do like the idea of automatically thawing the task though and if that's
possible then I don't think we need to manipulate the badness heuristic at
all. I know that wouldn't be feasible when we've frozen _all_ threads and
that's why we have oom_killer_disable(), but we'll have to check with
Rafael to see if something like this could work. Rafael?
> TODO
> - given bonus might be too big?
> - aren't we racing with try_to_freeze_tasks?
> ---
> mm/oom_kill.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 626303b..fd194bc 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -32,6 +32,7 @@
> #include <linux/mempolicy.h>
> #include <linux/security.h>
> #include <linux/ptrace.h>
> +#include <linux/freezer.h>
>
> int sysctl_panic_on_oom;
> int sysctl_oom_kill_allocating_task;
> @@ -214,6 +215,14 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> points += p->signal->oom_score_adj;
>
> /*
> + * Do not try to kill frozen tasks unless there is nothing else to kill.
> + * We do not want to give it 1 point because we still want to select a good
> + * candidate among all frozen tasks. Let's give it a reasonable bonus.
> + */
> + if (frozen(p))
> + points -= OOM_SCORE_ADJ_MAX/2;
> +
> + /*
> * Never return 0 for an eligible task that may be killed since it's
> * possible that no single user task uses more than 0.1% of memory and
> * no single admin tasks uses more than 3.0%.
> @@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> pr_err("Kill process %d (%s) sharing same memory\n",
> task_pid_nr(q), q->comm);
> task_unlock(q);
> +
> + if (frozen(q))
> + thaw_process(q);
> +
> force_sig(SIGKILL, q);
> }
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:21 ` David Rientjes
@ 2011-08-26 9:53 ` Michal Hocko
2011-08-26 11:01 ` Michal Hocko
2011-08-26 18:13 ` David Rientjes
2011-08-26 21:03 ` [PATCH] oom: skip frozen tasks Rafael J. Wysocki
1 sibling, 2 replies; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 9:53 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 02:21:42, David Rientjes wrote:
> On Fri, 26 Aug 2011, Michal Hocko wrote:
>
> > Let's give all frozen tasks a bonus (OOM_SCORE_ADJ_MAX/2) so that we do
> > not consider them unless really necessary and if we really pick up one
> > then thaw its threads before we try to kill it.
> >
>
> I don't like arbitrary heuristics like this because they polluted the old
> oom killer before it was rewritten and made it much more unpredictable.
> The only heuristic it includes right now is a bonus for root tasks so that
> when two processes have nearly the same amount of memory usage (within 3%
> of available memory), the non-root task is chosen instead.
>
> This bonus is actually saying that a single frozen task can use up to 50%
> more of the machine's capacity in a system-wide oom condition than the
> task that will now be killed instead. That seems excessive.
Yes, the number is probably too high. I just wanted to start up with
something. Maybe we can give it another root bonus. But I agree whatever
we use it will be just a random value...
>
> I do like the idea of automatically thawing the task though and if that's
> possible then I don't think we need to manipulate the badness heuristic at
> all. I know that wouldn't be feasible when we've frozen _all_ threads and
Why it wouldn't be feasible for all threads? If you have all tasks
frozen (suspend going on, whole cgroup or all tasks in a cpuset/nodemask
are frozen) then the selection is more natural because all of them are
equal (with or without a bonus). The bonus tries to reduce thawing if
not all of them are frozen.
I am not saying the bonus is necessary, though. It depends on what
the freezer is used for (e.g. freeze a process which went wild and
debug what went wrong wouldn't welcome that somebody killed it or other
(mis)use which relies on D state).
> that's why we have oom_killer_disable(), but we'll have to check with
> Rafael to see if something like this could work. Rafael?
>
> > TODO
> > - given bonus might be too big?
> > - aren't we racing with try_to_freeze_tasks?
> > ---
> > mm/oom_kill.c | 13 +++++++++++++
> > 1 files changed, 13 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 626303b..fd194bc 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -32,6 +32,7 @@
> > #include <linux/mempolicy.h>
> > #include <linux/security.h>
> > #include <linux/ptrace.h>
> > +#include <linux/freezer.h>
> >
> > int sysctl_panic_on_oom;
> > int sysctl_oom_kill_allocating_task;
> > @@ -214,6 +215,14 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> > points += p->signal->oom_score_adj;
> >
> > /*
> > + * Do not try to kill frozen tasks unless there is nothing else to kill.
> > + * We do not want to give it 1 point because we still want to select a good
> > + * candidate among all frozen tasks. Let's give it a reasonable bonus.
> > + */
> > + if (frozen(p))
> > + points -= OOM_SCORE_ADJ_MAX/2;
> > +
> > + /*
> > * Never return 0 for an eligible task that may be killed since it's
> > * possible that no single user task uses more than 0.1% of memory and
> > * no single admin tasks uses more than 3.0%.
> > @@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > pr_err("Kill process %d (%s) sharing same memory\n",
> > task_pid_nr(q), q->comm);
> > task_unlock(q);
> > +
> > + if (frozen(q))
> > + thaw_process(q);
> > +
> > force_sig(SIGKILL, q);
> > }
> >
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:09 ` David Rientjes
@ 2011-08-26 9:59 ` Konstantin Khlebnikov
2011-08-26 18:09 ` David Rientjes
0 siblings, 1 reply; 44+ messages in thread
From: Konstantin Khlebnikov @ 2011-08-26 9:59 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
David Rientjes wrote:
> On Fri, 26 Aug 2011, Konstantin Khlebnikov wrote:
>
>>> A much better solution would be to lower the badness score that the oom
>>> killer uses for PF_FROZEN threads so that they aren't considered a
>>> priority for kill unless there's nothing else left to kill.
>>
>> Anyway, oom killer shouldn't loop endlessly if it see TIF_MEMDIE on frozen
>> task,
>> it must go on and try to kill somebody else. We cannot wait for thawing this
>> task.
>>
>
> Did you read my suggestion? I quoted it above again for you. The badness
> heuristic would only select those tasks to kill as a last resort in the
> hopes they will eventually be thawed and may exit. Panicking the entire
> machine for what could be isolated by a cgroup is insanity.
Maybe just fix this "panic" logic? OOM killer should panic only on global memory shortage.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 8:56 ` Michal Hocko
2011-08-26 9:21 ` David Rientjes
@ 2011-08-26 10:03 ` Konstantin Khlebnikov
2011-08-26 10:48 ` Michal Hocko
1 sibling, 1 reply; 44+ messages in thread
From: Konstantin Khlebnikov @ 2011-08-26 10:03 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
Michal Hocko wrote:
> @@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> pr_err("Kill process %d (%s) sharing same memory\n",
> task_pid_nr(q), q->comm);
> task_unlock(q);
> +
> + if (frozen(q))
> + thaw_process(q);
> +
We must thaw task strictly after sending SIGKILL.
But anyway I think this is a bad idea.
> force_sig(SIGKILL, q);
> }
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 10:03 ` Konstantin Khlebnikov
@ 2011-08-26 10:48 ` Michal Hocko
2011-08-26 12:44 ` Konstantin Khlebnikov
0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 10:48 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: David Rientjes, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 14:03:17, Konstantin Khlebnikov wrote:
> Michal Hocko wrote:
>
> >@@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > pr_err("Kill process %d (%s) sharing same memory\n",
> > task_pid_nr(q), q->comm);
> > task_unlock(q);
> >+
> >+ if (frozen(q))
> >+ thaw_process(q);
> >+
>
> We must thaw task strictly after sending SIGKILL.
Sounds reasonable.
> But anyway I think this is a bad idea.
Why?
>
> > force_sig(SIGKILL, q);
> > }
> >
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:53 ` Michal Hocko
@ 2011-08-26 11:01 ` Michal Hocko
2011-08-26 18:13 ` David Rientjes
1 sibling, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 11:01 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 11:53:56, Michal Hocko wrote:
> On Fri 26-08-11 02:21:42, David Rientjes wrote:
> > On Fri, 26 Aug 2011, Michal Hocko wrote:
> >
> > > Let's give all frozen tasks a bonus (OOM_SCORE_ADJ_MAX/2) so that we do
> > > not consider them unless really necessary and if we really pick up one
> > > then thaw its threads before we try to kill it.
> > >
> >
> > I don't like arbitrary heuristics like this because they polluted the old
> > oom killer before it was rewritten and made it much more unpredictable.
> > The only heuristic it includes right now is a bonus for root tasks so that
> > when two processes have nearly the same amount of memory usage (within 3%
> > of available memory), the non-root task is chosen instead.
> >
> > This bonus is actually saying that a single frozen task can use up to 50%
> > more of the machine's capacity in a system-wide oom condition than the
> > task that will now be killed instead. That seems excessive.
>
> Yes, the number is probably too high. I just wanted to start up with
> something. Maybe we can give it another root bonus. But I agree whatever
> we use it will be just a random value...
>
> >
> > I do like the idea of automatically thawing the task though and if that's
> > possible then I don't think we need to manipulate the badness heuristic at
> > all. I know that wouldn't be feasible when we've frozen _all_ threads and
>
> Why it wouldn't be feasible for all threads? If you have all tasks
> frozen (suspend going on, whole cgroup or all tasks in a cpuset/nodemask
> are frozen) then the selection is more natural because all of them are
> equal (with or without a bonus). The bonus tries to reduce thawing if
> not all of them are frozen.
> I am not saying the bonus is necessary, though. It depends on what
> the freezer is used for (e.g. freeze a process which went wild and
> debug what went wrong wouldn't welcome that somebody killed it or other
> (mis)use which relies on D state).
Anyway, I do agree, the two things (bonus and thaw during oom_kill)
should be handled separately.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 10:48 ` Michal Hocko
@ 2011-08-26 12:44 ` Konstantin Khlebnikov
2011-08-26 12:59 ` Michal Hocko
0 siblings, 1 reply; 44+ messages in thread
From: Konstantin Khlebnikov @ 2011-08-26 12:44 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
Michal Hocko wrote:
> On Fri 26-08-11 14:03:17, Konstantin Khlebnikov wrote:
>> Michal Hocko wrote:
>>
>>> @@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
>>> pr_err("Kill process %d (%s) sharing same memory\n",
>>> task_pid_nr(q), q->comm);
>>> task_unlock(q);
>>> +
>>> + if (frozen(q))
>>> + thaw_process(q);
>>> +
>>
>> We must thaw task strictly after sending SIGKILL.
>
> Sounds reasonable.
>
>> But anyway I think this is a bad idea.
>
> Why?
Refrigerator may be used for digging in task's internal structures,
so such digger may be very surprised if somebody suddenly thaws this task.
>
>>
>>> force_sig(SIGKILL, q);
>>> }
>>>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 12:44 ` Konstantin Khlebnikov
@ 2011-08-26 12:59 ` Michal Hocko
0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-08-26 12:59 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: David Rientjes, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 16:44:49, Konstantin Khlebnikov wrote:
> Michal Hocko wrote:
> >On Fri 26-08-11 14:03:17, Konstantin Khlebnikov wrote:
> >>Michal Hocko wrote:
> >>
> >>>@@ -450,6 +459,10 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> >>> pr_err("Kill process %d (%s) sharing same memory\n",
> >>> task_pid_nr(q), q->comm);
> >>> task_unlock(q);
> >>>+
> >>>+ if (frozen(q))
> >>>+ thaw_process(q);
> >>>+
> >>
> >>We must thaw task strictly after sending SIGKILL.
> >
> >Sounds reasonable.
> >
> >>But anyway I think this is a bad idea.
> >
> >Why?
>
> Refrigerator may be used for digging in task's internal structures,
> so such digger may be very surprised if somebody suddenly thaws this task.
That is something similar why I mentioned that we probably want to
give it some oom bonus. Nevertheless we have to be carefull about
that. Someone could freeze a memory hog just to hide it from OOM which
would have much worse consequences than if such app disappeared while
somebody is looking at it while it is frozen.
This has to be balanced somehow.
>
> >
> >>
> >>> force_sig(SIGKILL, q);
> >>> }
> >>>
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:59 ` Konstantin Khlebnikov
@ 2011-08-26 18:09 ` David Rientjes
0 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2011-08-26 18:09 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Michal Hocko, Oleg Nesterov, linux-mm@kvack.org, Andrew Morton,
linux-kernel@vger.kernel.org, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri, 26 Aug 2011, Konstantin Khlebnikov wrote:
> Maybe just fix this "panic" logic? OOM killer should panic only on global
> memory shortage.
>
NO, it shouldn't, we actually rely quite extensively on cpusets filled
with OOM_DISABLE threads to panic because the job scheduler would be
unresponsive in such a condition and it'd much better to panic and reboot
than to brick the machine. I'm not sure where you're getting all your
information from, but please don't pass it off as principles.
You can set the panic logic to be whatever you want with
/proc/sys/vm/panic_on_oom. See Documentation/filesystems/proc.txt for
more information.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:53 ` Michal Hocko
2011-08-26 11:01 ` Michal Hocko
@ 2011-08-26 18:13 ` David Rientjes
2011-09-26 8:28 ` [PATCH 1/2] oom: do not live lock on " Michal Hocko
2011-09-26 8:35 ` [PATCH 2/2] oom: give bonus to frozen processes Michal Hocko
1 sibling, 2 replies; 44+ messages in thread
From: David Rientjes @ 2011-08-26 18:13 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri, 26 Aug 2011, Michal Hocko wrote:
> > I do like the idea of automatically thawing the task though and if that's
> > possible then I don't think we need to manipulate the badness heuristic at
> > all. I know that wouldn't be feasible when we've frozen _all_ threads and
>
> Why it wouldn't be feasible for all threads? If you have all tasks
> frozen (suspend going on, whole cgroup or all tasks in a cpuset/nodemask
> are frozen) then the selection is more natural because all of them are
> equal (with or without a bonus). The bonus tries to reduce thawing if
> not all of them are frozen.
Yeah, this comment wasn't in reference to your heuristic change, it was in
reference to the fact that if all threads are frozen then there is little
hope for us recovering from the situation without a user response. I
think that's why we want oom_killer_disable() so that we avoid looping
forever and can actually fail allocations in the hope that we'll bring
ourselves out of suspend.
> I am not saying the bonus is necessary, though. It depends on what
> the freezer is used for (e.g. freeze a process which went wild and
> debug what went wrong wouldn't welcome that somebody killed it or other
> (mis)use which relies on D state).
>
I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
followed by a SIGKILL if that task is selected for oom kill without an
heuristic change. Not sure if that's possible, so we'll wait for Rafael
to chime in.
If it actually does come down to a heuristic change, then it need not
happen in the oom killer: the freezing code would need to use
test_set_oom_score_adj() to temporarily reduce the oom_score_adj for that
task until it comes out of the refrigerator. We already use that in ksm
and swapoff to actually prefer threads, but we can use it to bias against
threads as well.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] oom: skip frozen tasks
2011-08-26 9:21 ` David Rientjes
2011-08-26 9:53 ` Michal Hocko
@ 2011-08-26 21:03 ` Rafael J. Wysocki
1 sibling, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2011-08-26 21:03 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki
On Friday, August 26, 2011, David Rientjes wrote:
> On Fri, 26 Aug 2011, Michal Hocko wrote:
>
> > Let's give all frozen tasks a bonus (OOM_SCORE_ADJ_MAX/2) so that we do
> > not consider them unless really necessary and if we really pick up one
> > then thaw its threads before we try to kill it.
> >
>
> I don't like arbitrary heuristics like this because they polluted the old
> oom killer before it was rewritten and made it much more unpredictable.
> The only heuristic it includes right now is a bonus for root tasks so that
> when two processes have nearly the same amount of memory usage (within 3%
> of available memory), the non-root task is chosen instead.
>
> This bonus is actually saying that a single frozen task can use up to 50%
> more of the machine's capacity in a system-wide oom condition than the
> task that will now be killed instead. That seems excessive.
>
> I do like the idea of automatically thawing the task though and if that's
> possible then I don't think we need to manipulate the badness heuristic at
> all. I know that wouldn't be feasible when we've frozen _all_ threads and
> that's why we have oom_killer_disable(), but we'll have to check with
> Rafael to see if something like this could work. Rafael?
That depends a good deal on when the thawing happens and what the thawed
task can do before being killed. For example, if the thawing happens
while devices are suspended and the thawed task accesses a driver through
ioctl(), for example, the purpose of freezing will be defeated.
Thanks,
Rafael
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/2] oom: do not live lock on frozen tasks
2011-08-26 18:13 ` David Rientjes
@ 2011-09-26 8:28 ` Michal Hocko
2011-09-26 8:56 ` David Rientjes
2011-09-26 10:28 ` Rusty Russell
2011-09-26 8:35 ` [PATCH 2/2] oom: give bonus to frozen processes Michal Hocko
1 sibling, 2 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 8:28 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo, Rusty Russell
[Let's add some more people to CC list]
Sorry it took so long but I was quite bussy recently.
On Fri 26-08-11 11:13:40, David Rientjes wrote:
> On Fri, 26 Aug 2011, Michal Hocko wrote:
[...]
> > I am not saying the bonus is necessary, though. It depends on what
> > the freezer is used for (e.g. freeze a process which went wild and
> > debug what went wrong wouldn't welcome that somebody killed it or other
> > (mis)use which relies on D state).
> >
>
> I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
> followed by a SIGKILL if that task is selected for oom kill without an
> heuristic change. Not sure if that's possible, so we'll wait for Rafael
> to chime in.
We have discussed that with Rafael and it should be safe to do that. See
the patch bellow.
The only place I am not entirely sure about is run_guest
(drivers/lguest/core.c). It seems that the code is able to cope with
signals but it also calls lguest_arch_run_guest after try_to_freeze.
---
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] oom: give bonus to frozen processes
2011-08-26 18:13 ` David Rientjes
2011-09-26 8:28 ` [PATCH 1/2] oom: do not live lock on " Michal Hocko
@ 2011-09-26 8:35 ` Michal Hocko
2011-09-26 9:02 ` David Rientjes
1 sibling, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 8:35 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Fri 26-08-11 11:13:40, David Rientjes wrote:
> On Fri, 26 Aug 2011, Michal Hocko wrote:
[...]
> > I am not saying the bonus is necessary, though. It depends on what
> > the freezer is used for (e.g. freeze a process which went wild and
> > debug what went wrong wouldn't welcome that somebody killed it or other
> > (mis)use which relies on D state).
[...]
> If it actually does come down to a heuristic change, then it need not
> happen in the oom killer: the freezing code would need to use
> test_set_oom_score_adj() to temporarily reduce the oom_score_adj for that
> task until it comes out of the refrigerator. We already use that in ksm
> and swapoff to actually prefer threads, but we can use it to bias against
> threads as well.
Let's try it with a heuristic change first. If you really do not like
it, we can move to oom_scode_adj. I like the heuristic change little bit
more because it is at the same place as the root bonus.
---
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 8:28 ` [PATCH 1/2] oom: do not live lock on " Michal Hocko
@ 2011-09-26 8:56 ` David Rientjes
2011-09-26 9:14 ` Michal Hocko
2011-09-26 10:28 ` Rusty Russell
1 sibling, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-09-26 8:56 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo, Rusty Russell
On Mon, 26 Sep 2011, Michal Hocko wrote:
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 626303b..b9774f3 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -32,6 +32,7 @@
> #include <linux/mempolicy.h>
> #include <linux/security.h>
> #include <linux/ptrace.h>
> +#include <linux/freezer.h>
>
> int sysctl_panic_on_oom;
> int sysctl_oom_kill_allocating_task;
> @@ -451,6 +452,9 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> task_pid_nr(q), q->comm);
> task_unlock(q);
> force_sig(SIGKILL, q);
> +
> + if (frozen(q))
> + thaw_process(q);
> }
>
> set_tsk_thread_flag(p, TIF_MEMDIE);
This is in the wrong place, oom_kill_task() iterates over all threads that
are _not_ in the same thread group as the chosen thread and kills them
without giving them access to memory reserves. The chosen task, p, could
still be frozen and may not exit.
Once that's fixed, feel free to add my
Acked-by: David Rientjes <rientjes@google.com>
once Rafael sends his acked-by or reviewed-by.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] oom: give bonus to frozen processes
2011-09-26 8:35 ` [PATCH 2/2] oom: give bonus to frozen processes Michal Hocko
@ 2011-09-26 9:02 ` David Rientjes
2011-09-26 9:31 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-09-26 9:02 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki
On Mon, 26 Sep 2011, Michal Hocko wrote:
> Let's try it with a heuristic change first. If you really do not like
> it, we can move to oom_scode_adj. I like the heuristic change little bit
> more because it is at the same place as the root bonus.
The problem with the bonus is that, as mentioned previously, it doesn't
protect against ANYTHING for the case you're trying to fix. This won't
panic the machine because all killable threads are guaranteed to have a
non-zero badness score, but it's a very valid configuration to have either
- all eligible threads (system-wide, shared cpuset, shared mempolicy
nodes) are frozen, or
- all eligible frozen threads use <5% of memory whereas all other
eligible killable threads use 1% of available memory.
and that means the oom killer will repeatedly select those threads and the
livelock still exists unless you can guarantee that they are successfully
thawed, that thawing them in all situations is safe, and that once thawed
they will make a timely exit.
Additionally, I don't think biasing against frozen tasks makes sense from
a heusritic standpoint of the oom killer. Why would we want give
non-frozen tasks that are actually getting work done a preference over a
task that is frozen and doing absolutely nothing? It seems like that's
backwards and that we'd actually prefer killing the task doing nothing so
it can free its memory.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 8:56 ` David Rientjes
@ 2011-09-26 9:14 ` Michal Hocko
2011-09-26 9:25 ` KAMEZAWA Hiroyuki
2011-09-26 15:51 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 9:14 UTC (permalink / raw)
To: David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo, Rusty Russell
On Mon 26-09-11 01:56:57, David Rientjes wrote:
> On Mon, 26 Sep 2011, Michal Hocko wrote:
>
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 626303b..b9774f3 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -32,6 +32,7 @@
> > #include <linux/mempolicy.h>
> > #include <linux/security.h>
> > #include <linux/ptrace.h>
> > +#include <linux/freezer.h>
> >
> > int sysctl_panic_on_oom;
> > int sysctl_oom_kill_allocating_task;
> > @@ -451,6 +452,9 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > task_pid_nr(q), q->comm);
> > task_unlock(q);
> > force_sig(SIGKILL, q);
> > +
> > + if (frozen(q))
> > + thaw_process(q);
> > }
> >
> > set_tsk_thread_flag(p, TIF_MEMDIE);
>
> This is in the wrong place, oom_kill_task() iterates over all threads that
> are _not_ in the same thread group as the chosen thread and kills them
> without giving them access to memory reserves. The chosen task, p, could
> still be frozen and may not exit.
Ahh, right you are. I ave missed that one. Updated patch bellow.
>
> Once that's fixed, feel free to add my
>
> Acked-by: David Rientjes <rientjes@google.com>
Thanks
>
> once Rafael sends his acked-by or reviewed-by.
---
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 9:14 ` Michal Hocko
@ 2011-09-26 9:25 ` KAMEZAWA Hiroyuki
2011-09-26 9:32 ` Michal Hocko
2011-09-26 15:51 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-09-26 9:25 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, Rafael J. Wysocki,
Tejun Heo, Rusty Russell
On Mon, 26 Sep 2011 11:14:40 +0200
Michal Hocko <mhocko@suse.cz> wrote:
> On Mon 26-09-11 01:56:57, David Rientjes wrote:
> > On Mon, 26 Sep 2011, Michal Hocko wrote:
> >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 626303b..b9774f3 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -32,6 +32,7 @@
> > > #include <linux/mempolicy.h>
> > > #include <linux/security.h>
> > > #include <linux/ptrace.h>
> > > +#include <linux/freezer.h>
> > >
> > > int sysctl_panic_on_oom;
> > > int sysctl_oom_kill_allocating_task;
> > > @@ -451,6 +452,9 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > > task_pid_nr(q), q->comm);
> > > task_unlock(q);
> > > force_sig(SIGKILL, q);
> > > +
> > > + if (frozen(q))
> > > + thaw_process(q);
> > > }
> > >
> > > set_tsk_thread_flag(p, TIF_MEMDIE);
> >
> > This is in the wrong place, oom_kill_task() iterates over all threads that
> > are _not_ in the same thread group as the chosen thread and kills them
> > without giving them access to memory reserves. The chosen task, p, could
> > still be frozen and may not exit.
>
> Ahh, right you are. I ave missed that one. Updated patch bellow.
>
> >
> > Once that's fixed, feel free to add my
> >
> > Acked-by: David Rientjes <rientjes@google.com>
>
> Thanks
>
> >
> > once Rafael sends his acked-by or reviewed-by.
> ---
> From f935ed4558c2fb033ef5c14e02b28e12a615f80e Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 16 Sep 2011 11:23:15 +0200
> Subject: [PATCH] oom: do not live lock on frozen tasks
>
> Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> that OOM can end up in a live lock if select_bad_process picks up a frozen
> task.
> Unfortunately we cannot mark such processes as unkillable to ignore them
> because we could panic the system even though there is a chance that
> somebody could thaw the process so we can make a forward process (e.g. a
> process from another cpuset or with a different nodemask).
>
> Let's thaw an OOM selected frozen process right after we've sent fatal
> signal from oom_kill_task.
> Thawing is safe if the frozen task doesn't access any suspended device
> (e.g. by ioctl) on the way out to the userspace where we handle the
> signal and die. Note, we are not interested in the kernel threads because
> they are not oom killable.
>
> Accessing suspended devices by a userspace processes shouldn't be an
> issue because devices are suspended only after userspace is already
> frozen and oom is disabled at that time.
>
> run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> context but it seems it is able to cope with signals because it
> explicitly checks for pending signals so we should be safe.
>
> Other than that userspace accesses the fridge only from the
> signal handling routines so we are able to handle SIGKILL without any
> negative side effects.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] oom: give bonus to frozen processes
2011-09-26 9:02 ` David Rientjes
@ 2011-09-26 9:31 ` KAMEZAWA Hiroyuki
2011-09-26 9:54 ` Michal Hocko
0 siblings, 1 reply; 44+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-09-26 9:31 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, Rafael J. Wysocki
On Mon, 26 Sep 2011 02:02:59 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:
> On Mon, 26 Sep 2011, Michal Hocko wrote:
>
> > Let's try it with a heuristic change first. If you really do not like
> > it, we can move to oom_scode_adj. I like the heuristic change little bit
> > more because it is at the same place as the root bonus.
>
> The problem with the bonus is that, as mentioned previously, it doesn't
> protect against ANYTHING for the case you're trying to fix. This won't
> panic the machine because all killable threads are guaranteed to have a
> non-zero badness score, but it's a very valid configuration to have either
>
> - all eligible threads (system-wide, shared cpuset, shared mempolicy
> nodes) are frozen, or
>
> - all eligible frozen threads use <5% of memory whereas all other
> eligible killable threads use 1% of available memory.
>
> and that means the oom killer will repeatedly select those threads and the
> livelock still exists unless you can guarantee that they are successfully
> thawed, that thawing them in all situations is safe, and that once thawed
> they will make a timely exit.
>
> Additionally, I don't think biasing against frozen tasks makes sense from
> a heusritic standpoint of the oom killer. Why would we want give
> non-frozen tasks that are actually getting work done a preference over a
> task that is frozen and doing absolutely nothing? It seems like that's
> backwards and that we'd actually prefer killing the task doing nothing so
> it can free its memory.
>
I agree with David.
Why don't you set oom_score_adj as -1000 for processes which never should die ?
You don't freeze processes via user-land using cgroup ?
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 9:25 ` KAMEZAWA Hiroyuki
@ 2011-09-26 9:32 ` Michal Hocko
0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 9:32 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, Rafael J. Wysocki,
Tejun Heo, Rusty Russell
On Mon 26-09-11 18:25:36, KAMEZAWA Hiroyuki wrote:
> On Mon, 26 Sep 2011 11:14:40 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
>
> > On Mon 26-09-11 01:56:57, David Rientjes wrote:
> > > On Mon, 26 Sep 2011, Michal Hocko wrote:
> > >
> > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > > index 626303b..b9774f3 100644
> > > > --- a/mm/oom_kill.c
> > > > +++ b/mm/oom_kill.c
> > > > @@ -32,6 +32,7 @@
> > > > #include <linux/mempolicy.h>
> > > > #include <linux/security.h>
> > > > #include <linux/ptrace.h>
> > > > +#include <linux/freezer.h>
> > > >
> > > > int sysctl_panic_on_oom;
> > > > int sysctl_oom_kill_allocating_task;
> > > > @@ -451,6 +452,9 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > > > task_pid_nr(q), q->comm);
> > > > task_unlock(q);
> > > > force_sig(SIGKILL, q);
> > > > +
> > > > + if (frozen(q))
> > > > + thaw_process(q);
> > > > }
> > > >
> > > > set_tsk_thread_flag(p, TIF_MEMDIE);
> > >
> > > This is in the wrong place, oom_kill_task() iterates over all threads that
> > > are _not_ in the same thread group as the chosen thread and kills them
> > > without giving them access to memory reserves. The chosen task, p, could
> > > still be frozen and may not exit.
> >
> > Ahh, right you are. I ave missed that one. Updated patch bellow.
> >
> > >
> > > Once that's fixed, feel free to add my
> > >
> > > Acked-by: David Rientjes <rientjes@google.com>
> >
> > Thanks
> >
> > >
> > > once Rafael sends his acked-by or reviewed-by.
> > ---
> > From f935ed4558c2fb033ef5c14e02b28e12a615f80e Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Fri, 16 Sep 2011 11:23:15 +0200
> > Subject: [PATCH] oom: do not live lock on frozen tasks
> >
> > Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> > that OOM can end up in a live lock if select_bad_process picks up a frozen
> > task.
> > Unfortunately we cannot mark such processes as unkillable to ignore them
> > because we could panic the system even though there is a chance that
> > somebody could thaw the process so we can make a forward process (e.g. a
> > process from another cpuset or with a different nodemask).
> >
> > Let's thaw an OOM selected frozen process right after we've sent fatal
> > signal from oom_kill_task.
> > Thawing is safe if the frozen task doesn't access any suspended device
> > (e.g. by ioctl) on the way out to the userspace where we handle the
> > signal and die. Note, we are not interested in the kernel threads because
> > they are not oom killable.
> >
> > Accessing suspended devices by a userspace processes shouldn't be an
> > issue because devices are suspended only after userspace is already
> > frozen and oom is disabled at that time.
> >
> > run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> > context but it seems it is able to cope with signals because it
> > explicitly checks for pending signals so we should be safe.
> >
> > Other than that userspace accesses the fridge only from the
> > signal handling routines so we are able to handle SIGKILL without any
> > negative side effects.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] oom: give bonus to frozen processes
2011-09-26 9:31 ` KAMEZAWA Hiroyuki
@ 2011-09-26 9:54 ` Michal Hocko
0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 9:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, Rafael J. Wysocki
On Mon 26-09-11 18:31:15, KAMEZAWA Hiroyuki wrote:
> On Mon, 26 Sep 2011 02:02:59 -0700 (PDT)
> David Rientjes <rientjes@google.com> wrote:
>
> > On Mon, 26 Sep 2011, Michal Hocko wrote:
> >
> > > Let's try it with a heuristic change first. If you really do not like
> > > it, we can move to oom_scode_adj. I like the heuristic change little bit
> > > more because it is at the same place as the root bonus.
> >
> > The problem with the bonus is that, as mentioned previously, it doesn't
> > protect against ANYTHING for the case you're trying to fix.
Yes, it just makes this less probable.
> > This won't panic the machine because all killable threads are
> > guaranteed to have a non-zero badness score, but it's a very valid
> > configuration to have either
> >
> > - all eligible threads (system-wide, shared cpuset, shared mempolicy
> > nodes) are frozen, or
> >
> > - all eligible frozen threads use <5% of memory whereas all other
> > eligible killable threads use 1% of available memory.
> >
> > and that means the oom killer will repeatedly select those threads and the
> > livelock still exists unless you can guarantee that they are successfully
> > thawed, that thawing them in all situations is safe, and that once thawed
> > they will make a timely exit.
Yes, this is what the first patch is fixing. Thawed tasks should die
almost immediately because they are on the way to userspace anyway.
> >
> > Additionally, I don't think biasing against frozen tasks makes sense from
> > a heusritic standpoint of the oom killer. Why would we want give
> > non-frozen tasks that are actually getting work done a preference over a
> > task that is frozen and doing absolutely nothing?
Because frozen tasks are in that state usually (not considering suspend
path which has OOM disabled) based on an user request (via freezer
cgroup e.g.). I wouldn't be surprised if somebody relied on the D state
and that the task will not get killer.
> > It seems like that's backwards and that we'd actually prefer killing
> > the task doing nothing so it can free its memory.
> >
>
> I agree with David.
> Why don't you set oom_score_adj as -1000 for processes which never should die ?
It is little bit unintuitive to think about OOM killer when you just
want to debug your frozen application.
On the other hand I agree that adding a new heuristic for an use case
that is not entirely clear and which is not 100% anyway is not good.
So, please scratch this patch and let's wait for somebody with a valid
use case.
> You don't freeze processes via user-land using cgroup ?
That was exactly the use case I had in mind. Somebody using freezer
cgroup to freeze a task to debug it.
>
> Thanks,
> -Kame
>
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 8:28 ` [PATCH 1/2] oom: do not live lock on " Michal Hocko
2011-09-26 8:56 ` David Rientjes
@ 2011-09-26 10:28 ` Rusty Russell
2011-09-26 11:05 ` Michal Hocko
1 sibling, 1 reply; 44+ messages in thread
From: Rusty Russell @ 2011-09-26 10:28 UTC (permalink / raw)
To: Michal Hocko, David Rientjes
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo
On Mon, 26 Sep 2011 10:28:37 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 26-08-11 11:13:40, David Rientjes wrote:
> > I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
> > followed by a SIGKILL if that task is selected for oom kill without an
> > heuristic change. Not sure if that's possible, so we'll wait for Rafael
> > to chime in.
>
> We have discussed that with Rafael and it should be safe to do that. See
> the patch bellow.
> The only place I am not entirely sure about is run_guest
> (drivers/lguest/core.c). It seems that the code is able to cope with
> signals but it also calls lguest_arch_run_guest after try_to_freeze.
Yes; if you want to kill things in the refrigerator(), then will a
if (cpu->lg->dead || task_is_dead(current))
break;
Work? That break means we return to the read() syscall pretty much
immediately.
Thanks for the CC,
Rusty.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 10:28 ` Rusty Russell
@ 2011-09-26 11:05 ` Michal Hocko
2011-09-27 2:21 ` Rusty Russell
0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 11:05 UTC (permalink / raw)
To: Rusty Russell
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo
On Mon 26-09-11 19:58:50, Rusty Russell wrote:
> On Mon, 26 Sep 2011 10:28:37 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> > On Fri 26-08-11 11:13:40, David Rientjes wrote:
> > > I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
> > > followed by a SIGKILL if that task is selected for oom kill without an
> > > heuristic change. Not sure if that's possible, so we'll wait for Rafael
> > > to chime in.
> >
> > We have discussed that with Rafael and it should be safe to do that. See
> > the patch bellow.
> > The only place I am not entirely sure about is run_guest
> > (drivers/lguest/core.c). It seems that the code is able to cope with
> > signals but it also calls lguest_arch_run_guest after try_to_freeze.
>
> Yes; if you want to kill things in the refrigerator(), then will a
>
> if (cpu->lg->dead || task_is_dead(current))
> break;
>
> Work?
The task is not dead yet. We should rather check for pending signals.
Can we just move try_to_freeze up before the pending signals check?
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 2535933..a513509 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -232,6 +232,12 @@ int run_guest(struct lg_cpu *cpu, unsigned long __user *user)
}
}
+ /*
+ * All long-lived kernel loops need to check with this horrible
+ * thing called the freezer. If the Host is trying to suspend,
+ * it stops us.
+ */
+ try_to_freeze();
/* Check for signals */
if (signal_pending(current))
return -ERESTARTSYS;
@@ -246,13 +252,6 @@ int run_guest(struct lg_cpu *cpu, unsigned long __user *user)
try_deliver_interrupt(cpu, irq, more);
/*
- * All long-lived kernel loops need to check with this horrible
- * thing called the freezer. If the Host is trying to suspend,
- * it stops us.
- */
- try_to_freeze();
-
- /*
* Just make absolutely sure the Guest is still alive. One of
* those hypercalls could have been fatal, for example.
*/
> That break means we return to the read() syscall pretty much
> immediately.
>
> Thanks for the CC,
> Rusty.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 9:14 ` Michal Hocko
2011-09-26 9:25 ` KAMEZAWA Hiroyuki
@ 2011-09-26 15:51 ` Rafael J. Wysocki
2011-09-26 18:28 ` Michal Hocko
2011-09-27 1:03 ` David Rientjes
1 sibling, 2 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2011-09-26 15:51 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Tejun Heo, Rusty Russell
On Monday, September 26, 2011, Michal Hocko wrote:
> On Mon 26-09-11 01:56:57, David Rientjes wrote:
> > On Mon, 26 Sep 2011, Michal Hocko wrote:
> >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 626303b..b9774f3 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -32,6 +32,7 @@
> > > #include <linux/mempolicy.h>
> > > #include <linux/security.h>
> > > #include <linux/ptrace.h>
> > > +#include <linux/freezer.h>
> > >
> > > int sysctl_panic_on_oom;
> > > int sysctl_oom_kill_allocating_task;
> > > @@ -451,6 +452,9 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> > > task_pid_nr(q), q->comm);
> > > task_unlock(q);
> > > force_sig(SIGKILL, q);
> > > +
> > > + if (frozen(q))
> > > + thaw_process(q);
> > > }
> > >
> > > set_tsk_thread_flag(p, TIF_MEMDIE);
> >
> > This is in the wrong place, oom_kill_task() iterates over all threads that
> > are _not_ in the same thread group as the chosen thread and kills them
> > without giving them access to memory reserves. The chosen task, p, could
> > still be frozen and may not exit.
>
> Ahh, right you are. I ave missed that one. Updated patch bellow.
>
> >
> > Once that's fixed, feel free to add my
> >
> > Acked-by: David Rientjes <rientjes@google.com>
>
> Thanks
>
> >
> > once Rafael sends his acked-by or reviewed-by.
> ---
> From f935ed4558c2fb033ef5c14e02b28e12a615f80e Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 16 Sep 2011 11:23:15 +0200
> Subject: [PATCH] oom: do not live lock on frozen tasks
>
> Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> that OOM can end up in a live lock if select_bad_process picks up a frozen
> task.
> Unfortunately we cannot mark such processes as unkillable to ignore them
> because we could panic the system even though there is a chance that
> somebody could thaw the process so we can make a forward process (e.g. a
> process from another cpuset or with a different nodemask).
>
> Let's thaw an OOM selected frozen process right after we've sent fatal
> signal from oom_kill_task.
> Thawing is safe if the frozen task doesn't access any suspended device
> (e.g. by ioctl) on the way out to the userspace where we handle the
> signal and die. Note, we are not interested in the kernel threads because
> they are not oom killable.
>
> Accessing suspended devices by a userspace processes shouldn't be an
> issue because devices are suspended only after userspace is already
> frozen and oom is disabled at that time.
>
> run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> context but it seems it is able to cope with signals because it
> explicitly checks for pending signals so we should be safe.
>
> Other than that userspace accesses the fridge only from the
> signal handling routines so we are able to handle SIGKILL without any
> negative side effects.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> mm/oom_kill.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 626303b..c419a7e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -32,6 +32,7 @@
> #include <linux/mempolicy.h>
> #include <linux/security.h>
> #include <linux/ptrace.h>
> +#include <linux/freezer.h>
>
> int sysctl_panic_on_oom;
> int sysctl_oom_kill_allocating_task;
> @@ -451,10 +452,15 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> task_pid_nr(q), q->comm);
> task_unlock(q);
> force_sig(SIGKILL, q);
> +
> + if (frozen(q))
> + thaw_process(q);
> }
>
> set_tsk_thread_flag(p, TIF_MEMDIE);
> force_sig(SIGKILL, p);
> + if (frozen(p))
> + thaw_process(p);
>
> return 0;
> }
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 15:51 ` Rafael J. Wysocki
@ 2011-09-26 18:28 ` Michal Hocko
2011-09-27 1:03 ` David Rientjes
1 sibling, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-26 18:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Tejun Heo, Rusty Russell
On Mon 26-09-11 17:51:40, Rafael J. Wysocki wrote:
> On Monday, September 26, 2011, Michal Hocko wrote:
[...]
> > From f935ed4558c2fb033ef5c14e02b28e12a615f80e Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Fri, 16 Sep 2011 11:23:15 +0200
> > Subject: [PATCH] oom: do not live lock on frozen tasks
> >
> > Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> > that OOM can end up in a live lock if select_bad_process picks up a frozen
> > task.
> > Unfortunately we cannot mark such processes as unkillable to ignore them
> > because we could panic the system even though there is a chance that
> > somebody could thaw the process so we can make a forward process (e.g. a
> > process from another cpuset or with a different nodemask).
> >
> > Let's thaw an OOM selected frozen process right after we've sent fatal
> > signal from oom_kill_task.
> > Thawing is safe if the frozen task doesn't access any suspended device
> > (e.g. by ioctl) on the way out to the userspace where we handle the
> > signal and die. Note, we are not interested in the kernel threads because
> > they are not oom killable.
> >
> > Accessing suspended devices by a userspace processes shouldn't be an
> > issue because devices are suspended only after userspace is already
> > frozen and oom is disabled at that time.
> >
> > run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> > context but it seems it is able to cope with signals because it
> > explicitly checks for pending signals so we should be safe.
> >
> > Other than that userspace accesses the fridge only from the
> > signal handling routines so we are able to handle SIGKILL without any
> > negative side effects.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
>
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Thanks.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 15:51 ` Rafael J. Wysocki
2011-09-26 18:28 ` Michal Hocko
@ 2011-09-27 1:03 ` David Rientjes
2011-09-27 7:52 ` Michal Hocko
1 sibling, 1 reply; 44+ messages in thread
From: David Rientjes @ 2011-09-27 1:03 UTC (permalink / raw)
To: Rafael J. Wysocki, Michal Hocko
Cc: Oleg Nesterov, Konstantin Khlebnikov, linux-mm, Andrew Morton,
linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Tejun Heo,
Rusty Russell
On Mon, 26 Sep 2011, Rafael J. Wysocki wrote:
> > Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> > that OOM can end up in a live lock if select_bad_process picks up a frozen
> > task.
> > Unfortunately we cannot mark such processes as unkillable to ignore them
> > because we could panic the system even though there is a chance that
> > somebody could thaw the process so we can make a forward process (e.g. a
> > process from another cpuset or with a different nodemask).
> >
> > Let's thaw an OOM selected frozen process right after we've sent fatal
> > signal from oom_kill_task.
> > Thawing is safe if the frozen task doesn't access any suspended device
> > (e.g. by ioctl) on the way out to the userspace where we handle the
> > signal and die. Note, we are not interested in the kernel threads because
> > they are not oom killable.
> >
> > Accessing suspended devices by a userspace processes shouldn't be an
> > issue because devices are suspended only after userspace is already
> > frozen and oom is disabled at that time.
> >
> > run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> > context but it seems it is able to cope with signals because it
> > explicitly checks for pending signals so we should be safe.
> >
> > Other than that userspace accesses the fridge only from the
> > signal handling routines so we are able to handle SIGKILL without any
> > negative side effects.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
>
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
>
Acked-by: David Rientjes <rientjes@google.com>
Although this still seems to be problematic if the chosen thread gets
frozen before the SIGKILL can be handled. We don't have any checks for
fatal_signal_pending() when freezing threads and waiting for them to exit?
Michal, could you send Andrew your revised patch with all the acked-bys?
Thanks!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-26 11:05 ` Michal Hocko
@ 2011-09-27 2:21 ` Rusty Russell
2011-09-27 7:03 ` [PATCH] lguest: move process freezing before pending signals check Michal Hocko
0 siblings, 1 reply; 44+ messages in thread
From: Rusty Russell @ 2011-09-27 2:21 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo
On Mon, 26 Sep 2011 13:05:59 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> On Mon 26-09-11 19:58:50, Rusty Russell wrote:
> > On Mon, 26 Sep 2011 10:28:37 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> > > On Fri 26-08-11 11:13:40, David Rientjes wrote:
> > > > I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
> > > > followed by a SIGKILL if that task is selected for oom kill without an
> > > > heuristic change. Not sure if that's possible, so we'll wait for Rafael
> > > > to chime in.
> > >
> > > We have discussed that with Rafael and it should be safe to do that. See
> > > the patch bellow.
> > > The only place I am not entirely sure about is run_guest
> > > (drivers/lguest/core.c). It seems that the code is able to cope with
> > > signals but it also calls lguest_arch_run_guest after try_to_freeze.
> >
> > Yes; if you want to kill things in the refrigerator(), then will a
> >
> > if (cpu->lg->dead || task_is_dead(current))
> > break;
> >
> > Work?
>
> The task is not dead yet. We should rather check for pending signals.
> Can we just move try_to_freeze up before the pending signals check?
Yep, that works.
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cheers,
Rusty.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH] lguest: move process freezing before pending signals check
2011-09-27 2:21 ` Rusty Russell
@ 2011-09-27 7:03 ` Michal Hocko
0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2011-09-27 7:03 UTC (permalink / raw)
To: Rusty Russell
Cc: David Rientjes, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Rafael J. Wysocki, Tejun Heo
On Tue 27-09-11 11:51:00, Rusty Russell wrote:
> On Mon, 26 Sep 2011 13:05:59 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> > On Mon 26-09-11 19:58:50, Rusty Russell wrote:
> > > On Mon, 26 Sep 2011 10:28:37 +0200, Michal Hocko <mhocko@suse.cz> wrote:
> > > > On Fri 26-08-11 11:13:40, David Rientjes wrote:
> > > > > I'd love to be able to do a thaw on a PF_FROZEN task in the oom killer
> > > > > followed by a SIGKILL if that task is selected for oom kill without an
> > > > > heuristic change. Not sure if that's possible, so we'll wait for Rafael
> > > > > to chime in.
> > > >
> > > > We have discussed that with Rafael and it should be safe to do that. See
> > > > the patch bellow.
> > > > The only place I am not entirely sure about is run_guest
> > > > (drivers/lguest/core.c). It seems that the code is able to cope with
> > > > signals but it also calls lguest_arch_run_guest after try_to_freeze.
> > >
> > > Yes; if you want to kill things in the refrigerator(), then will a
> > >
> > > if (cpu->lg->dead || task_is_dead(current))
> > > break;
> > >
> > > Work?
> >
> > The task is not dead yet. We should rather check for pending signals.
> > Can we just move try_to_freeze up before the pending signals check?
>
> Yep, that works.
>
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Thanks.
The full patch bellow:
---
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-27 1:03 ` David Rientjes
@ 2011-09-27 7:52 ` Michal Hocko
2011-09-27 18:30 ` David Rientjes
0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2011-09-27 7:52 UTC (permalink / raw)
To: David Rientjes
Cc: Rafael J. Wysocki, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Tejun Heo, Rusty Russell
On Mon 26-09-11 18:03:26, David Rientjes wrote:
> On Mon, 26 Sep 2011, Rafael J. Wysocki wrote:
>
> > > Konstantin Khlebnikov has reported (https://lkml.org/lkml/2011/8/23/45)
> > > that OOM can end up in a live lock if select_bad_process picks up a frozen
> > > task.
> > > Unfortunately we cannot mark such processes as unkillable to ignore them
> > > because we could panic the system even though there is a chance that
> > > somebody could thaw the process so we can make a forward process (e.g. a
> > > process from another cpuset or with a different nodemask).
> > >
> > > Let's thaw an OOM selected frozen process right after we've sent fatal
> > > signal from oom_kill_task.
> > > Thawing is safe if the frozen task doesn't access any suspended device
> > > (e.g. by ioctl) on the way out to the userspace where we handle the
> > > signal and die. Note, we are not interested in the kernel threads because
> > > they are not oom killable.
> > >
> > > Accessing suspended devices by a userspace processes shouldn't be an
> > > issue because devices are suspended only after userspace is already
> > > frozen and oom is disabled at that time.
> > >
> > > run_guest (drivers/lguest/core.c) calls try_to_freeze with an user
> > > context but it seems it is able to cope with signals because it
> > > explicitly checks for pending signals so we should be safe.
> > >
> > > Other than that userspace accesses the fridge only from the
> > > signal handling routines so we are able to handle SIGKILL without any
> > > negative side effects.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> >
> > Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
> >
>
> Acked-by: David Rientjes <rientjes@google.com>
Thanks!
>
> Although this still seems to be problematic if the chosen thread gets
> frozen before the SIGKILL can be handled. We don't have any checks for
> fatal_signal_pending() when freezing threads and waiting for them to exit?
I guess you mean a situation when select_bad_process picks up a process
which is not marked as frozen yet but we send SIGKILL right before
schedule is called in refrigerator.
In that case either schedule should catch it by signal_pending_state
check or we will pick it up next OOM round when we pick up the same
process (if nothing else is eligible). Or am I missing something?
> Michal, could you send Andrew your revised patch with all the acked-bys?
Yes I will. I would just like to hear back from Konstantin who
originally reported the issue. Maybe he has a test case.
>
> Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/2] oom: do not live lock on frozen tasks
2011-09-27 7:52 ` Michal Hocko
@ 2011-09-27 18:30 ` David Rientjes
0 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2011-09-27 18:30 UTC (permalink / raw)
To: Michal Hocko
Cc: Rafael J. Wysocki, Oleg Nesterov, Konstantin Khlebnikov, linux-mm,
Andrew Morton, linux-kernel, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
Tejun Heo, Rusty Russell
On Tue, 27 Sep 2011, Michal Hocko wrote:
> I guess you mean a situation when select_bad_process picks up a process
> which is not marked as frozen yet but we send SIGKILL right before
> schedule is called in refrigerator.
> In that case either schedule should catch it by signal_pending_state
> check or we will pick it up next OOM round when we pick up the same
> process (if nothing else is eligible). Or am I missing something?
>
That doesn't close the race, the oom killer will see the presence of an
eligible TIF_MEMDIE thread in select_bad_process() and simply return to
the page allocator. You'd need to thaw it there as well and hope that
nothing now or in the future will get into an endless thaw-freeze-thaw
loop in the exit path.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2011-09-27 18:30 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-23 8:31 [PATCH] oom: skip frozen tasks Konstantin Khlebnikov
2011-08-23 9:15 ` KAMEZAWA Hiroyuki
2011-08-23 13:46 ` Michal Hocko
2011-08-23 20:18 ` David Rientjes
2011-08-24 10:19 ` Michal Hocko
2011-08-24 19:31 ` David Rientjes
2011-08-25 9:19 ` Michal Hocko
2011-08-25 15:18 ` Oleg Nesterov
2011-08-25 16:47 ` Michal Hocko
2011-08-25 21:14 ` David Rientjes
2011-08-26 7:09 ` Michal Hocko
2011-08-26 8:56 ` Michal Hocko
2011-08-26 9:21 ` David Rientjes
2011-08-26 9:53 ` Michal Hocko
2011-08-26 11:01 ` Michal Hocko
2011-08-26 18:13 ` David Rientjes
2011-09-26 8:28 ` [PATCH 1/2] oom: do not live lock on " Michal Hocko
2011-09-26 8:56 ` David Rientjes
2011-09-26 9:14 ` Michal Hocko
2011-09-26 9:25 ` KAMEZAWA Hiroyuki
2011-09-26 9:32 ` Michal Hocko
2011-09-26 15:51 ` Rafael J. Wysocki
2011-09-26 18:28 ` Michal Hocko
2011-09-27 1:03 ` David Rientjes
2011-09-27 7:52 ` Michal Hocko
2011-09-27 18:30 ` David Rientjes
2011-09-26 10:28 ` Rusty Russell
2011-09-26 11:05 ` Michal Hocko
2011-09-27 2:21 ` Rusty Russell
2011-09-27 7:03 ` [PATCH] lguest: move process freezing before pending signals check Michal Hocko
2011-09-26 8:35 ` [PATCH 2/2] oom: give bonus to frozen processes Michal Hocko
2011-09-26 9:02 ` David Rientjes
2011-09-26 9:31 ` KAMEZAWA Hiroyuki
2011-09-26 9:54 ` Michal Hocko
2011-08-26 21:03 ` [PATCH] oom: skip frozen tasks Rafael J. Wysocki
2011-08-26 10:03 ` Konstantin Khlebnikov
2011-08-26 10:48 ` Michal Hocko
2011-08-26 12:44 ` Konstantin Khlebnikov
2011-08-26 12:59 ` Michal Hocko
2011-08-26 7:35 ` Konstantin Khlebnikov
2011-08-26 9:09 ` David Rientjes
2011-08-26 9:59 ` Konstantin Khlebnikov
2011-08-26 18:09 ` David Rientjes
2011-08-25 21:03 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).