* Re: [RFC] oom-kill: give the dying task a higher priority
@ 2010-05-28 16:48 ` Luis Claudio R. Goncalves
0 siblings, 0 replies; 110+ messages in thread
From: Luis Claudio R. Goncalves @ 2010-05-28 16:48 UTC (permalink / raw)
To: Minchan Kim
Cc: KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Peter Zijlstra, David Rientjes, Mel Gorman,
williams
On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
| On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
| > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
...
| > | I think the highest RT priority isn't a good solution.
| > | As I mentioned, some RT functions don't want to be preempted by other processes
| > | that cause memory pressure. It breaks the RT task.
| >
| > For the RT case, if you reached a system OOM situation, your determinism has
| > already been hurt. If the memcg OOM happens on the same memcg your RT task
| > is - what will probably be the case most of time - again, the determinism
| > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
| > MAX_RT_PRIO-1 means a faster recovery.
|
| What I want to say is that determinism has no relation to OOM.
| Why is some RT task affected by other process's OOM?
|
| Of course, if system has no memory, it is likely to slow down RT task.
| But it's just only thought. If some task scheduled just is exit, we don't need
| to raise OOMed task's priority.
|
| But raising min rt priority on your patch was what I want.
| It doesn't preempt any RT task.
|
| So until now, I have made noise about your patch.
| Really, sorry for that.
| I don't have any objection on raising priority part from now on.
This is the third version of the patch, factoring in your input along with
Peter's comment. Basically the same patch, but using the lowest RT priority
to boost the dying task.
Thanks again for reviewing and commenting.
Luis
oom-killer: give the dying task rt priority (v3)
Give the dying task RT priority so that it can be scheduled quickly and die,
freeing needed memory.
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84bbba2..2b0204f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
*/
static void __oom_kill_task(struct task_struct *p, int verbose)
{
+ struct sched_param param;
+
if (is_global_init(p)) {
WARN_ON(1);
printk(KERN_WARNING "tried to kill init!\n");
@@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
* exit() and clear out its resources quickly...
*/
p->time_slice = HZ;
+ param.sched_priority = MAX_RT_PRIO-10;
+ sched_setscheduler(p, SCHED_FIFO, &param);
set_tsk_thread_flag(p, TIF_MEMDIE);
force_sig(SIGKILL, p);
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-28 16:48 ` Luis Claudio R. Goncalves
@ 2010-05-29 3:59 ` KOSAKI Motohiro
-1 siblings, 0 replies; 110+ messages in thread
From: KOSAKI Motohiro @ 2010-05-29 3:59 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: kosaki.motohiro, Minchan Kim, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
Hi
> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
Almost acceptable to me, but I have two requests:
- call 1) force_sig() and then 2) sched_setscheduler(), in that order, as Oleg mentioned
- don't boost the priority if we are in mem_cgroup_out_of_memory()
Can you accept this? If not, can you please explain the reason?
Thanks.
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;
> + sched_setscheduler(p, SCHED_FIFO, &param);
> set_tsk_thread_flag(p, TIF_MEMDIE);
>
> force_sig(SIGKILL, p);
> --
> [ Luis Claudio R. Goncalves Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-29 3:59 ` KOSAKI Motohiro
@ 2010-05-31 2:15 ` Luis Claudio R. Goncalves
-1 siblings, 0 replies; 110+ messages in thread
From: Luis Claudio R. Goncalves @ 2010-05-31 2:15 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Minchan Kim, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Peter Zijlstra, David Rientjes, Mel Gorman,
williams
On Sat, May 29, 2010 at 12:59:09PM +0900, KOSAKI Motohiro wrote:
| Hi
|
| > oom-killer: give the dying task rt priority (v3)
| >
| > Give the dying task RT priority so that it can be scheduled quickly and die,
| > freeing needed memory.
| >
| > Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
|
| Almostly acceptable to me. but I have two requests,
|
| - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
| - don't boost priority if it's in mem_cgroup_out_of_memory()
|
| Can you accept this? if not, can you please explain the reason?
|
| Thanks.
The last patch I posted was the wrong patch from my queue. Sorry for the
confusion. Here is the last version of the patch, including the suggestions
from Oleg, Peter and Kosaki Motohiro:
oom-kill: give the dying task a higher priority (v4)
In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.
Right before sending a SIGKILL to the task selected by the oom-killer,
this task has its priority increased so that it can exit() soon,
freeing memory. That is accomplished by:
	/*
	 * We give our sacrificial lamb high priority and access to
	 * all the memory it needs. That way it should be able to
	 * exit() and clear out its resources quickly...
	 */
	p->rt.time_slice = HZ;
	set_tsk_thread_flag(p, TIF_MEMDIE);
It sounds plausible to give the dying task an even higher priority to be
sure it will be scheduled sooner and free the desired memory. It was
suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so that this
task won't interfere with any running RT task.
Another good suggestion, implemented here, was to avoid boosting the dying
task priority in case of mem_cgroup OOM.
Signed-off-by: Luis Claudio R. Gonçalves <lclaudio@uudg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 709aedf..6a25293 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -380,7 +380,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
* flag though it's unlikely that we select a process with CAP_SYS_RAW_IO
* set.
*/
-static void __oom_kill_task(struct task_struct *p, int verbose)
+static void __oom_kill_task(struct task_struct *p, struct mem_cgroup *mem,
+ int verbose)
{
if (is_global_init(p)) {
WARN_ON(1);
@@ -413,11 +414,20 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
-
force_sig(SIGKILL, p);
+ /*
+ * If this is a system OOM (not a memcg OOM), speed up the recovery
+ * by boosting the dying task priority to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+ if (mem == NULL) {
+ struct sched_param param;
+ param.sched_priority = 1;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ }
}
-static int oom_kill_task(struct task_struct *p)
+static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
{
/* WARNING: mm may not be dereferenced since we did not obtain its
* value from get_task_mm(p). This is OK since all we need to do is
@@ -430,7 +440,7 @@ static int oom_kill_task(struct task_struct *p)
if (!p->mm || p->signal->oom_adj == OOM_DISABLE)
return 1;
- __oom_kill_task(p, 1);
+ __oom_kill_task(p, mem, 1);
return 0;
}
@@ -449,7 +459,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
* its children or threads, just set TIF_MEMDIE so it can die quickly
*/
if (p->flags & PF_EXITING) {
- __oom_kill_task(p, 0);
+ __oom_kill_task(p, mem, 0);
return 0;
}
@@ -462,10 +472,10 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
continue;
if (mem && !task_in_mem_cgroup(c, mem))
continue;
- if (!oom_kill_task(c))
+ if (!oom_kill_task(c, mem))
return 0;
}
- return oom_kill_task(p);
+ return oom_kill_task(p, mem);
}
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
^ permalink raw reply related [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-29 3:59 ` KOSAKI Motohiro
@ 2010-05-31 5:06 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 5:06 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Luis Claudio R. Goncalves, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
Hi, Kosaki.
On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
>
>> oom-killer: give the dying task rt priority (v3)
>>
>> Give the dying task RT priority so that it can be scheduled quickly and die,
>> freeing needed memory.
>>
>> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>
> Almostly acceptable to me. but I have two requests,
>
> - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
> - don't boost priority if it's in mem_cgroup_out_of_memory()
Why don't you want to boost the priority on the memcontrol path?
If we are on the memcontrol path and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
mem_cgroup_out_of_memory will select a victim task in the memcg.
So __oom_kill_task's target task would be in the memcg, I think.
Since you and the memcg guys haven't complained about this, I must be missing something.
Could you explain it? :)
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 5:06 ` Minchan Kim
@ 2010-05-31 6:35 ` KOSAKI Motohiro
-1 siblings, 0 replies; 110+ messages in thread
From: KOSAKI Motohiro @ 2010-05-31 6:35 UTC (permalink / raw)
To: Minchan Kim
Cc: kosaki.motohiro, Luis Claudio R. Goncalves, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
Hi
> Hi, Kosaki.
>
> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> > Hi
> >
> >> oom-killer: give the dying task rt priority (v3)
> >>
> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> freeing needed memory.
> >>
> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
> >
> > Almostly acceptable to me. but I have two requests,
> >
> > - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
> > - don't boost priority if it's in mem_cgroup_out_of_memory()
>
> Why do you want to not boost priority if it's path of memcontrol?
>
> If it's path of memcontrol and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
> mem_cgroup_out_of_memory will select victim task in memcg.
> So __oom_kill_task's target task would be in memcg, I think.
Yep.
But a priority boost naturally causes CPU starvation for processes outside
the group.
It seems to break cgroup's isolation concept.
> As you and memcg guys don't complain this, I would be missing something.
> Could you explain it? :)
So, my points are:
1) Usually a priority boost is the wrong idea. It has various side effects, but
system-wide OOM is one of the exceptions. In that case none of the tasks are
runnable, so the downside is acceptable.
2) memcg has an OOM notification mechanism. If the admin needs a priority boost,
they can do it from their OOM daemon.
Thanks.
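As a rough illustration of point 2, a userspace OOM daemon could do the boost itself once memcg has notified it. This is only a sketch under assumptions and is not code from this thread: kill_and_boost() is a hypothetical helper name, the victim pid is assumed to have been chosen already, and the daemon is assumed to have CAP_SYS_NICE (or a suitable RLIMIT_RTPRIO). It keeps the order requested earlier, signal first and scheduler change second, and uses the lowest SCHED_FIFO priority so it does not preempt other RT tasks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper for a userspace OOM daemon: send SIGKILL first,
 * then move the victim to the lowest SCHED_FIFO priority so that it is
 * scheduled soon and can exit. Returns 0 on success or -errno on failure.
 */
static int kill_and_boost(pid_t victim)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	/* Lowest RT priority for SCHED_FIFO (1 on Linux). */
	param.sched_priority = sched_get_priority_min(SCHED_FIFO);

	if (kill(victim, SIGKILL) < 0)
		return -errno;

	/* Fails with EPERM if the caller may not set an RT policy. */
	if (sched_setscheduler(victim, SCHED_FIFO, &param) < 0)
		return -errno;

	return 0;
}
```

Whether such a boost helps in practice is exactly the open question in this thread; the sketch only shows that the mechanism is available to a privileged daemon.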
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 6:35 ` KOSAKI Motohiro
@ 2010-05-31 7:05 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 7:05 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Luis Claudio R. Goncalves, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams, KAMEZAWA Hiroyuki
On Mon, May 31, 2010 at 3:35 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
>
>> Hi, Kosaki.
>>
>> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> > Hi
>> >
>> >> oom-killer: give the dying task rt priority (v3)
>> >>
>> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> freeing needed memory.
>> >>
>> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >
>> > Almostly acceptable to me. but I have two requests,
>> >
>> > - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
>> > - don't boost priority if it's in mem_cgroup_out_of_memory()
>>
>> Why do you want to not boost priority if it's path of memcontrol?
>>
>> If it's path of memcontrol and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
>> mem_cgroup_out_of_memory will select victim task in memcg.
>> So __oom_kill_task's target task would be in memcg, I think.
>
> Yep.
> But priority boost naturally makes CPU starvation for out of the group
> processes.
> It seems to break cgroup's isolation concept.
>
>> As you and memcg guys don't complain this, I would be missing something.
>> Could you explain it? :)
>
> So, My points are,
>
> 1) Usually priority boost is wrong idea. It have various side effect, but
> system wide OOM is one of exception. In such case, all tasks aren't
> runnable, then, the downside is acceptable.
> 2) memcg have OOM notification mechanism. If the admin need priority boost,
> they can do it by their OOM-daemon.
Is it possible to kill the hogging task immediately when the daemon sends
the kill signal?
I mean we can give the OOM daemon a higher priority than the others so that it
can send a signal to a normal process. But when does the normal process exit
after receiving the kill signal from the OOM daemon? Probably when the killed
task is next run by the scheduler. It's the same problem again, I think.
Kame, do you have an idea?
> Thanks.
>
>
>
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 7:05 ` Minchan Kim
@ 2010-05-31 7:25 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 7:25 UTC (permalink / raw)
To: Minchan Kim
Cc: KOSAKI Motohiro, Luis Claudio R. Goncalves, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, 31 May 2010 16:05:48 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> On Mon, May 31, 2010 at 3:35 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> > Hi
> >
> >> Hi, Kosaki.
> >>
> >> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
> >> <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> > Hi
> >> >
> >> >> oom-killer: give the dying task rt priority (v3)
> >> >>
> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> >> freeing needed memory.
> >> >>
> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
> >> >
> >> > Almostly acceptable to me. but I have two requests,
> >> >
> >> > - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
> >> > - don't boost priority if it's in mem_cgroup_out_of_memory()
> >>
> >> Why do you want to not boost priority if it's path of memcontrol?
> >>
> >> If it's path of memcontrol and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
> >> mem_cgroup_out_of_memory will select victim task in memcg.
> >> So __oom_kill_task's target task would be in memcg, I think.
> >
> > Yep.
> > But priority boost naturally makes CPU starvation for out of the group
> > processes.
> > It seems to break cgroup's isolation concept.
> >
> >> As you and memcg guys don't complain this, I would be missing something.
> >> Could you explain it? :)
> >
> > So, My points are,
> >
> > 1) Usually priority boost is wrong idea. It have various side effect, but
> > system wide OOM is one of exception. In such case, all tasks aren't
> > runnable, then, the downside is acceptable.
> > 2) memcg have OOM notification mechanism. If the admin need priority boost,
> > they can do it by their OOM-daemon.
>
> Is it possible kill the hogging task immediately when the daemon send
> kill signal?
> I mean we can make OOM daemon higher priority than others and it can
> send signal to normal process. but when is normal process exited after
> receiving kill signal from OOM daemon? Maybe it's when killed task is
> executed by scheduler. It's same problem again, I think.
>
> Kame, Do you have an idea?
>
This is just an idea and I have no implementation yet.
With memcg, an OOM situation can be recovered from by "enlarging the limit temporarily".
Then, what the daemon has to do is:
1. send a signal (kill, or another signal to abort for a coredump).
2. move a problematic task to a jail if necessary.
3. enlarge the limit to indicate "Go".
4. after stabilization, reduce the limit.
This is the fastest. The admin has to think about extra room or jails, and
the daemon should be clever enough. But in most cases, I think this works well.
Thanks,
-Kame
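The steps above could be sketched roughly as follows. This is an assumption-laden illustration, not an implementation from this thread: write_limit() and start_recovery() are hypothetical names, limit_path is assumed to point at the group's memory.limit_in_bytes file under wherever the memory cgroup is mounted, and step 2 (moving the task to a jail group) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Write a byte count to a memcg limit file; returns 0 on success, -1 on error. */
static int write_limit(const char *limit_path, unsigned long long bytes)
{
	FILE *f = fopen(limit_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", bytes) > 0;
	/* A value rejected by the kernel shows up when the write is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Steps 1 and 3: signal the victim, then enlarge the limit to say "Go". */
static int start_recovery(const char *limit_path, pid_t victim,
			  unsigned long long normal_limit,
			  unsigned long long extra_room)
{
	/* Step 1: send the signal. */
	if (kill(victim, SIGKILL) < 0)
		return -1;
	/* Step 2 (moving the task to a jail group) is omitted in this sketch. */
	/* Step 3: enlarge the limit so the victim can run and exit. */
	return write_limit(limit_path, normal_limit + extra_room);
}
```

Step 4 would then be a later call to write_limit(limit_path, normal_limit) once the daemon decides the group has stabilized; how it decides that is left open here, as it is in the message above.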
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 7:25 ` KAMEZAWA Hiroyuki
@ 2010-05-31 9:30 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 9:30 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro, Luis Claudio R. Goncalves, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, May 31, 2010 at 4:25 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 31 May 2010 16:05:48 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Mon, May 31, 2010 at 3:35 PM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> > Hi
>> >
>> >> Hi, Kosaki.
>> >>
>> >> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
>> >> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> >> > Hi
>> >> >
>> >> >> oom-killer: give the dying task rt priority (v3)
>> >> >>
>> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> >> freeing needed memory.
>> >> >>
>> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >> >
>> >> > Almostly acceptable to me. but I have two requests,
>> >> >
>> >> > - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
>> >> > - don't boost priority if it's in mem_cgroup_out_of_memory()
>> >>
>> >> Why do you want to not boost priority if it's path of memcontrol?
>> >>
>> >> If it's path of memcontrol and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
>> >> mem_cgroup_out_of_memory will select victim task in memcg.
>> >> So __oom_kill_task's target task would be in memcg, I think.
>> >
>> > Yep.
>> > But priority boost naturally makes CPU starvation for out of the group
>> > processes.
>> > It seems to break cgroup's isolation concept.
>> >
>> >> As you and memcg guys don't complain this, I would be missing something.
>> >> Could you explain it? :)
>> >
>> > So, My points are,
>> >
>> > 1) Usually priority boost is wrong idea. It have various side effect, but
>> > system wide OOM is one of exception. In such case, all tasks aren't
>> > runnable, then, the downside is acceptable.
>> > 2) memcg have OOM notification mechanism. If the admin need priority boost,
>> > they can do it by their OOM-daemon.
>>
>> Is it possible kill the hogging task immediately when the daemon send
>> kill signal?
>> I mean we can make OOM daemon higher priority than others and it can
>> send signal to normal process. but when is normal process exited after
>> receiving kill signal from OOM daemon? Maybe it's when killed task is
>> executed by scheduler. It's same problem again, I think.
>>
>> Kame, Do you have an idea?
>>
> This is just an idea and I have no implementaion, yet.
>
> With memcg, oom situation can be recovered by "enlarging limit temporary".
> Then, what the daemon has to do is
>
> 1. send signal (kill or other signal to abort for coredump.)
> 2. move a problematic task to a jail if necessary.
> 3. enlarge limit for indicating "Go"
> 4. After stabilization, reduce the limit.
>
> This is the fastest. Admin has to think of extra-room or jails and
> the daemon should be enough clever. But in most case, I think this works well.
I think it is very hard to decide how much extra room to reserve, since
we can't predict how many tasks will be stuck trying to allocate memory.
But I tend to agree that the system-wide OOM problem is more important than
memcg's.
And the memcg folks don't seem to have any problem with it. So I am no longer
against this patch.
Thanks, Kosaki and Kame.
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-28 16:48 ` Luis Claudio R. Goncalves
@ 2010-05-30 15:09 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-30 15:09 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Peter Zijlstra, David Rientjes, Mel Gorman,
williams
On Fri, May 28, 2010 at 01:48:26PM -0300, Luis Claudio R. Goncalves wrote:
> On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
> | On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
> | > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
> ...
> | > | I think highest RT proirity ins't good solution.
> | > | As I mentiond, Some RT functions don't want to be preempted by other processes
> | > | which cause memory pressure. It makes RT task broken.
> | >
> | > For the RT case, if you reached a system OOM situation, your determinism has
> | > already been hurt. If the memcg OOM happens on the same memcg your RT task
> | > is - what will probably be the case most of time - again, the determinism
> | > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
> | > MAX_RT_PRIO-1 means a faster recovery.
> |
> | What I want to say is that determinisic has no relation with OOM.
> | Why is some RT task affected by other process's OOM?
> |
> | Of course, if system has no memory, it is likely to slow down RT task.
> | But it's just only thought. If some task scheduled just is exit, we don't need
> | to raise OOMed task's priority.
> |
> | But raising min rt priority on your patch was what I want.
> | It doesn't preempt any RT task.
> |
> | So until now, I have made noise about your patch.
> | Really, sorry for that.
> | I don't have any objection on raising priority part from now on.
>
> This is the third version of the patch, factoring in your input along with
> Peter's comment. Basically the same patch, but using the lowest RT priority
> to boost the dying task.
>
> Thanks again for reviewing and commenting.
> Luis
>
> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;
I still don't understand this.
Why do you set the priority to "MAX_RT_PRIO - 10"?
What Peter and I suggested was "1", which is the lowest RT priority.
> + sched_setscheduler(p, SCHED_FIFO, &param);
Why did you replace sched_setscheduler_nocheck with sched_setscheduler?
It means the priority boost fails if the current context doesn't have permission.
Is that your intention?
> set_tsk_thread_flag(p, TIF_MEMDIE);
>
> force_sig(SIGKILL, p);
> --
> [ Luis Claudio R. Goncalves Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>
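To make the priority question concrete, here is a small standalone sketch of the arithmetic. It assumes MAX_RT_PRIO is 100, as in the kernels of this period, so valid SCHED_FIFO priorities run from 1 (lowest) to 99 (highest). The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>

/* Assumed value of MAX_RT_PRIO (include/linux/sched.h at the time). */
#define MAX_RT_PRIO 100

/* Valid sched_priority range for SCHED_FIFO and SCHED_RR. */
#define RT_PRIO_LOWEST  1
#define RT_PRIO_HIGHEST (MAX_RT_PRIO - 1)

/* Internal kernel prio for an RT task: a smaller value is more urgent. */
int kernel_prio(int rt_priority)
{
	assert(rt_priority >= RT_PRIO_LOWEST && rt_priority <= RT_PRIO_HIGHEST);
	return MAX_RT_PRIO - 1 - rt_priority;
}

/* A SCHED_FIFO task preempts another RT task only if its priority is
 * strictly higher. Returns 1 if 'boosted' would preempt 'other'. */
int rt_preempts(int boosted, int other)
{
	return kernel_prio(boosted) < kernel_prio(other);
}
```

With these assumptions, MAX_RT_PRIO-10 is priority 90, which preempts almost every other RT task, while priority 1 runs ahead of all non-RT tasks without preempting any RT task.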
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-28 16:48 ` Luis Claudio R. Goncalves
@ 2010-05-31 0:21 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 0:21 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Minchan Kim, KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
On Fri, 28 May 2010 13:48:26 -0300
"Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
> | On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
> | > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
> ...
> | > | I think highest RT proirity ins't good solution.
> | > | As I mentiond, Some RT functions don't want to be preempted by other processes
> | > | which cause memory pressure. It makes RT task broken.
> | >
> | > For the RT case, if you reached a system OOM situation, your determinism has
> | > already been hurt. If the memcg OOM happens on the same memcg your RT task
> | > is - what will probably be the case most of time - again, the determinism
> | > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
> | > MAX_RT_PRIO-1 means a faster recovery.
> |
> | What I want to say is that determinisic has no relation with OOM.
> | Why is some RT task affected by other process's OOM?
> |
> | Of course, if system has no memory, it is likely to slow down RT task.
> | But it's just only thought. If some task scheduled just is exit, we don't need
> | to raise OOMed task's priority.
> |
> | But raising min rt priority on your patch was what I want.
> | It doesn't preempt any RT task.
> |
> | So until now, I have made noise about your patch.
> | Really, sorry for that.
> | I don't have any objection on raising priority part from now on.
>
> This is the third version of the patch, factoring in your input along with
> Peter's comment. Basically the same patch, but using the lowest RT priority
> to boost the dying task.
>
> Thanks again for reviewing and commenting.
> Luis
>
> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;
> + sched_setscheduler(p, SCHED_FIFO, &param);
> set_tsk_thread_flag(p, TIF_MEMDIE);
>
BTW, what about the other threads which share the mm_struct?
Thanks,
-Kame
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 0:21 ` KAMEZAWA Hiroyuki
@ 2010-05-31 5:01 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 5:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
Hi, Kame.
On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 28 May 2010 13:48:26 -0300
> "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
>>
>> oom-killer: give the dying task rt priority (v3)
>>
>> Give the dying task RT priority so that it can be scheduled quickly and die,
>> freeing needed memory.
>>
>> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 84bbba2..2b0204f 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>> */
>> static void __oom_kill_task(struct task_struct *p, int verbose)
>> {
>> + struct sched_param param;
>> +
>> if (is_global_init(p)) {
>> WARN_ON(1);
>> printk(KERN_WARNING "tried to kill init!\n");
>> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>> * exit() and clear out its resources quickly...
>> */
>> p->time_slice = HZ;
>> + param.sched_priority = MAX_RT_PRIO-10;
>> + sched_setscheduler(p, SCHED_FIFO, &param);
>> set_tsk_thread_flag(p, TIF_MEMDIE);
>>
>
> BTW, how about the other threads which share mm_struct ?
Could you elaborate on your intention? :)
>
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 5:01 ` Minchan Kim
@ 2010-05-31 5:04 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 5:04 UTC (permalink / raw)
To: Minchan Kim
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, 31 May 2010 14:01:03 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi, Kame.
>
> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Fri, 28 May 2010 13:48:26 -0300
> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> >>
> >> oom-killer: give the dying task rt priority (v3)
> >>
> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> freeing needed memory.
> >>
> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
> >>
> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> index 84bbba2..2b0204f 100644
> >> --- a/mm/oom_kill.c
> >> +++ b/mm/oom_kill.c
> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> >> */
> >> static void __oom_kill_task(struct task_struct *p, int verbose)
> >> {
> >> + struct sched_param param;
> >> +
> >> if (is_global_init(p)) {
> >> WARN_ON(1);
> >> printk(KERN_WARNING "tried to kill init!\n");
> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> >> * exit() and clear out its resources quickly...
> >> */
> >> p->time_slice = HZ;
> >> + param.sched_priority = MAX_RT_PRIO-10;
> >> + sched_setscheduler(p, SCHED_FIFO, &param);
> >> set_tsk_thread_flag(p, TIF_MEMDIE);
> >>
> >
> > BTW, how about the other threads which share mm_struct ?
>
> Could you elaborate your intention? :)
>
IIUC, the purpose of raising the priority is to speed up the dying thread's exit()
so that memory is freed as fast as possible. But for the memory to actually be freed,
all threads which share the mm_struct have to exit, too. I'm sorry if I'm missing something.
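As a toy model of that concern (plain userspace C, not kernel code), the idea would be to apply the kill and the boost to every task that shares the victim's mm, not just to the one that was selected. The struct and function below are hypothetical and only stand in for task_struct and the real OOM path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a task: mm_id models the shared mm_struct. */
struct task_model {
	int pid;
	int mm_id;
	int rt_priority;	/* 0 means "not an RT task" in this model */
	int killed;
};

/* Mark every task sharing the victim's mm as killed and give it the
 * requested RT priority. Returns the number of tasks touched. */
size_t kill_and_boost_mm_sharers(struct task_model *tasks, size_t n,
				 size_t victim, int rt_priority)
{
	size_t i, touched = 0;
	int mm;

	assert(victim < n);
	mm = tasks[victim].mm_id;
	for (i = 0; i < n; i++) {
		if (tasks[i].mm_id != mm)
			continue;
		tasks[i].killed = 1;
		tasks[i].rt_priority = rt_priority;
		touched++;
	}
	return touched;
}
```

In this model the memory counts as freed only once every task with the same mm_id has exited, which is why boosting a single thread may not be enough.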
Thanks,
-Kame
^ permalink raw reply [flat|nested] 110+ messages in thread* Re: [RFC] oom-kill: give the dying task a higher priority
@ 2010-05-31 5:04 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 5:04 UTC (permalink / raw)
To: Minchan Kim
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, 31 May 2010 14:01:03 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi, Kame.
>
> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Fri, 28 May 2010 13:48:26 -0300
> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> >>
> >> oom-killer: give the dying task rt priority (v3)
> >>
> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> freeing needed memory.
> >>
> >> Signed-off-by: Luis Claudio R. GonA?alves <lgoncalv@redhat.com>
> >>
> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> index 84bbba2..2b0204f 100644
> >> --- a/mm/oom_kill.c
> >> +++ b/mm/oom_kill.c
> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> >> A */
> >> A static void __oom_kill_task(struct task_struct *p, int verbose)
> >> A {
> >> + A A struct sched_param param;
> >> +
> >> A A A if (is_global_init(p)) {
> >> A A A A A A A WARN_ON(1);
> >> A A A A A A A printk(KERN_WARNING "tried to kill init!\n");
> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> >> A A A A * exit() and clear out its resources quickly...
> >> A A A A */
> >> A A A p->time_slice = HZ;
> >> + A A param.sched_priority = MAX_RT_PRIO-10;
> >> + A A sched_setscheduler(p, SCHED_FIFO, ¶m);
> >> A A A set_tsk_thread_flag(p, TIF_MEMDIE);
> >>
> >
> > BTW, how about the other threads which share mm_struct ?
>
> Could you elaborate your intention? :)
>
IIUC, the purpose of rising priority is to accerate dying thread to exit()
for freeing memory AFAP. But to free memory, exit, all threads which share
mm_struct should exit, too. I'm sorry if I miss something.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 5:04 ` KAMEZAWA Hiroyuki
@ 2010-05-31 5:46 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 5:46 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 31 May 2010 14:01:03 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Kame.
>>
>> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Fri, 28 May 2010 13:48:26 -0300
>> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
>> >>
>> >> oom-killer: give the dying task rt priority (v3)
>> >>
>> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> freeing needed memory.
>> >>
>> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >>
>> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> >> index 84bbba2..2b0204f 100644
>> >> --- a/mm/oom_kill.c
>> >> +++ b/mm/oom_kill.c
>> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>> >> */
>> >> static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> {
>> >> + struct sched_param param;
>> >> +
>> >> if (is_global_init(p)) {
>> >> WARN_ON(1);
>> >> printk(KERN_WARNING "tried to kill init!\n");
>> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> * exit() and clear out its resources quickly...
>> >> */
>> >> p->time_slice = HZ;
>> >> + param.sched_priority = MAX_RT_PRIO-10;
>> >> + sched_setscheduler(p, SCHED_FIFO, &param);
>> >> set_tsk_thread_flag(p, TIF_MEMDIE);
>> >>
>> >
>> > BTW, how about the other threads which share mm_struct ?
>>
>> Could you elaborate your intention? :)
>>
>
> IIUC, the purpose of rising priority is to accerate dying thread to exit()
> for freeing memory AFAP. But to free memory, exit, all threads which share
> mm_struct should exit, too. I'm sorry if I miss something.
How would we kill only some of the threads, and what would be the benefit?
I think that when a thread receives a KILL signal, the whole process that
includes the thread is killed.
> Thanks,
> -Kame
>
>
>
>
--
Kind regards,
Minchan Kim
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 5:46 ` Minchan Kim
@ 2010-05-31 5:54 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 5:54 UTC (permalink / raw)
To: Minchan Kim
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, 31 May 2010 14:46:05 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Mon, 31 May 2010 14:01:03 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> Hi, Kame.
> >>
> >> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> > On Fri, 28 May 2010 13:48:26 -0300
> >> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> >> >>
> >> >> oom-killer: give the dying task rt priority (v3)
> >> >>
> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> >> freeing needed memory.
> >> >>
> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
> >> >>
> >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> >> index 84bbba2..2b0204f 100644
> >> >> --- a/mm/oom_kill.c
> >> >> +++ b/mm/oom_kill.c
> >> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> >> >> */
> >> >> static void __oom_kill_task(struct task_struct *p, int verbose)
> >> >> {
> >> >> + struct sched_param param;
> >> >> +
> >> >> if (is_global_init(p)) {
> >> >> WARN_ON(1);
> >> >> printk(KERN_WARNING "tried to kill init!\n");
> >> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> >> >> * exit() and clear out its resources quickly...
> >> >> */
> >> >> p->time_slice = HZ;
> >> >> + param.sched_priority = MAX_RT_PRIO-10;
> >> >> + sched_setscheduler(p, SCHED_FIFO, &param);
> >> >> set_tsk_thread_flag(p, TIF_MEMDIE);
> >> >>
> >> >
> >> > BTW, how about the other threads which share mm_struct ?
> >>
> >> Could you elaborate your intention? :)
> >>
> >
> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
> > for freeing memory AFAP. But to free memory, exit, all threads which share
> > mm_struct should exit, too. I'm sorry if I miss something.
>
> How do we kill only some thread and what's the benefit of it?
> I think when if some thread receives KILL signal, the process include
> the thread will be killed.
>
Yes. So if you want a _process_ to die quickly, you have to accelerate all the
threads of the process. Accelerating a single thread in a process is not a big help.
Thanks,
-Kame
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 5:54 ` KAMEZAWA Hiroyuki
@ 2010-05-31 6:09 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 6:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 31 May 2010 14:46:05 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Mon, 31 May 2010 14:01:03 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> Hi, Kame.
>> >>
>> >> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
>> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> > On Fri, 28 May 2010 13:48:26 -0300
>> >> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
>> >> >>
>> >> >> oom-killer: give the dying task rt priority (v3)
>> >> >>
>> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> >> freeing needed memory.
>> >> >>
>> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >> >>
>> >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> >> >> index 84bbba2..2b0204f 100644
>> >> >> --- a/mm/oom_kill.c
>> >> >> +++ b/mm/oom_kill.c
>> >> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>> >> >> */
>> >> >> static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> {
>> >> >> + struct sched_param param;
>> >> >> +
>> >> >> if (is_global_init(p)) {
>> >> >> WARN_ON(1);
>> >> >> printk(KERN_WARNING "tried to kill init!\n");
>> >> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> * exit() and clear out its resources quickly...
>> >> >> */
>> >> >> p->time_slice = HZ;
>> >> >> + param.sched_priority = MAX_RT_PRIO-10;
>> >> >> + sched_setscheduler(p, SCHED_FIFO, &param);
>> >> >> set_tsk_thread_flag(p, TIF_MEMDIE);
>> >> >>
>> >> >
>> >> > BTW, how about the other threads which share mm_struct ?
>> >>
>> >> Could you elaborate your intention? :)
>> >>
>> >
>> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
>> > for freeing memory AFAP. But to free memory, exit, all threads which share
>> > mm_struct should exit, too. I'm sorry if I miss something.
>>
>> How do we kill only some thread and what's the benefit of it?
>> I think when if some thread receives KILL signal, the process include
>> the thread will be killed.
>>
> yes, so, if you want a _process_ die quickly, you have to acceralte the whole
> threads on a process. Acceralating a thread in a process is not big help.
Yes.
I looked at the code.
oom_kill_process is called by
1. mem_cgroup_out_of_memory
2. __out_of_memory
3. out_of_memory
(1) and (2) call select_bad_process, which selects the victim task among
processes using do_each_process.
But (3) doesn't: in the CONSTRAINT_MEMORY_POLICY case, it kills current.
Only in that case, couldn't we pass the task of the process rather than one of its threads?
>
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 6:09 ` Minchan Kim
@ 2010-05-31 6:51 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 6:51 UTC (permalink / raw)
To: Minchan Kim
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, 31 May 2010 15:09:41 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Mon, 31 May 2010 14:46:05 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> > On Mon, 31 May 2010 14:01:03 +0900
> >> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >> >
> >> >> Hi, Kame.
> >> >>
> >> >> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
> >> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> >> > On Fri, 28 May 2010 13:48:26 -0300
> >> >> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> >> >> >>
> >> >> >> oom-killer: give the dying task rt priority (v3)
> >> >> >>
> >> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> >> >> freeing needed memory.
> >> >> >>
> >> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
> >> >> >>
> >> >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> >> >> index 84bbba2..2b0204f 100644
> >> >> >> --- a/mm/oom_kill.c
> >> >> >> +++ b/mm/oom_kill.c
> >> >> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> >> >> >> */
> >> >> >> static void __oom_kill_task(struct task_struct *p, int verbose)
> >> >> >> {
> >> >> >> + struct sched_param param;
> >> >> >> +
> >> >> >> if (is_global_init(p)) {
> >> >> >> WARN_ON(1);
> >> >> >> printk(KERN_WARNING "tried to kill init!\n");
> >> >> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> >> >> >> * exit() and clear out its resources quickly...
> >> >> >> */
> >> >> >> p->time_slice = HZ;
> >> >> >> + param.sched_priority = MAX_RT_PRIO-10;
> >> >> >> + sched_setscheduler(p, SCHED_FIFO, &param);
> >> >> >> set_tsk_thread_flag(p, TIF_MEMDIE);
> >> >> >>
> >> >> >
> >> >> > BTW, how about the other threads which share mm_struct ?
> >> >>
> >> >> Could you elaborate your intention? :)
> >> >>
> >> >
> >> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
> >> > for freeing memory AFAP. But to free memory, exit, all threads which share
> >> > mm_struct should exit, too. I'm sorry if I miss something.
> >>
> >> How do we kill only some thread and what's the benefit of it?
> >> I think when if some thread receives KILL signal, the process include
> >> the thread will be killed.
> >>
> > yes, so, if you want a _process_ die quickly, you have to acceralte the whole
> > threads on a process. Acceralating a thread in a process is not big help.
>
> Yes.
>
> I see the code.
> oom_kill_process is called by
>
> 1. mem_cgroup_out_of_memory
> 2. __out_of_memory
> 3. out_of_memory
>
>
> (1,2) calls select_bad_process which select victim task in processes
> by do_each_process.
> But 3 isn't In case of CONSTRAINT_MEMORY_POLICY, it kills current.
> In only the case, couldn't we pass task of process, not one of thread?
>
Hmm, my point is that the priority acceleration applies to a thread, not to a process.
So, most of the threads in the memory eater will not gain high priority even with this patch
and will still run slowly.
I have no objections to this patch. I just want to confirm the purpose. If this patch
is meant to accelerate the exit of a process killed by SIGKILL, it does not seem to be enough.
If an explanation such as "raising the priority of all threads in a process seems overkill"
is given in the changelog or a comment, it's OK with me.
Thanks,
-Kame
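
To make the per-thread point above concrete, here is a small userspace toy
model in C. It is only a sketch under stated assumptions: the toy_* types and
the boost_one(), boost_all() and count_unboosted() helpers are hypothetical
names invented for illustration, they are not kernel API, and nothing here is
taken from the patch itself. It shows that boosting one thread leaves every
other thread of the same process at normal priority.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for scheduler state; hypothetical, not kernel definitions. */
enum toy_policy { TOY_NORMAL, TOY_FIFO };

struct toy_thread {
	enum toy_policy policy;
	int rt_priority;	/* only meaningful when policy is TOY_FIFO */
};

/* A "process" here is just the set of threads sharing one address space. */
struct toy_process {
	struct toy_thread *threads;
	size_t nr_threads;
};

/* Boost a single thread, which is the per-thread granularity discussed. */
static void boost_one(struct toy_thread *t, int prio)
{
	assert(t != NULL);
	t->policy = TOY_FIFO;
	t->rt_priority = prio;
}

/* The alternative raised in the thread: boost every thread of the process. */
static void boost_all(struct toy_process *p, int prio)
{
	assert(p != NULL);
	for (size_t i = 0; i < p->nr_threads; i++)
		boost_one(&p->threads[i], prio);
}

/* Count the threads that would still exit at normal priority. */
static size_t count_unboosted(const struct toy_process *p)
{
	size_t n = 0;

	assert(p != NULL);
	for (size_t i = 0; i < p->nr_threads; i++)
		if (p->threads[i].policy != TOY_FIFO)
			n++;
	return n;
}
```

With a four-thread toy_process, calling boost_one() on the first thread leaves
count_unboosted() at 3, while boost_all() brings it to 0. Whether the extra
boosting is worth it in the real OOM path is exactly the open question here.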
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 6:51 ` KAMEZAWA Hiroyuki
@ 2010-05-31 10:33 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 10:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, May 31, 2010 at 3:51 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 31 May 2010 15:09:41 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Mon, 31 May 2010 14:46:05 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
>> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> > On Mon, 31 May 2010 14:01:03 +0900
>> >> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >> >
>> >> >> Hi, Kame.
>> >> >>
>> >> >> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
>> >> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> >> > On Fri, 28 May 2010 13:48:26 -0300
>> >> >> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
>> >> >> >>
>> >> >> >> oom-killer: give the dying task rt priority (v3)
>> >> >> >>
>> >> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> >> >> freeing needed memory.
>> >> >> >>
>> >> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >> >> >>
>> >> >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> >> >> >> index 84bbba2..2b0204f 100644
>> >> >> >> --- a/mm/oom_kill.c
>> >> >> >> +++ b/mm/oom_kill.c
>> >> >> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>> >> >> >> */
>> >> >> >> static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> >> {
>> >> >> >> + struct sched_param param;
>> >> >> >> +
>> >> >> >> if (is_global_init(p)) {
>> >> >> >> WARN_ON(1);
>> >> >> >> printk(KERN_WARNING "tried to kill init!\n");
>> >> >> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> >> * exit() and clear out its resources quickly...
>> >> >> >> */
>> >> >> >> p->time_slice = HZ;
>> >> >> >> + param.sched_priority = MAX_RT_PRIO-10;
>> >> >> >> + sched_setscheduler(p, SCHED_FIFO, &param);
>> >> >> >> set_tsk_thread_flag(p, TIF_MEMDIE);
>> >> >> >>
>> >> >> >
>> >> >> > BTW, how about the other threads which share mm_struct ?
>> >> >>
>> >> >> Could you elaborate your intention? :)
>> >> >>
>> >> >
>> >> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
>> >> > for freeing memory AFAP. But to free memory, exit, all threads which share
>> >> > mm_struct should exit, too. I'm sorry if I miss something.
>> >>
>> >> How do we kill only some thread and what's the benefit of it?
>> >> I think when if some thread receives KILL signal, the process include
>> >> the thread will be killed.
>> >>
>> > yes, so, if you want a _process_ die quickly, you have to acceralte the whole
>> > threads on a process. Acceralating a thread in a process is not big help.
>>
>> Yes.
>>
>> I see the code.
>> oom_kill_process is called by
>>
>> 1. mem_cgroup_out_of_memory
>> 2. __out_of_memory
>> 3. out_of_memory
>>
>>
>> (1,2) calls select_bad_process which select victim task in processes
>> by do_each_process.
>> But 3 isn't In case of CONSTRAINT_MEMORY_POLICY, it kills current.
>> In only the case, couldn't we pass task of process, not one of thread?
>>
>
> Hmm, my point is that priority-acceralation is against a thread, not against a process.
> So, most of threads in memory-eater will not gain high priority even with this patch
> and works slowly.
> I have no objections to this patch. I just want to confirm the purpose. If this patch
> is for accelating exiting process by SIGKILL, it seems not enough.
> If an explanation as "acceralating all thread's priority in a process seems overkill"
> is given in changelog or comment, it's ok to me.
Okay, I see your point.
Kame's concern is valid.
Couldn't we raise the priorities of all threads of the killed task?
>
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
* Re: [RFC] oom-kill: give the dying task a higher priority
@ 2010-05-31 10:33 ` Minchan Kim
0 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-05-31 10:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Luis Claudio R. Goncalves, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, Peter Zijlstra,
David Rientjes, Mel Gorman, williams
On Mon, May 31, 2010 at 3:51 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 31 May 2010 15:09:41 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Mon, 31 May 2010 14:46:05 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> On Mon, May 31, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
>> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> > On Mon, 31 May 2010 14:01:03 +0900
>> >> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >> >
>> >> >> Hi, Kame.
>> >> >>
>> >> >> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
>> >> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> >> > On Fri, 28 May 2010 13:48:26 -0300
>> >> >> > "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
>> >> >> >>
>> >> >> >> oom-killer: give the dying task rt priority (v3)
>> >> >> >>
>> >> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> >> >> freeing needed memory.
>> >> >> >>
>> >> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com>
>> >> >> >>
>> >> >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> >> >> >> index 84bbba2..2b0204f 100644
>> >> >> >> --- a/mm/oom_kill.c
>> >> >> >> +++ b/mm/oom_kill.c
>> >> >> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>> >> >> >> */
>> >> >> >> static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> >> {
>> >> >> >> + struct sched_param param;
>> >> >> >> +
>> >> >> >> if (is_global_init(p)) {
>> >> >> >> WARN_ON(1);
>> >> >> >> printk(KERN_WARNING "tried to kill init!\n");
>> >> >> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>> >> >> >> * exit() and clear out its resources quickly...
>> >> >> >> */
>> >> >> >> p->time_slice = HZ;
>> >> >> >> + param.sched_priority = MAX_RT_PRIO-10;
>> >> >> >> + sched_setscheduler(p, SCHED_FIFO, ¶m);
>> >> >> >> set_tsk_thread_flag(p, TIF_MEMDIE);
>> >> >> >>
>> >> >> >
>> >> >> > BTW, how about the other threads which share mm_struct ?
>> >> >>
>> >> >> Could you elaborate your intention? :)
>> >> >>
>> >> >
>> >> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
>> >> > for freeing memory AFAP. But to free memory, exit, all threads which share
>> >> > mm_struct should exit, too. I'm sorry if I miss something.
>> >>
>> >> How do we kill only some thread and what's the benefit of it?
>> >> I think when if some thread receives KILL signal, the process include
>> >> the thread will be killed.
>> >>
>> > yes, so, if you want a _process_ die quickly, you have to acceralte the whole
>> > threads on a process. Acceralating a thread in a process is not big help.
>>
>> Yes.
>>
>> I see the code.
>> oom_kill_process is called by
>>
>> 1. mem_cgroup_out_of_memory
>> 2. __out_of_memory
>> 3. out_of_memory
>>
>>
>> (1,2) calls select_bad_process which select victim task in processes
>> by do_each_process.
>> But 3 isn't In case of CONSTRAINT_MEMORY_POLICY, it kills current.
>> In only the case, couldn't we pass task of process, not one of thread?
>>
>
> Hmm, my point is that priority-acceralation is against a thread, not against a process.
> So, most of threads in memory-eater will not gain high priority even with this patch
> and works slowly.
> I have no objections to this patch. I just want to confirm the purpose. If this patch
> is for accelating exiting process by SIGKILL, it seems not enough.
> If an explanation as "acceralating all thread's priority in a process seems overkill"
> is given in changelog or comment, it's ok to me.
Okay. I got your point.
Kame's concern is proper.
Couldn't we raise priorities of whole threads of the task killed?
>
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 6:51 ` KAMEZAWA Hiroyuki
@ 2010-05-31 13:52 ` Luis Claudio R. Goncalves
-1 siblings, 0 replies; 110+ messages in thread
From: Luis Claudio R. Goncalves @ 2010-05-31 13:52 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
On Mon, May 31, 2010 at 03:51:02PM +0900, KAMEZAWA Hiroyuki wrote:
| On Mon, 31 May 2010 15:09:41 +0900
| Minchan Kim <minchan.kim@gmail.com> wrote:
| > On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
| > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
...
| > >> > IIUC, the purpose of raising the priority is to accelerate the dying thread's exit()
| > >> > to free memory as fast as possible. But to free the memory, all threads which share
| > >> > the mm_struct must exit, too. I'm sorry if I'm missing something.
| > >>
| > >> How do we kill only some threads, and what's the benefit of it?
| > >> I think that when a thread receives the KILL signal, the process that includes
| > >> the thread will be killed.
| > >>
| > > Yes, so if you want a _process_ to die quickly, you have to accelerate all the
| > > threads in the process. Accelerating one thread in a process is not a big help.
| >
| > Yes.
| >
| > I see the code.
| > oom_kill_process is called by
| >
| > 1. mem_cgroup_out_of_memory
| > 2. __out_of_memory
| > 3. out_of_memory
| >
| >
| > (1,2) call select_bad_process, which selects the victim task among processes
| > using do_each_process.
| > But 3 doesn't. In the case of CONSTRAINT_MEMORY_POLICY, it kills current.
| > Only in that case, couldn't we pass the task of the process, not one of its threads?
| >
|
| Hmm, my point is that the priority acceleration applies to a thread, not to a process.
| So most of the threads in the memory eater will not gain high priority even with this patch
| and will run slowly.
This is a good point...
| I have no objections to this patch. I just want to confirm the purpose. If this patch
| is for accelerating the exit of a process killed by SIGKILL, it does not seem to be enough.
I understand (from the comments in the code) that the badness calculation gives more
points to the siblings in a thread that have their own mm. I wonder whether what you
are describing is a corner case.
Again, your idea sounds like an interesting refinement to the patch. I am
just not sure whether this change should be implemented now or in a second round of
changes.
| If an explanation such as "accelerating the priority of all threads in a process seems overkill"
| is given in the changelog or a comment, it's OK with me.
If my understanding of badness() is right, I wouldn't be ashamed of saying
that it seems to be _a bit_ overkill. But I may be wrong in my
interpretation.
While re-reading the code I noticed that in select_bad_process() we can
eventually bump into an already dying task, in which case we just wait for
the task to die and avoid killing other tasks. Maybe we could boost the
priority of the dying task here too.
Luis
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 13:52 ` Luis Claudio R. Goncalves
@ 2010-05-31 23:50 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 110+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-31 23:50 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Minchan Kim, KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
On Mon, 31 May 2010 10:52:27 -0300
"Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
> | If an explanation such as "accelerating the priority of all threads in a process seems overkill"
> | is given in the changelog or a comment, it's OK with me.
>
> If my understanding of badness() is right, I wouldn't be ashamed of saying
> that it seems to be _a bit_ overkill. But I may be wrong in my
> interpretation.
>
> While re-reading the code I noticed that in select_bad_process() we can
> eventually bump into an already dying task, in which case we just wait for
> the task to die and avoid killing other tasks. Maybe we could boost the
> priority of the dying task here too.
>
Yes, nice catch.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 23:50 ` KAMEZAWA Hiroyuki
@ 2010-06-01 17:35 ` Luis Claudio R. Goncalves
-1 siblings, 0 replies; 110+ messages in thread
From: Luis Claudio R. Goncalves @ 2010-06-01 17:35 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel,
linux-mm, Thomas Gleixner, Peter Zijlstra, David Rientjes,
Mel Gorman, williams
On Tue, Jun 01, 2010 at 08:50:06AM +0900, KAMEZAWA Hiroyuki wrote:
| On Mon, 31 May 2010 10:52:27 -0300
| "Luis Claudio R. Goncalves" <lclaudio@uudg.org> wrote:
|
| > | If an explanation such as "accelerating the priority of all threads in a process seems overkill"
| > | is given in the changelog or a comment, it's OK with me.
| >
| > If my understanding of badness() is right, I wouldn't be ashamed of saying
| > that it seems to be _a bit_ overkill. But I may be wrong in my
| > interpretation.
| >
| > While re-reading the code I noticed that in select_bad_process() we can
| > eventually bump into an already dying task, in which case we just wait for
| > the task to die and avoid killing other tasks. Maybe we could boost the
| > priority of the dying task here too.
| >
| Yes, nice catch.
Here is a more complete version of the patch, boosting the priority at the
three exit points of the OOM-killer. I also avoid touching the priority if
the task is already an RT task. The patch:
oom-kill: give the dying task a higher priority (v5)
In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.
Right before sending a SIGKILL to the task selected by the oom-killer,
this task has its priority increased so that it can exit() soon,
freeing memory. That is accomplished by:
/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
It sounds plausible to give the dying task an even higher priority to be
sure it will be scheduled sooner and free the desired memory. It was
suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so that
this task won't interfere with any running RT task.
If the dying task is already an RT task, leave it untouched.
Another good suggestion, implemented here, was to avoid boosting the
dying task priority in case of mem_cgroup OOM.
Signed-off-by: Luis Claudio R. Gonçalves <lclaudio@uudg.org>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 709aedf..67e18ca 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -52,6 +52,22 @@ static int has_intersects_mems_allowed(struct task_struct *tsk)
return 0;
}
+/*
+ * If this is a system OOM (not a memcg OOM) and the task selected to be
+ * killed is not already running at high (RT) priorities, speed up the
+ * recovery by boosting the dying task to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+static void boost_dying_task_prio(struct task_struct *p,
+ struct mem_cgroup *mem)
+{
+ if ((mem == NULL) && !rt_task(p)) {
+ struct sched_param param;
+ param.sched_priority = 1;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ }
+}
+
/**
* badness - calculate a numeric value for how bad this task has been
* @p: task struct of which task we should calculate
@@ -277,8 +293,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
* blocked waiting for another task which itself is waiting
* for memory. Is there a better alternative?
*/
- if (test_tsk_thread_flag(p, TIF_MEMDIE))
+ if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+ boost_dying_task_prio(p, mem);
return ERR_PTR(-1UL);
+ }
/*
* This is in the process of releasing memory so wait for it
@@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
* Otherwise we could get an easy OOM deadlock.
*/
if (p->flags & PF_EXITING) {
- if (p != current)
+ if (p != current) {
+ boost_dying_task_prio(p, mem);
return ERR_PTR(-1UL);
-
+ }
chosen = p;
*ppoints = ULONG_MAX;
}
@@ -380,7 +399,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
* flag though it's unlikely that we select a process with CAP_SYS_RAW_IO
* set.
*/
-static void __oom_kill_task(struct task_struct *p, int verbose)
+static void __oom_kill_task(struct task_struct *p, struct mem_cgroup *mem,
+ int verbose)
{
if (is_global_init(p)) {
WARN_ON(1);
@@ -413,11 +433,11 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
-
force_sig(SIGKILL, p);
+ boost_dying_task_prio(p, mem);
}
-static int oom_kill_task(struct task_struct *p)
+static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
{
/* WARNING: mm may not be dereferenced since we did not obtain its
* value from get_task_mm(p). This is OK since all we need to do is
@@ -430,7 +450,7 @@ static int oom_kill_task(struct task_struct *p)
if (!p->mm || p->signal->oom_adj == OOM_DISABLE)
return 1;
- __oom_kill_task(p, 1);
+ __oom_kill_task(p, mem, 1);
return 0;
}
@@ -449,7 +469,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
* its children or threads, just set TIF_MEMDIE so it can die quickly
*/
if (p->flags & PF_EXITING) {
- __oom_kill_task(p, 0);
+ __oom_kill_task(p, mem, 0);
return 0;
}
@@ -462,10 +482,10 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
continue;
if (mem && !task_in_mem_cgroup(c, mem))
continue;
- if (!oom_kill_task(c))
+ if (!oom_kill_task(c, mem))
return 0;
}
- return oom_kill_task(p);
+ return oom_kill_task(p, mem);
}
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
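The condition used by boost_dying_task_prio() in the patch above can be modeled in userspace. This is only a sketch under stated assumptions, not the kernel code: policy_is_rt, should_boost and boost_priority are hypothetical names, and policy_is_rt is a simplification of the kernel's rt_task(), which looks at the task's effective priority rather than its policy. The memcg pointer is modeled as an opaque const void *.

```c
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for rt_task(): treat SCHED_FIFO and SCHED_RR as the
 * real-time policies. (Assumption: the kernel helper checks the effective
 * priority instead, so it also covers priority-inherited tasks.)
 */
bool policy_is_rt(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Mirrors the test "(mem == NULL) && !rt_task(p)" from the patch:
 * boost only on a system-wide OOM (no memcg) and only if the victim
 * is not already running under a real-time policy.
 */
bool should_boost(const void *memcg, int policy)
{
	return memcg == NULL && !policy_is_rt(policy);
}

/* The lowest SCHED_FIFO priority, so running RT tasks are not preempted. */
int boost_priority(void)
{
	return sched_get_priority_min(SCHED_FIFO);
}
```

On Linux, sched_get_priority_min(SCHED_FIFO) is documented to return 1, which corresponds to the SCHED_FIFO:1 choice described in the changelog.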
^ permalink raw reply related [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-01 17:35 ` Luis Claudio R. Goncalves
@ 2010-06-01 20:49 ` David Rientjes
-1 siblings, 0 replies; 110+ messages in thread
From: David Rientjes @ 2010-06-01 20:49 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: KAMEZAWA Hiroyuki, Minchan Kim, KOSAKI Motohiro, balbir,
Oleg Nesterov, linux-kernel, linux-mm, Thomas Gleixner,
Peter Zijlstra, Mel Gorman, williams
On Tue, 1 Jun 2010, Luis Claudio R. Goncalves wrote:
> oom-kill: give the dying task a higher priority (v5)
>
> In a system under heavy load it was observed that even after the
> oom-killer selects a task to die, the task may take a long time to die.
>
> Right before sending a SIGKILL to the task selected by the oom-killer,
> this task has its priority increased so that it can exit() soon,
> freeing memory. That is accomplished by:
>
> /*
> * We give our sacrificial lamb high priority and access to
> * all the memory it needs. That way it should be able to
> * exit() and clear out its resources quickly...
> */
> p->rt.time_slice = HZ;
> set_tsk_thread_flag(p, TIF_MEMDIE);
>
> It sounds plausible to give the dying task an even higher priority to be
> sure it will be scheduled sooner and free the desired memory. It was
> suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so that
> this task won't interfere with any running RT task.
>
> If the dying task is already an RT task, leave it untouched.
>
> Another good suggestion, implemented here, was to avoid boosting the
> dying task priority in case of mem_cgroup OOM.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lclaudio@uudg.org>
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 709aedf..67e18ca 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -52,6 +52,22 @@ static int has_intersects_mems_allowed(struct task_struct *tsk)
> return 0;
> }
>
> +/*
> + * If this is a system OOM (not a memcg OOM) and the task selected to be
> + * killed is not already running at high (RT) priorities, speed up the
> + * recovery by boosting the dying task to the lowest FIFO priority.
> + * That helps with the recovery and avoids interfering with RT tasks.
> + */
> +static void boost_dying_task_prio(struct task_struct *p,
> + struct mem_cgroup *mem)
> +{
> + if ((mem == NULL) && !rt_task(p)) {
> + struct sched_param param;
> + param.sched_priority = 1;
> + sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> + }
> +}
> +
> /**
> * badness - calculate a numeric value for how bad this task has been
> * @p: task struct of which task we should calculate
> @@ -277,8 +293,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> * blocked waiting for another task which itself is waiting
> * for memory. Is there a better alternative?
> */
> - if (test_tsk_thread_flag(p, TIF_MEMDIE))
> + if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
> + boost_dying_task_prio(p, mem);
> return ERR_PTR(-1UL);
> + }
>
> /*
> * This is in the process of releasing memory so wait for it
That's unnecessary: if p already has TIF_MEMDIE set, then
boost_dying_task_prio(p) has already been called.
> @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> * Otherwise we could get an easy OOM deadlock.
> */
> if (p->flags & PF_EXITING) {
> - if (p != current)
> + if (p != current) {
> + boost_dying_task_prio(p, mem);
> return ERR_PTR(-1UL);
> -
> + }
> chosen = p;
> *ppoints = ULONG_MAX;
> }
This has the potential to actually make it harder to free memory if p is
waiting to acquire a write lock on mm->mmap_sem in the exit path while the
thread holding mm->mmap_sem is trying to run.
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-01 20:49 ` David Rientjes
@ 2010-06-02 13:54 ` KOSAKI Motohiro
-1 siblings, 0 replies; 110+ messages in thread
From: KOSAKI Motohiro @ 2010-06-02 13:54 UTC (permalink / raw)
To: David Rientjes
Cc: kosaki.motohiro, Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki,
Minchan Kim, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Peter Zijlstra, Mel Gorman, williams
> > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > * Otherwise we could get an easy OOM deadlock.
> > */
> > if (p->flags & PF_EXITING) {
> > - if (p != current)
> > + if (p != current) {
> > + boost_dying_task_prio(p, mem);
> > return ERR_PTR(-1UL);
> > -
> > + }
> > chosen = p;
> > *ppoints = ULONG_MAX;
> > }
>
> This has the potential to actually make it harder to free memory if p is
> waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> thread holding mm->mmap_sem is trying to run.
If p is waiting, changing its priority has no effect. It continues to wait for mmap_sem to be released.
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-02 13:54 ` KOSAKI Motohiro
@ 2010-06-02 14:20 ` Luis Claudio R. Goncalves
-1 siblings, 0 replies; 110+ messages in thread
From: Luis Claudio R. Goncalves @ 2010-06-02 14:20 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: David Rientjes, KAMEZAWA Hiroyuki, Minchan Kim, balbir,
Oleg Nesterov, linux-kernel, linux-mm, Thomas Gleixner,
Peter Zijlstra, Mel Gorman, williams
On Wed, Jun 02, 2010 at 10:54:01PM +0900, KOSAKI Motohiro wrote:
| > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
| > > * Otherwise we could get an easy OOM deadlock.
| > > */
| > > if (p->flags & PF_EXITING) {
| > > - if (p != current)
| > > + if (p != current) {
| > > + boost_dying_task_prio(p, mem);
| > > return ERR_PTR(-1UL);
| > > -
| > > + }
| > > chosen = p;
| > > *ppoints = ULONG_MAX;
| > > }
| >
| > This has the potential to actually make it harder to free memory if p is
| > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
| > thread holding mm->mmap_sem is trying to run.
|
| if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
Ok, that was not a good idea after all :)
But I understand the !rt_task(p) test is necessary to avoid lowering the
priority of an RT task that happens to be selected to die. Though that may
also be a corner case in badness().
Luis
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
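The role of the !rt_task(p) test mentioned above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_task`, `toy_policy`, `toy_rt_task()` and `toy_boost_dying_task_prio()` are hypothetical stand-ins, not kernel code, and the priority value 1 mirrors the `param.sched_priority = 1` line in the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's scheduling policies. */
enum toy_policy { TOY_SCHED_NORMAL, TOY_SCHED_FIFO, TOY_SCHED_RR };

/* Hypothetical, heavily reduced model of a task. */
struct toy_task {
	enum toy_policy policy;
	int rt_priority;	/* only meaningful for RT policies */
};

/* Models rt_task(): true if the task already runs under an RT policy. */
static bool toy_rt_task(const struct toy_task *p)
{
	return p->policy == TOY_SCHED_FIFO || p->policy == TOY_SCHED_RR;
}

/*
 * Models the boost discussed in the thread: move a non-RT task to FIFO
 * at the lowest RT priority, but leave an RT task alone so that its
 * priority is never lowered.
 */
static void toy_boost_dying_task_prio(struct toy_task *p)
{
	if (toy_rt_task(p))
		return;
	p->policy = TOY_SCHED_FIFO;
	p->rt_priority = 1;
}
```

Without the guard, a victim already running at, say, SCHED_RR priority 50 would be moved down to priority 1, which is the case the test is meant to avoid.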
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-02 13:54 ` KOSAKI Motohiro
@ 2010-06-02 21:11 ` David Rientjes
-1 siblings, 0 replies; 110+ messages in thread
From: David Rientjes @ 2010-06-02 21:11 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki, Minchan Kim, balbir,
Oleg Nesterov, linux-kernel, linux-mm, Thomas Gleixner,
Peter Zijlstra, Mel Gorman, williams
On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:
> > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > > * Otherwise we could get an easy OOM deadlock.
> > > */
> > > if (p->flags & PF_EXITING) {
> > > - if (p != current)
> > > + if (p != current) {
> > > + boost_dying_task_prio(p, mem);
> > > return ERR_PTR(-1UL);
> > > -
> > > + }
> > > chosen = p;
> > > *ppoints = ULONG_MAX;
> > > }
> >
> > This has the potential to actually make it harder to free memory if p is
> > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> > thread holding mm->mmap_sem is trying to run.
>
> if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
>
And that can reduce the runtime of the thread holding a writelock on
mm->mmap_sem, making the exit actually take longer than without the patch
if its priority is significantly higher, especially on smaller machines.
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-02 21:11 ` David Rientjes
@ 2010-06-02 23:36 ` KOSAKI Motohiro
-1 siblings, 0 replies; 110+ messages in thread
From: KOSAKI Motohiro @ 2010-06-02 23:36 UTC (permalink / raw)
To: David Rientjes
Cc: kosaki.motohiro, Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki,
Minchan Kim, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Peter Zijlstra, Mel Gorman, williams
> On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:
>
> > > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > > > * Otherwise we could get an easy OOM deadlock.
> > > > */
> > > > if (p->flags & PF_EXITING) {
> > > > - if (p != current)
> > > > + if (p != current) {
> > > > + boost_dying_task_prio(p, mem);
> > > > return ERR_PTR(-1UL);
> > > > -
> > > > + }
> > > > chosen = p;
> > > > *ppoints = ULONG_MAX;
> > > > }
> > >
> > > This has the potential to actually make it harder to free memory if p is
> > > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> > > thread holding mm->mmap_sem is trying to run.
> >
> > if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
> >
>
> And that can reduce the runtime of the thread holding a writelock on
> mm->mmap_sem, making the exit actually take longer than without the patch
> if its priority is significantly higher, especially on smaller machines.
If p needs mmap_sem, p is going to sleep waiting for mmap_sem. If p doesn't,
exiting quickly is a good thing. In other words, task fairness is not our goal
when an oom occurs.
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-02 23:36 ` KOSAKI Motohiro
@ 2010-06-03 0:52 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-06-03 0:52 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: David Rientjes, Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki,
balbir, Oleg Nesterov, linux-kernel, linux-mm, Thomas Gleixner,
Peter Zijlstra, Mel Gorman, williams
On Thu, Jun 3, 2010 at 8:36 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:
>>
>> > > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>> > > > * Otherwise we could get an easy OOM deadlock.
>> > > > */
>> > > > if (p->flags & PF_EXITING) {
>> > > > - if (p != current)
>> > > > + if (p != current) {
>> > > > + boost_dying_task_prio(p, mem);
>> > > > return ERR_PTR(-1UL);
>> > > > -
>> > > > + }
>> > > > chosen = p;
>> > > > *ppoints = ULONG_MAX;
>> > > > }
>> > >
>> > > This has the potential to actually make it harder to free memory if p is
>> > > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
>> > > thread holding mm->mmap_sem is trying to run.
>> >
>> > if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
>> >
>>
>> And that can reduce the runtime of the thread holding a writelock on
>> mm->mmap_sem, making the exit actually take longer than without the patch
>> if its priority is significantly higher, especially on smaller machines.
>
> If p need mmap_sem, p is going to sleep to wait mmap_sem. if p doesn't,
> quickly exit is good thing. In other word, task fairness is not our goal
> when oom occur.
>
Tend to agree. I didn't agree with boosting the priority of all threads.
Task fairness vs. system hang is a trade-off: task fairness is best
effort, but a system hang is critical.
Also, we have already tried to do this:
/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
But I think the above code is meaningless unless p uses SCHED_RR.
So I think boosting to the lowest RT priority with SCHED_FIFO is what
meets the goal of the above comment.
--
Kind regards,
Minchan Kim
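For readers who want to try the "lowest RT priority with SCHED_FIFO" idea from user space, the following is a minimal sketch using the POSIX scheduling calls. It is not the kernel patch (which calls sched_setscheduler_nocheck() on the victim); the helper names `lowest_fifo_priority()` and `boost_self_to_lowest_fifo()` are made up for this illustration, and it acts on the calling process only.

```c
#include <sched.h>

/* Lowest valid SCHED_FIFO priority on this system (1 on Linux). */
static int lowest_fifo_priority(void)
{
	return sched_get_priority_min(SCHED_FIFO);
}

/*
 * Move the calling process to SCHED_FIFO at the lowest RT priority.
 * Returns 0 on success, or -1 with errno set; without the needed
 * privilege (CAP_SYS_NICE on Linux) the call typically fails with EPERM.
 */
static int boost_self_to_lowest_fifo(void)
{
	struct sched_param param = {
		.sched_priority = lowest_fifo_priority(),
	};

	/* pid 0 means "the caller". */
	return sched_setscheduler(0, SCHED_FIFO, &param);
}
```

At the lowest RT priority the task runs ahead of all SCHED_OTHER tasks but does not preempt any task with a higher RT priority, which matches the compromise reached earlier in the thread.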
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-02 21:11 ` David Rientjes
@ 2010-06-03 7:50 ` Peter Zijlstra
-1 siblings, 0 replies; 110+ messages in thread
From: Peter Zijlstra @ 2010-06-03 7:50 UTC (permalink / raw)
To: David Rientjes
Cc: KOSAKI Motohiro, Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki,
Minchan Kim, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Mel Gorman, williams
On Wed, 2010-06-02 at 14:11 -0700, David Rientjes wrote:
>
> And that can reduce the runtime of the thread holding a writelock on
> mm->mmap_sem, making the exit actually take longer than without the patch
> if its priority is significantly higher, especially on smaller machines.
/me smells an inversion... on -rt we solved those ;-)
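The inversion hinted at here can be shown with a toy model. This is a sketch under loud assumptions: a single CPU, a strict fixed-priority scheduler, and three made-up tasks (a low-priority lock holder, a medium-priority CPU hog, and a boosted victim waiting for the lock). None of these names exist in the kernel, and the `inherit` flag only imitates the general idea of priority inheritance, not how -rt implements it.

```c
#include <stdbool.h>

/* Hypothetical task indices for the toy model. */
enum { HOLDER, HOG, VICTIM, NTASKS };

struct sim_task {
	int prio;	/* higher value runs first */
	int work_left;	/* ticks of CPU time still needed */
	bool blocked;	/* true while waiting for the lock */
};

/*
 * Returns the tick at which VICTIM finishes, or -1 if it has not
 * finished after max_ticks. HOLDER owns the lock and releases it
 * once its own work is done.
 */
static int sim_run(bool inherit, int max_ticks)
{
	struct sim_task t[NTASKS] = {
		[HOLDER] = { .prio = 0,  .work_left = 3,    .blocked = false },
		[HOG]    = { .prio = 50, .work_left = 1000, .blocked = false },
		[VICTIM] = { .prio = 99, .work_left = 2,    .blocked = true  },
	};

	for (int tick = 1; tick <= max_ticks; tick++) {
		int best = -1, best_prio = -1;

		for (int i = 0; i < NTASKS; i++) {
			int prio = t[i].prio;

			if (t[i].blocked || t[i].work_left == 0)
				continue;
			/* Inheritance: the holder borrows the waiter's priority. */
			if (inherit && i == HOLDER && t[VICTIM].blocked &&
			    t[VICTIM].prio > prio)
				prio = t[VICTIM].prio;
			if (prio > best_prio) {
				best_prio = prio;
				best = i;
			}
		}
		if (best < 0)
			return -1;

		t[best].work_left--;
		if (best == HOLDER && t[HOLDER].work_left == 0)
			t[VICTIM].blocked = false;	/* lock released */
		if (best == VICTIM && t[VICTIM].work_left == 0)
			return tick;
	}
	return -1;
}
```

With `inherit` set, the holder runs for its three ticks, releases the lock, and the victim finishes at tick 5. Without it, the hog monopolizes the CPU, the holder never runs, and the boosted victim makes no progress within 100 ticks despite its high priority.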
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-03 7:50 ` Peter Zijlstra
@ 2010-06-03 20:32 ` David Rientjes
-1 siblings, 0 replies; 110+ messages in thread
From: David Rientjes @ 2010-06-03 20:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: KOSAKI Motohiro, Luis Claudio R. Goncalves, KAMEZAWA Hiroyuki,
Minchan Kim, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Mel Gorman, williams
On Thu, 3 Jun 2010, Peter Zijlstra wrote:
> > And that can reduce the runtime of the thread holding a writelock on
> > mm->mmap_sem, making the exit actually take longer than without the patch
> > if its priority is significantly higher, especially on smaller machines.
>
> /me smells an inversion... on -rt we solved those ;-)
>
Right, but I don't see how increasing an oom killed task's priority to a
divine priority doesn't impact the priorities of other tasks which may be
blocking the exit of that task, namely a coredumper or holder of
mm->mmap_sem. This patch also doesn't address how it negatively impacts
the priorities of jobs running in different cpusets (although sharing the
same cpus) because one cpuset is oom.
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-05-31 13:52 ` Luis Claudio R. Goncalves
@ 2010-06-01 8:19 ` Minchan Kim
-1 siblings, 0 replies; 110+ messages in thread
From: Minchan Kim @ 2010-06-01 8:19 UTC (permalink / raw)
To: Luis Claudio R. Goncalves, Peter Zijlstra
Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, balbir, Oleg Nesterov,
linux-kernel, linux-mm, Thomas Gleixner, David Rientjes,
Mel Gorman, williams
On Mon, May 31, 2010 at 10:52 PM, Luis Claudio R. Goncalves
<lclaudio@uudg.org> wrote:
> On Mon, May 31, 2010 at 03:51:02PM +0900, KAMEZAWA Hiroyuki wrote:
> | On Mon, 31 May 2010 15:09:41 +0900
> | Minchan Kim <minchan.kim@gmail.com> wrote:
> | > On Mon, May 31, 2010 at 2:54 PM, KAMEZAWA Hiroyuki
> | > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> ...
> | > >> > IIUC, the purpose of rising priority is to accerate dying thread to exit()
> | > >> > for freeing memory AFAP. But to free memory, exit, all threads which share
> | > >> > mm_struct should exit, too. I'm sorry if I miss something.
> | > >>
> | > >> How do we kill only some thread and what's the benefit of it?
> | > >> I think when if some thread receives KILL signal, the process include
> | > >> the thread will be killed.
> | > >>
> | > > yes, so, if you want a _process_ die quickly, you have to acceralte the whole
> | > > threads on a process. Acceralating a thread in a process is not big help.
> | >
> | > Yes.
> | >
> | > I see the code.
> | > oom_kill_process is called by
> | >
> | > 1. mem_cgroup_out_of_memory
> | > 2. __out_of_memory
> | > 3. out_of_memory
> | >
> | >
> | > (1,2) calls select_bad_process which select victim task in processes
> | > by do_each_process.
> | > But 3 isn't In case of CONSTRAINT_MEMORY_POLICY, it kills current.
> | > In only the case, couldn't we pass task of process, not one of thread?
> | >
> |
> | Hmm, my point is that priority-acceralation is against a thread, not against a process.
> | So, most of threads in memory-eater will not gain high priority even with this patch
> | and works slowly.
>
> This is a good point...
>
> | I have no objections to this patch. I just want to confirm the purpose. If this patch
> | is for accelating exiting process by SIGKILL, it seems not enough.
>
> I understand (from the comments in the code) the badness calculation gives more
> points to the siblings in a thread that have their own mm. I wonder if what you
> are describing is not a corner case.
>
> Again, your idea sounds like an interesting refinement to the patch. I am
> just not sure this change should implemented now or in a second round of
> changes.
First of all, I think your patch should come first.
That's because I am not sure this logic is effective.
/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
Peter changed it in fa717060f1ab.
Now, if we set rt.time_slice to HZ, does that mean the task has high priority?
I am not a scheduler expert, but as far as I can see from the scheduler code,
rt.time_slice is only relevant to the RT scheduler. So if the task uses CFS,
it doesn't give the task high priority.
Peter, right?
If that is right, I think Luis's patch will fix it.
Secondly, as Kame pointed out, we have to raise the priority of all threads
of the victim process to kill it and reclaim its pages. But I think that
has a deadlock problem.
If we raise the priority of all threads and some thread depends on
another thread that is blocked, it deadlocks the system. So I think
it's not an easy part.
If this part is really a big problem, we should consider it more carefully.
>
> | If an explanation as "acceralating all thread's priority in a process seems overkill"
> | is given in changelog or comment, it's ok to me.
>
> If my understanding of badness() is right, I wouldn't be ashamed of saying
> that it seems to be _a bit_ overkill. But I may be wrong in my
> interpretation.
>
> While re-reading the code I noticed that in select_bad_process() we can
> eventually bump on an already dying task, case in which we just wait for
> the task to die and avoid killing other tasks. Maybe we could boost the
> priority of the dying task here too.
Yes. I think that is a good place to boost the priority of the task.
>
> Luis
> --
> [ Luis Claudio R. Goncalves Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>
>
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [RFC] oom-kill: give the dying task a higher priority
2010-06-01 8:19 ` Minchan Kim
@ 2010-06-01 18:36 ` David Rientjes
-1 siblings, 0 replies; 110+ messages in thread
From: David Rientjes @ 2010-06-01 18:36 UTC (permalink / raw)
To: Minchan Kim
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, KAMEZAWA Hiroyuki,
KOSAKI Motohiro, balbir, Oleg Nesterov, linux-kernel, linux-mm,
Thomas Gleixner, Mel Gorman, williams
On Tue, 1 Jun 2010, Minchan Kim wrote:
> Secondly, as Kame pointed out, we have to raise whole thread's
> priority to kill victim process for reclaiming pages. But I think it
> has deadlock problem.
Agreed, this has the potential to actually increase the amount of time for
an oom killed task to fully exit: the exit path takes mm->mmap_sem on exit
and if that is held by another thread waiting for the oom killed task to
exit (i.e. reclaim has failed and the oom killer becomes a no-op because
it sees an already killed task) then there's a livelock. That's always
been a problem, but is compounded with increasing the priority of a task
not holding mm->mmap_sem if the thread holding the writelock actually
isn't looking for memory but simply doesn't get a chance to release
because it fails to run.
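The "oom killer becomes a no-op" behaviour described above can be sketched as a user-space model of the victim scan. This is an illustration only: `scan_task`, `scan_result` and `toy_select_bad_process()` are hypothetical names, the `memdie` field stands in for the TIF_MEMDIE check in the quoted hunk, and the real select_bad_process() does considerably more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a candidate task. */
struct scan_task {
	unsigned long points;	/* badness score */
	bool memdie;		/* models TIF_MEMDIE being set */
};

enum scan_result { SCAN_NONE, SCAN_ABORT, SCAN_CHOSEN };

/*
 * Scan all candidates. If any task is already marked as dying, abort
 * without choosing a new victim (the caller just waits). Otherwise
 * report the index of the task with the highest badness score.
 */
static enum scan_result toy_select_bad_process(const struct scan_task *tasks,
					       size_t n, size_t *chosen)
{
	enum scan_result res = SCAN_NONE;
	unsigned long best = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].memdie)
			return SCAN_ABORT;	/* a victim is already exiting */
		if (tasks[i].points > best) {
			best = tasks[i].points;
			*chosen = i;
			res = SCAN_CHOSEN;
		}
	}
	return res;
}
```

In this model, as long as the already-killed task cannot finish exiting, every later scan returns SCAN_ABORT and no further memory is freed, which is the window in which the mmap_sem dependency described above matters.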
^ permalink raw reply [flat|nested] 110+ messages in thread