linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
@ 2013-09-09 15:30 Sergey Dyasly
  2013-09-09 16:31 ` Oleg Nesterov
  2013-09-09 20:07 ` David Rientjes
  0 siblings, 2 replies; 11+ messages in thread
From: Sergey Dyasly @ 2013-09-09 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, David Rientjes, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov, Sergey Dyasly

If OOM killer finds a task which is about to exit or is already doing so,
there is no need to kill anyone. It should just wait until task dies.

Add missing fatal_signal_pending() check and allow selected task to use memory
reserves in order to exit quickly.

Also remove redundant PF_EXITING check after victim has been selected.

Signed-off-by: Sergey Dyasly <dserrg@gmail.com>
---
 mm/oom_kill.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 98e75f2..ef83b81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 	if (oom_task_origin(task))
 		return OOM_SCAN_SELECT;
 
-	if (task->flags & PF_EXITING && !force_kill) {
+	if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
+	    !force_kill) {
 		/*
 		 * If this task is not being ptraced on exit, then wait for it
 		 * to finish before killing some other task unnecessarily.
 		 */
-		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
+		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
+			set_tsk_thread_flag(task, TIF_MEMDIE);
 			return OOM_SCAN_ABORT;
+		}
 	}
 	return OOM_SCAN_OK;
 }
@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
 
-	/*
-	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
-	 */
-	if (p->flags & PF_EXITING) {
-		set_tsk_thread_flag(p, TIF_MEMDIE);
-		put_task_struct(p);
-		return;
-	}
-
 	if (__ratelimit(&oom_rs))
 		dump_header(p, gfp_mask, order, memcg, nodemask);
 
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-09 15:30 [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit Sergey Dyasly
@ 2013-09-09 16:31 ` Oleg Nesterov
  2013-09-09 20:11   ` David Rientjes
  2013-09-09 20:07 ` David Rientjes
  1 sibling, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2013-09-09 16:31 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-kernel, linux-mm, Andrew Morton, David Rientjes,
	Michal Hocko, Rusty Russell, Sha Zhengju

Can't really comment this patch, just one off-topic note...

On 09/09, Sergey Dyasly wrote:
>
> @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>  	if (oom_task_origin(task))
>  		return OOM_SCAN_SELECT;
>  
> -	if (task->flags & PF_EXITING && !force_kill) {
> +	if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> +	    !force_kill) {
>  		/*
>  		 * If this task is not being ptraced on exit, then wait for it
>  		 * to finish before killing some other task unnecessarily.
>  		 */
> -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {

can't we finally kill (or fix?) this PT_TRACE_EXIT check?

It was added to fix the exploit I sent. But the patch was wrong,
that exploit could be easily modified to trigger the same problem.

However, now that the coredumping is killable that exploit won't
work, so the original reason has gone away.

So why do we need this check today?

And note that we check ->group_leader, this looks completely wrong.
(again, ->group_leader was used just because the original exploit
 traced the group leader).

David?

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-09 15:30 [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit Sergey Dyasly
  2013-09-09 16:31 ` Oleg Nesterov
@ 2013-09-09 20:07 ` David Rientjes
  2013-09-11 15:06   ` Sergey Dyasly
  1 sibling, 1 reply; 11+ messages in thread
From: David Rientjes @ 2013-09-09 20:07 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Mon, 9 Sep 2013, Sergey Dyasly wrote:

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 98e75f2..ef83b81 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>  	if (oom_task_origin(task))
>  		return OOM_SCAN_SELECT;
>  
> -	if (task->flags & PF_EXITING && !force_kill) {
> +	if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> +	    !force_kill) {

This makes sense.

>  		/*
>  		 * If this task is not being ptraced on exit, then wait for it
>  		 * to finish before killing some other task unnecessarily.
>  		 */
> -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> +			set_tsk_thread_flag(task, TIF_MEMDIE);

This does not, we do not give access to memory reserves unless the process 
needs it to allocate memory.  The task here, which is not current, can 
call into the oom killer and be granted memory reserves if necessary.

>  			return OOM_SCAN_ABORT;
> +		}
>  	}
>  	return OOM_SCAN_OK;
>  }
> @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
>  					      DEFAULT_RATELIMIT_BURST);
>  
> -	/*
> -	 * If the task is already exiting, don't alarm the sysadmin or kill
> -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> -	 */
> -	if (p->flags & PF_EXITING) {
> -		set_tsk_thread_flag(p, TIF_MEMDIE);
> -		put_task_struct(p);
> -		return;
> -	}

I think you misunderstood the point of this; if a selected process is 
already in the exit path then this is simply avoiding dumping oom kill 
lines to the kernel log.  We want to keep doing that.

> -
>  	if (__ratelimit(&oom_rs))
>  		dump_header(p, gfp_mask, order, memcg, nodemask);
>  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-09 16:31 ` Oleg Nesterov
@ 2013-09-09 20:11   ` David Rientjes
  0 siblings, 0 replies; 11+ messages in thread
From: David Rientjes @ 2013-09-09 20:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Sergey Dyasly, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Rusty Russell, Sha Zhengju

On Mon, 9 Sep 2013, Oleg Nesterov wrote:

> > @@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
> >  	if (oom_task_origin(task))
> >  		return OOM_SCAN_SELECT;
> >  
> > -	if (task->flags & PF_EXITING && !force_kill) {
> > +	if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
> > +	    !force_kill) {
> >  		/*
> >  		 * If this task is not being ptraced on exit, then wait for it
> >  		 * to finish before killing some other task unnecessarily.
> >  		 */
> > -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> 
> can't we finally kill (or fix?) this PT_TRACE_EXIT check?
> 

Patches are always welcome.

> It was added to fix the exploit I sent. But the patch was wrong,
> that exploit could be easily modified to trigger the same problem.
> 

If the patch prevented your exploit when coredumping was done differently 
then it was not wrong.  It may not have been as inclusive as you would 
have liked, but then again you never proposed any kernel changes to fix it 
yourself either.

> However, now that the coredumping is killable that exploit won't
> work, so the original reason has gone away.
> 
> So why do we need this check today?
> 

If you feel it can be removed, please propose a patch to do so with a 
changelog that describes why it is no longer necessary.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-09 20:07 ` David Rientjes
@ 2013-09-11 15:06   ` Sergey Dyasly
  2013-09-19 15:51     ` Sergey Dyasly
  2013-09-25 20:31     ` David Rientjes
  0 siblings, 2 replies; 11+ messages in thread
From: Sergey Dyasly @ 2013-09-11 15:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> >  		/*
> >  		 * If this task is not being ptraced on exit, then wait for it
> >  		 * to finish before killing some other task unnecessarily.
> >  		 */
> > -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > +			set_tsk_thread_flag(task, TIF_MEMDIE);
> 
> This does not, we do not give access to memory reserves unless the process 
> needs it to allocate memory.  The task here, which is not current, can 
> call into the oom killer and be granted memory reserves if necessary.

True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
then?
Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
be fast if exiting task needs it.

> > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> >  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> >  					      DEFAULT_RATELIMIT_BURST);
> >  
> > -	/*
> > -	 * If the task is already exiting, don't alarm the sysadmin or kill
> > -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > -	 */
> > -	if (p->flags & PF_EXITING) {
> > -		set_tsk_thread_flag(p, TIF_MEMDIE);
> > -		put_task_struct(p);
> > -		return;
> > -	}
> 
> I think you misunderstood the point of this; if a selected process is 
> already in the exit path then this is simply avoiding dumping oom kill 
> lines to the kernel log.  We want to keep doing that.

This happens in oom_kill_process() after victim has been selected by
select_bad_process(). But there is already PF_EXITING check in
oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
There is only a slight chance that victim will become PF_EXITING between
scan and kill.

The only difference is in force_kill flag, and the only case where it's set
is SysRq. And I think in this case OOM killer messages are a good thing to have
even when victim is already exiting, instead of just silence.

-- 
Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-11 15:06   ` Sergey Dyasly
@ 2013-09-19 15:51     ` Sergey Dyasly
  2013-09-25 20:31     ` David Rientjes
  1 sibling, 0 replies; 11+ messages in thread
From: Sergey Dyasly @ 2013-09-19 15:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

Ping :)

On Wed, 11 Sep 2013 19:06:05 +0400
Sergey Dyasly <dserrg@gmail.com> wrote:

> On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
> David Rientjes <rientjes@google.com> wrote:
> 
> > >  		/*
> > >  		 * If this task is not being ptraced on exit, then wait for it
> > >  		 * to finish before killing some other task unnecessarily.
> > >  		 */
> > > -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > +			set_tsk_thread_flag(task, TIF_MEMDIE);
> > 
> > This does not, we do not give access to memory reserves unless the process 
> > needs it to allocate memory.  The task here, which is not current, can 
> > call into the oom killer and be granted memory reserves if necessary.
> 
> True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> then?
> Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
> be fast if exiting task needs it.
> 
> > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> > >  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > >  					      DEFAULT_RATELIMIT_BURST);
> > >  
> > > -	/*
> > > -	 * If the task is already exiting, don't alarm the sysadmin or kill
> > > -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > -	 */
> > > -	if (p->flags & PF_EXITING) {
> > > -		set_tsk_thread_flag(p, TIF_MEMDIE);
> > > -		put_task_struct(p);
> > > -		return;
> > > -	}
> > 
> > I think you misunderstood the point of this; if a selected process is 
> > already in the exit path then this is simply avoiding dumping oom kill 
> > lines to the kernel log.  We want to keep doing that.
> 
> This happens in oom_kill_process() after victim has been selected by
> select_bad_process(). But there is already PF_EXITING check in
> oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
> There is only a slight chance that victim will become PF_EXITING between
> scan and kill.
> 
> The only difference is in force_kill flag, and the only case where it's set
> is SysRq. And I think in this case OOM killer messages are a good thing to have
> even when victim is already exiting, instead of just silence.
> 
> -- 
> Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-11 15:06   ` Sergey Dyasly
  2013-09-19 15:51     ` Sergey Dyasly
@ 2013-09-25 20:31     ` David Rientjes
  2013-09-27 14:58       ` Sergey Dyasly
  1 sibling, 1 reply; 11+ messages in thread
From: David Rientjes @ 2013-09-25 20:31 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Wed, 11 Sep 2013, Sergey Dyasly wrote:

> > >  		/*
> > >  		 * If this task is not being ptraced on exit, then wait for it
> > >  		 * to finish before killing some other task unnecessarily.
> > >  		 */
> > > -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > +			set_tsk_thread_flag(task, TIF_MEMDIE);
> > 
> > This does not, we do not give access to memory reserves unless the process 
> > needs it to allocate memory.  The task here, which is not current, can 
> > call into the oom killer and be granted memory reserves if necessary.
> 
> True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> then?

If current needs access to memory reserves while PF_EXITING, it should 
call the page allocator, find that it is out of memory, and call the oom 
killer to silently be granted memory reserves.

> > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> > >  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > >  					      DEFAULT_RATELIMIT_BURST);
> > >  
> > > -	/*
> > > -	 * If the task is already exiting, don't alarm the sysadmin or kill
> > > -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > -	 */
> > > -	if (p->flags & PF_EXITING) {
> > > -		set_tsk_thread_flag(p, TIF_MEMDIE);
> > > -		put_task_struct(p);
> > > -		return;
> > > -	}
> > 
> > I think you misunderstood the point of this; if a selected process is 
> > already in the exit path then this is simply avoiding dumping oom kill 
> > lines to the kernel log.  We want to keep doing that.
> 
> This happens in oom_kill_process() after victim has been selected by
> select_bad_process(). But there is already PF_EXITING check in
> oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.

select_bad_process() is one of three callers to oom_kill_process().

> The only difference is in force_kill flag, and the only case where it's set
> is SysRq. And I think in this case OOM killer messages are a good thing to have
> even when victim is already exiting, instead of just silence.
> 

Read the comment about why we don't emit anything to the kernel log in 
this case; the process is already exiting, there's no need to kill it or 
make anyone believe that it was killed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-25 20:31     ` David Rientjes
@ 2013-09-27 14:58       ` Sergey Dyasly
  2013-09-30 22:08         ` David Rientjes
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Dyasly @ 2013-09-27 14:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Wed, 25 Sep 2013 13:31:32 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 11 Sep 2013, Sergey Dyasly wrote:
> 
> > > >  		/*
> > > >  		 * If this task is not being ptraced on exit, then wait for it
> > > >  		 * to finish before killing some other task unnecessarily.
> > > >  		 */
> > > > -		if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > > +		if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > > +			set_tsk_thread_flag(task, TIF_MEMDIE);
> > > 
> > > This does not, we do not give access to memory reserves unless the process 
> > > needs it to allocate memory.  The task here, which is not current, can 
> > > call into the oom killer and be granted memory reserves if necessary.
> > 
> > True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> > then?
> 
> If current needs access to memory reserves while PF_EXITING, it should 
> call the page allocator, find that it is out of memory, and call the oom 
> killer to silently be granted memory reserves.

I understand this and you are repeating yourself :)
What you are saying contradicts current OOMk code the way I read it. Comment in
oom_kill_process() says:

"If the task is already exiting ... set TIF_MEMDIE so it can die quickly"

I just want to know the right solution.

> > > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> > > >  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > >  					      DEFAULT_RATELIMIT_BURST);
> > > >  
> > > > -	/*
> > > > -	 * If the task is already exiting, don't alarm the sysadmin or kill
> > > > -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > > -	 */
> > > > -	if (p->flags & PF_EXITING) {
> > > > -		set_tsk_thread_flag(p, TIF_MEMDIE);
> > > > -		put_task_struct(p);
> > > > -		return;
> > > > -	}
> > > 
> > > I think you misunderstood the point of this; if a selected process is 
> > > already in the exit path then this is simply avoiding dumping oom kill 
> > > lines to the kernel log.  We want to keep doing that.
> > 
> > This happens in oom_kill_process() after victim has been selected by
> > select_bad_process(). But there is already PF_EXITING check in
> > oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
> 
> select_bad_process() is one of three callers to oom_kill_process().

You are mistaken, oom_kill_process() is only called from out_of_memory()
and mem_cgroup_out_of_memory().

> > The only difference is in force_kill flag, and the only case where it's set
> > is SysRq. And I think in this case OOM killer messages are a good thing to have
> > even when victim is already exiting, instead of just silence.
> > 
> 
> Read the comment about why we don't emit anything to the kernel log in 
> this case; the process is already exiting, there's no need to kill it or 
> make anyone believe that it was killed.

Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
and in this case oom_kill_process() won't be even called. That's why it's
redundant.

--
Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-27 14:58       ` Sergey Dyasly
@ 2013-09-30 22:08         ` David Rientjes
  2013-10-01 15:26           ` Sergey Dyasly
  0 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2013-09-30 22:08 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Fri, 27 Sep 2013, Sergey Dyasly wrote:

> What you are saying contradicts current OOMk code the way I read it. Comment in
> oom_kill_process() says:
> 
> "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> 
> I just want to know the right solution.
> 

That's a comment, not code.  The point of the PF_EXITING special handling 
in oom_kill_process() is to avoid telling sysadmins that a process has 
been killed to free memory when it has already called exit() and to avoid 
sacrificing one of its children for the exiting process.

It may or may not need access to memory reserves to actually exit after 
PF_EXITING depending on whether it needs to allocate memory for 
coredumping or anything else.  So instead of waiting for it to recall the 
oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
processes can already get TIF_MEMDIE immediately when their memory 
allocation fails so there's no reason not to set it now as an 
optimization.

But we definitely want to avoid printing anything to the kernel log when 
the process has already called exit() and issuing the SIGKILL at that 
point would be pointless.

> You are mistaken, oom_kill_process() is only called from out_of_memory()
> and mem_cgroup_out_of_memory().
> 

out_of_memory() calls oom_kill_process() in two places, plus the call from 
mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
matters in the slightest, though.

> > Read the comment about why we don't emit anything to the kernel log in 
> > this case; the process is already exiting, there's no need to kill it or 
> > make anyone believe that it was killed.
> 
> Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> and in this case oom_kill_process() won't be even called. That's why it's
> redundant.
> 

You apparently have no idea how long select_bad_process() runs on a large 
system with thousands of processes.  Keep in mind that SGI requested the 
addition of the oom_kill_allocating_task sysctl specifically because of 
how long select_bad_process() runs.  The PF_EXITING check in 
oom_kill_process() is simply an optimization to return early and with 
access to memory reserves so it can exit as quickly as possible and 
without the kernel stating it's killing something that has already called 
exit().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-09-30 22:08         ` David Rientjes
@ 2013-10-01 15:26           ` Sergey Dyasly
  2013-10-01 22:46             ` David Rientjes
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Dyasly @ 2013-10-01 15:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

It seems to me that we are going nowhere with this discussion...

If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.


On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Fri, 27 Sep 2013, Sergey Dyasly wrote:
> 
> > What you are saying contradicts current OOMk code the way I read it. Comment in
> > oom_kill_process() says:
> > 
> > "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> > 
> > I just want to know the right solution.
> > 
> 
> That's a comment, not code.  The point of the PF_EXITING special handling 
> in oom_kill_process() is to avoid telling sysadmins that a process has 
> been killed to free memory when it has already called exit() and to avoid 
> sacrificing one of its children for the exiting process.
> 
> It may or may not need access to memory reserves to actually exit after 
> PF_EXITING depending on whether it needs to allocate memory for 
> coredumping or anything else.  So instead of waiting for it to recall the 
> oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
> processes can already get TIF_MEMDIE immediately when their memory 
> allocation fails so there's no reason not to set it now as an 
> optimization.
> 
> But we definitely want to avoid printing anything to the kernel log when 
> the process has already called exit() and issuing the SIGKILL at that 
> point would be pointless.
> 
> > You are mistaken, oom_kill_process() is only called from out_of_memory()
> > and mem_cgroup_out_of_memory().
> > 
> 
> out_of_memory() calls oom_kill_process() in two places, plus the call from 
> mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
> matters in the slightest, though.
> 
> > > Read the comment about why we don't emit anything to the kernel log in 
> > > this case; the process is already exiting, there's no need to kill it or 
> > > make anyone believe that it was killed.
> > 
> > Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> > and in this case oom_kill_process() won't be even called. That's why it's
> > redundant.
> > 
> 
> You apparently have no idea how long select_bad_process() runs on a large 
> system with thousands of processes.  Keep in mind that SGI requested the 
> addition of the oom_kill_allocating_task sysctl specifically because of 
> how long select_bad_process() runs.  The PF_EXITING check in 
> oom_kill_process() is simply an optimization to return early and with 
> access to memory reserves so it can exit as quickly as possible and 
> without the kernel stating it's killing something that has already called 
> exit().


-- 
Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit
  2013-10-01 15:26           ` Sergey Dyasly
@ 2013-10-01 22:46             ` David Rientjes
  0 siblings, 0 replies; 11+ messages in thread
From: David Rientjes @ 2013-10-01 22:46 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko,
	Rusty Russell, Sha Zhengju, Oleg Nesterov

On Tue, 1 Oct 2013, Sergey Dyasly wrote:

> If you are ok with the first change in my patch regarding fatal_signal_pending,
> I can send new patch with just that change.
> 

The entire patch is pointless, there's no need to give access to memory 
reserves simply because it is PF_EXITING.  If it needs memory, it will 
call the oom killer itself.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-10-01 22:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-09 15:30 [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit Sergey Dyasly
2013-09-09 16:31 ` Oleg Nesterov
2013-09-09 20:11   ` David Rientjes
2013-09-09 20:07 ` David Rientjes
2013-09-11 15:06   ` Sergey Dyasly
2013-09-19 15:51     ` Sergey Dyasly
2013-09-25 20:31     ` David Rientjes
2013-09-27 14:58       ` Sergey Dyasly
2013-09-30 22:08         ` David Rientjes
2013-10-01 15:26           ` Sergey Dyasly
2013-10-01 22:46             ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).