[PATCH] oom: add pending SIGKILL check for chosen victim

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] oom: add pending SIGKILL check for chosen victim
@ 2013-04-22 15:06 Sergey Dyasly
  2013-04-22 19:51 ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Dyasly @ 2013-04-22 15:06 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Rientjes, Michal Hocko, KAMEZAWA Hiroyuki,
	Sha Zhengju, Sergey Dyasly

Currently, fatal_signal_pending() check is issued only for task that invoked
oom killer. Add the same check for oom killer's chosen victim.

This eliminates regression with killing multithreaded processes which was
introduced by commit 6b0c81b3be114a93f79bd4c5639ade5107d77c21
(mm, oom: reduce dependency on tasklist_lock). When one of threads
was oom-killed, other threads could also become victims of oom killer, which
can cause an infinite loop.

There is a race with task->thread_group RCU protected list deletion/iteration:
now only a reference to a chosen thread of dying threadgroup is held, so when
the thread doesn't have PF_EXITING flag yet and dump_header() is called
to print info, it already has SIGKILL and can call do_exit(), which removes
the thread from the thread_group list. After printing info, oom killer
is doing while_each_thread() on this thread and it still has next reference
to some other thread, but no other thread has next reference to this one.
This causes the infinite loop with tasklist_lock read held.

When SIGKILL is sent to a task, it's also sent to all tasks in the same
threadgroup. This information can be used to prevent triggering further
oom killers for this threadgroup and avoid the infinite loop.

Signed-off-by: Sergey Dyasly <dserrg@gmail.com>
---
 mm/oom_kill.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 79e451a..5c42dd3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -413,10 +413,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 					      DEFAULT_RATELIMIT_BURST);

 	/*
-	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 * If the task already has a pending SIGKILL or is exiting, don't alarm
+	 * the sysadmin or kill its children or threads, just set TIF_MEMDIE
+	 * so it can die quickly
 	 */
-	if (p->flags & PF_EXITING) {
+	if (fatal_signal_pending(p) || p->flags & PF_EXITING) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
 		put_task_struct(p);
 		return;
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-22 15:06 [PATCH] oom: add pending SIGKILL check for chosen victim Sergey Dyasly
@ 2013-04-22 19:51 ` Michal Hocko
  2013-04-23 15:26   ` dserrg
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2013-04-22 19:51 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
	Sha Zhengju

On Mon 22-04-13 19:06:24, Sergey Dyasly wrote:
> Currently, fatal_signal_pending() check is issued only for task that invoked
> oom killer. Add the same check for oom killer's chosen victim.
> 
> This eliminates regression with killing multithreaded processes which was
> introduced by commit 6b0c81b3be114a93f79bd4c5639ade5107d77c21
> (mm, oom: reduce dependency on tasklist_lock). When one of threads
> was oom-killed, other threads could also become victims of oom killer, which
> can cause an infinite loop.
> 
> There is a race with task->thread_group RCU protected list deletion/iteration:
> now only a reference to a chosen thread of dying threadgroup is held, so when
> the thread doesn't have PF_EXITING flag yet and dump_header() is called
> to print info, it already has SIGKILL and can call do_exit(), which removes
> the thread from the thread_group list. After printing info, oom killer
> is doing while_each_thread() on this thread and it still has next reference
> to some other thread, but no other thread has next reference to this one.
> This causes the infinite loop with tasklist_lock read held.

I am not sure I understand the race you are describing here.
release_task calls __exit_signal with tasklist_lock held for write. And
we are holding the very same lock for reading around while_each_thread
in oom_kill_process.
 
> When SIGKILL is sent to a task, it's also sent to all tasks in the same
> threadgroup. This information can be used to prevent triggering further
> oom killers for this threadgroup and avoid the infinite loop.
> 
> Signed-off-by: Sergey Dyasly <dserrg@gmail.com>
> ---
>  mm/oom_kill.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 79e451a..5c42dd3 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -413,10 +413,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  					      DEFAULT_RATELIMIT_BURST);
>  
>  	/*
> -	 * If the task is already exiting, don't alarm the sysadmin or kill
> -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> +	 * If the task already has a pending SIGKILL or is exiting, don't alarm
> +	 * the sysadmin or kill its children or threads, just set TIF_MEMDIE
> +	 * so it can die quickly
>  	 */
> -	if (p->flags & PF_EXITING) {
> +	if (fatal_signal_pending(p) || p->flags & PF_EXITING) {
>  		set_tsk_thread_flag(p, TIF_MEMDIE);
>  		put_task_struct(p);
>  		return;
> -- 
> 1.8.1.2
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-22 19:51 ` Michal Hocko
@ 2013-04-23 15:26   ` dserrg
  2013-04-23 15:56     ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: dserrg @ 2013-04-23 15:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
	Sha Zhengju

On Mon, 22 Apr 2013 21:51:38 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Mon 22-04-13 19:06:24, Sergey Dyasly wrote:
> > Currently, fatal_signal_pending() check is issued only for task that invoked
> > oom killer. Add the same check for oom killer's chosen victim.
> > 
> > This eliminates regression with killing multithreaded processes which was
> > introduced by commit 6b0c81b3be114a93f79bd4c5639ade5107d77c21
> > (mm, oom: reduce dependency on tasklist_lock). When one of threads
> > was oom-killed, other threads could also become victims of oom killer, which
> > can cause an infinite loop.
> > 
> > There is a race with task->thread_group RCU protected list deletion/iteration:
> > now only a reference to a chosen thread of dying threadgroup is held, so when
> > the thread doesn't have PF_EXITING flag yet and dump_header() is called
> > to print info, it already has SIGKILL and can call do_exit(), which removes
> > the thread from the thread_group list. After printing info, oom killer
> > is doing while_each_thread() on this thread and it still has next reference
> > to some other thread, but no other thread has next reference to this one.
> > This causes the infinite loop with tasklist_lock read held.
> 
> I am not sure I understand the race you are describing here.
> release_task calls __exit_signal with tasklist_lock held for write. And
> we are holding the very same lock for reading around while_each_thread
> in oom_kill_process.

Yes, we are holding tasklist_lock when iterating, but the thread can be deleted
from thread_group list _before_ that. In this case, while_each_thread loop exit
condition will never be true.

Imagine the following situation:
Threadgroup with 4 threads: thread_1, thread_2, thread_3, thread_4.

thread_1 is oom killed and SIGKILL is sent to all threads.

allocation --> no memory --> invoke oom killer
oom killer selects thread_2 as victim:


           OOM killer               |              thread_2
                                    |
  oom_kill_process(thread_2)        |
      thread_2 has PF_EXITING? no   |      (but has pending SIGKILL)
      dump_header()                 |
                                    |
                                    |      do_exit()
                                    |          sets PF_EXITING
                                    |          list_del_rcu(thread_group)
                                    |
      read_lock(tasklist_lock)      |
      while_each_thread()           |

Iteration order: thread_2 --> thread_3 --> thread_4 --> thread_3 --> thread_4...
This will never reach thread_2 again and break loop, as result: infinite loop.

-- 
dserrg <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-23 15:26   ` dserrg
@ 2013-04-23 15:56     ` Michal Hocko
  2013-04-24 14:55       ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2013-04-23 15:56 UTC (permalink / raw)
  To: dserrg
  Cc: linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
	Sha Zhengju, Oleg Nesterov

[CCing Oleg]

On Tue 23-04-13 19:26:14, dserrg wrote:
> On Mon, 22 Apr 2013 21:51:38 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Mon 22-04-13 19:06:24, Sergey Dyasly wrote:
> > > Currently, fatal_signal_pending() check is issued only for task that invoked
> > > oom killer. Add the same check for oom killer's chosen victim.
> > > 
> > > This eliminates regression with killing multithreaded processes which was
> > > introduced by commit 6b0c81b3be114a93f79bd4c5639ade5107d77c21
> > > (mm, oom: reduce dependency on tasklist_lock). When one of threads
> > > was oom-killed, other threads could also become victims of oom killer, which
> > > can cause an infinite loop.
> > > 
> > > There is a race with task->thread_group RCU protected list deletion/iteration:
> > > now only a reference to a chosen thread of dying threadgroup is held, so when
> > > the thread doesn't have PF_EXITING flag yet and dump_header() is called
> > > to print info, it already has SIGKILL and can call do_exit(), which removes
> > > the thread from the thread_group list. After printing info, oom killer
> > > is doing while_each_thread() on this thread and it still has next reference
> > > to some other thread, but no other thread has next reference to this one.
> > > This causes the infinite loop with tasklist_lock read held.
> > 
> > I am not sure I understand the race you are describing here.
> > release_task calls __exit_signal with tasklist_lock held for write. And
> > we are holding the very same lock for reading around while_each_thread
> > in oom_kill_process.
> 
> Yes, we are holding tasklist_lock when iterating, but the thread can be deleted
> from thread_group list _before_ that. In this case, while_each_thread loop exit
> condition will never be true.
> 
> Imagine the following situation:
> Threadgroup with 4 threads: thread_1, thread_2, thread_3, thread_4.
> 
> thread_1 is oom killed and SIGKILL is sent to all threads.
> 
> allocation --> no memory --> invoke oom killer
> oom killer selects thread_2 as victim:
> 
> 
>            OOM killer               |              thread_2
>                                     |
>   oom_kill_process(thread_2)        |
>       thread_2 has PF_EXITING? no   |      (but has pending SIGKILL)
>       dump_header()                 |
>                                     |
>                                     |      do_exit()
>                                     |          sets PF_EXITING
>                                     |          list_del_rcu(thread_group)
>                                     |
>       read_lock(tasklist_lock)      |
>       while_each_thread()           |
> 
> Iteration order: thread_2 --> thread_3 --> thread_4 --> thread_3 --> thread_4...
> This will never reach thread_2 again and break loop, as result: infinite loop.

Oleg, is there anything that would prevent from this race? Maybe we need
to call thread_group_empty before?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-23 15:56     ` Michal Hocko
@ 2013-04-24 14:55       ` Oleg Nesterov
  2013-04-24 15:22         ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-04-24 14:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dserrg, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

On 04/23, Michal Hocko wrote:
>
> [CCing Oleg]
>
> On Tue 23-04-13 19:26:14, dserrg wrote:
> > On Mon, 22 Apr 2013 21:51:38 +0200
> >
> > Yes, we are holding tasklist_lock when iterating, but the thread can be deleted
> > from thread_group list _before_ that. In this case, while_each_thread loop exit
> > condition will never be true.

Yes, while_each_thread() should be only used if know that ->thread_group
list is valid.

For example, if you do find_task_by_vpid() under rcu_read_lock()
while_each_thread() is fine _unless_ you drop rcu lock in between.

This is the common mistake, people often forget about this.

But I can't understand how this patch can fix the problem, I think it
can't.

>From the changelog:

	When SIGKILL is sent to a task, it's also sent to all tasks in the same
	threadgroup. This information can be used to prevent triggering further
	oom killers for this threadgroup and avoid the infinite loop.
                                             ^^^^^^^^^^^^^^^^^^^^^^^

How??


> Oleg, is there anything that would prevent from this race? Maybe we need
> to call thread_group_empty before?

You need to check, say, pid_alive(). Or PF_EXITING.

For example oom_kill_process() looks wrong. It does check PF_EXITING
before while_each_thread(), but this is racy because it should check
it under tasklist_lock.

So I think that oom_kill_process() needs something like below, but
this code really needs the cleanups.

Oleg.


--- x/mm/oom_kill.c
+++ x/mm/oom_kill.c
@@ -436,6 +436,14 @@ void oom_kill_process(struct task_struct
 	 * parent.  This attempts to lose the minimal amount of work done while
 	 * still freeing memory.
 	 */
+	rcu_read_lock();
+	if (!pid_alive(p)) {
+		rcu_read_unlock();
+		set_tsk_thread_flag(p, TIF_MEMDIE);
+		put_task_struct(p);
+		return;
+	}
+
 	read_lock(&tasklist_lock);
 	do {
 		list_for_each_entry(child, &t->children, sibling) {
@@ -458,7 +466,6 @@ void oom_kill_process(struct task_struct
 	} while_each_thread(p, t);
 	read_unlock(&tasklist_lock);
 
-	rcu_read_lock();
 	p = find_lock_task_mm(victim);
 	if (!p) {
 		rcu_read_unlock();

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-24 14:55       ` Oleg Nesterov
@ 2013-04-24 15:22         ` Michal Hocko
  2013-04-24 15:42           ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2013-04-24 15:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dserrg, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

On Wed 24-04-13 16:55:14, Oleg Nesterov wrote:
> On 04/23, Michal Hocko wrote:
> >
> > [CCing Oleg]
> >
> > On Tue 23-04-13 19:26:14, dserrg wrote:
> > > On Mon, 22 Apr 2013 21:51:38 +0200
> > >
> > > Yes, we are holding tasklist_lock when iterating, but the thread can be deleted
> > > from thread_group list _before_ that. In this case, while_each_thread loop exit
> > > condition will never be true.
> 
> Yes, while_each_thread() should be only used if know that ->thread_group
> list is valid.
> 
> For example, if you do find_task_by_vpid() under rcu_read_lock()
> while_each_thread() is fine _unless_ you drop rcu lock in between.
> 
> This is the common mistake, people often forget about this.
> 
> But I can't understand how this patch can fix the problem, I think it
> can't.
> 
> From the changelog:
> 
> 	When SIGKILL is sent to a task, it's also sent to all tasks in the same
> 	threadgroup. This information can be used to prevent triggering further
> 	oom killers for this threadgroup and avoid the infinite loop.
>                                              ^^^^^^^^^^^^^^^^^^^^^^^
> 
> How??

I guess it assumes that fatal_signal_pending() is still true even when
the process is unhashed already. Which sounds like a workaround to me.

> > Oleg, is there anything that would prevent from this race? Maybe we need
> > to call thread_group_empty before?
> 
> You need to check, say, pid_alive(). Or PF_EXITING.
> 
> For example oom_kill_process() looks wrong. It does check PF_EXITING
> before while_each_thread(), but this is racy because it should check
> it under tasklist_lock.

Thanks for the clarification. I have tried to deduce the locking rules
but failed...

> So I think that oom_kill_process() needs something like below, but
> this code really needs the cleanups.
> 
> Oleg.
> 
> 
> --- x/mm/oom_kill.c
> +++ x/mm/oom_kill.c
> @@ -436,6 +436,14 @@ void oom_kill_process(struct task_struct
>  	 * parent.  This attempts to lose the minimal amount of work done while
>  	 * still freeing memory.
>  	 */
> +	rcu_read_lock();
> +	if (!pid_alive(p)) {
> +		rcu_read_unlock();
> +		set_tsk_thread_flag(p, TIF_MEMDIE);
> +		put_task_struct(p);
> +		return;
> +	}
> +
>  	read_lock(&tasklist_lock);
>  	do {
>  		list_for_each_entry(child, &t->children, sibling) {
> @@ -458,7 +466,6 @@ void oom_kill_process(struct task_struct
>  	} while_each_thread(p, t);
>  	read_unlock(&tasklist_lock);
>  
> -	rcu_read_lock();
>  	p = find_lock_task_mm(victim);
>  	if (!p) {
>  		rcu_read_unlock();

Makes sense to me.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-24 15:22         ` Michal Hocko
@ 2013-04-24 15:42           ` Oleg Nesterov
  2013-04-24 19:33             ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-04-24 15:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dserrg, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

On 04/24, Michal Hocko wrote:
>
> On Wed 24-04-13 16:55:14, Oleg Nesterov wrote:
> >
> > But I can't understand how this patch can fix the problem, I think it
> > can't.
> >
> > From the changelog:
> >
> > 	When SIGKILL is sent to a task, it's also sent to all tasks in the same
> > 	threadgroup. This information can be used to prevent triggering further
> > 	oom killers for this threadgroup and avoid the infinite loop.
> >                                              ^^^^^^^^^^^^^^^^^^^^^^^
> >
> > How??
>
> I guess it assumes that fatal_signal_pending() is still true even when
> the process is unhashed already.

No, it is not (in general). The task can dequeue this SIGKIL and then
exit. But this doesn't matter.

> Which sounds like a workaround to me.

The task can do everything after we check PF_EXITING or whatever else.
Just suppose it is alive and running, but before we take tasklist_lock
it exits and removes itself from list.

But wait, I forgot that "p" is not necessarily the main thread, so
the patch I sent is not enough...

Oh, and this reminds me again but we can race with exec... but this
is mostly theoretical. should be fixed anyway.

I'll try to think more tomorrow. I need to recall the previous discussion
at least.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-24 15:42           ` Oleg Nesterov
@ 2013-04-24 19:33             ` Andrew Morton
  2013-04-25 14:49               ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2013-04-24 19:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michal Hocko, dserrg, linux-mm, David Rientjes, KAMEZAWA Hiroyuki,
	Sha Zhengju

On Wed, 24 Apr 2013 17:42:16 +0200 Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/24, Michal Hocko wrote:
> >
> > On Wed 24-04-13 16:55:14, Oleg Nesterov wrote:
> > >
> > > But I can't understand how this patch can fix the problem, I think it
> > > can't.
> > >
> > > From the changelog:
> > >
> > > 	When SIGKILL is sent to a task, it's also sent to all tasks in the same
> > > 	threadgroup. This information can be used to prevent triggering further
> > > 	oom killers for this threadgroup and avoid the infinite loop.
> > >                                              ^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > How??
> >
> > I guess it assumes that fatal_signal_pending() is still true even when
> > the process is unhashed already.
> 
> No, it is not (in general). The task can dequeue this SIGKIL and then
> exit. But this doesn't matter.
> 
> > Which sounds like a workaround to me.
> 
> The task can do everything after we check PF_EXITING or whatever else.
> Just suppose it is alive and running, but before we take tasklist_lock
> it exits and removes itself from list.
> 
> But wait, I forgot that "p" is not necessarily the main thread, so
> the patch I sent is not enough...
> 
> Oh, and this reminds me again but we can race with exec... but this
> is mostly theoretical. should be fixed anyway.
> 
> I'll try to think more tomorrow. I need to recall the previous discussion
> at least.

Where does this leave us with Sergey's patch?  "Still good, but
requires new changelog"?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-24 19:33             ` Andrew Morton
@ 2013-04-25 14:49               ` Oleg Nesterov
  2013-04-25 15:41                 ` Sergey Dyasly
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-04-25 14:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, dserrg, linux-mm, David Rientjes, KAMEZAWA Hiroyuki,
	Sha Zhengju

On 04/24, Andrew Morton wrote:
>
> Where does this leave us with Sergey's patch?  "Still good, but
> requires new changelog"?

Sergey is certainly right, this needs the fixes (thanks Sergey!).

But afaics the patch can't help, we need another solution.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-25 14:49               ` Oleg Nesterov
@ 2013-04-25 15:41                 ` Sergey Dyasly
  2013-04-25 16:22                   ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Dyasly @ 2013-04-25 15:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Michal Hocko, linux-mm, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

On Thu, 25 Apr 2013 16:49:55 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/24, Andrew Morton wrote:
> >
> > Where does this leave us with Sergey's patch?  "Still good, but
> > requires new changelog"?
> 
> Sergey is certainly right, this needs the fixes (thanks Sergey!).
> 
> But afaics the patch can't help, we need another solution.
> 
> Oleg.
> 

Oleg, thanks for your comments!

Indeed, my patch is intended for one particular case (OOM killer and
threadgroups). But in general case there is still a race, so my patch
should be reworked.

Still, I think that fatal_signal_pending() is a sensible check, since the one
for current task is already there, and there is a patch from David Rientjes
that does almost the same [1]. So maybe it's a general condition for not
invoking OOM killer but just setting TIF_MEMDIE for a task?

[1] - http://www.spinics.net/lists/linux-mm/msg54662.html

-- 
Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-25 15:41                 ` Sergey Dyasly
@ 2013-04-25 16:22                   ` Oleg Nesterov
  2013-05-02 17:20                     ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-04-25 16:22 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: Andrew Morton, Michal Hocko, linux-mm, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

On 04/25, Sergey Dyasly wrote:
>
> But in general case there is still a race,

Yes. Every while_each_thread() in oom-kill is wrong, and I am still not
sure what should/can we do. Will try to think more.

You know, partly this is the known problem. while_each_thread(g,t) is not
safe lockless unless g is the main thread. And another bug is that even
if it is the main thread it can race with exec. The 2nd problem should be
fixed, but when we discussed this previously we were going to disallow
the lockless while_each_thread(sub_thread)...

And yes, I missed these problems when we discussed 6b0c81b3 :/

> Still, I think that fatal_signal_pending() is a sensible check, since the one
> for current task is already there, and there is a patch from David Rientjes
> that does almost the same [1].

Oh, I can't comment this. I leave this to you and David.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-04-25 16:22                   ` Oleg Nesterov
@ 2013-05-02 17:20                     ` Oleg Nesterov
  2013-05-27 15:49                       ` Sergey Dyasly
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-05-02 17:20 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: Andrew Morton, Michal Hocko, linux-mm, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

Just to let you know that this time I didn't forget about this problem ;)

On 04/25, Oleg Nesterov wrote:
>
> On 04/25, Sergey Dyasly wrote:
> >
> > But in general case there is still a race,
>
> Yes. Every while_each_thread() in oom-kill is wrong, and I am still not
> sure what should/can we do. Will try to think more.

And I still can't find a simple/clean solution.

OK. I am starting to think we should probably switch to Plan B. We can add
thread_head into task_struct->signal and convert while_each_thread() into
list_for_each_rcu(). This should work, but this is really painful and I was
going to avoid this as much as possible...

I'll try to do something once I return from vacation (May 9). Heh, See also
http://marc.info/?l=linux-kernel&m=127688978121665 and the whole thread.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-05-02 17:20                     ` Oleg Nesterov
@ 2013-05-27 15:49                       ` Sergey Dyasly
  2013-05-27 16:16                         ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Dyasly @ 2013-05-27 15:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Michal Hocko, linux-mm, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

Hi, Oleg

Adding thread_head into task_struct->signal would be the best solution imho.
This way list will be properly protected by rcu_read_lock(). But you called it
"really painful". I guess that's because all users of while_each_thread(g, t)
must be modified with 'g' pointing to the new thread_head. And I've counted
50 usages of while_each_thread() across the kernel.

Is this really that bad?

Regards.


On Thu, 2 May 2013 19:20:22 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Just to let you know that this time I didn't forget about this problem ;)
> 
> On 04/25, Oleg Nesterov wrote:
> >
> > On 04/25, Sergey Dyasly wrote:
> > >
> > > But in general case there is still a race,
> >
> > Yes. Every while_each_thread() in oom-kill is wrong, and I am still not
> > sure what should/can we do. Will try to think more.
> 
> And I still can't find a simple/clean solution.
> 
> OK. I am starting to think we should probably switch to Plan B. We can add
> thread_head into task_struct->signal and convert while_each_thread() into
> list_for_each_rcu(). This should work, but this is really painful and I was
> going to avoid this as much as possible...
> 
> I'll try to do something once I return from vacation (May 9). Heh, See also
> http://marc.info/?l=linux-kernel&m=127688978121665 and the whole thread.
> 
> Oleg.
> 


-- 
Sergey Dyasly <dserrg@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] oom: add pending SIGKILL check for chosen victim
  2013-05-27 15:49                       ` Sergey Dyasly
@ 2013-05-27 16:16                         ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-05-27 16:16 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: Andrew Morton, Michal Hocko, linux-mm, David Rientjes,
	KAMEZAWA Hiroyuki, Sha Zhengju

Hi Sergey,

Cough... I hoped that I will send at least some changes before another
reminder, but I am late again ;)

On 05/27, Sergey Dyasly wrote:
>
> Adding thread_head into task_struct->signal would be the best solution imho.
> This way list will be properly protected by rcu_read_lock(). But you called it
> "really painful". I guess that's because all users of while_each_thread(g, t)
> must be modified with 'g' pointing to the new thread_head. And I've counted
> 50 usages of while_each_thread() across the kernel.

Plus sometimes we need to iterate the group starting from non-leader.
And we need to keep both old/new lists before we convert all users,
and some other complications.

I hope I'll send some preparational patches today, I have already started
this (today ;).

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-05-27 16:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-22 15:06 [PATCH] oom: add pending SIGKILL check for chosen victim Sergey Dyasly
2013-04-22 19:51 ` Michal Hocko
2013-04-23 15:26   ` dserrg
2013-04-23 15:56     ` Michal Hocko
2013-04-24 14:55       ` Oleg Nesterov
2013-04-24 15:22         ` Michal Hocko
2013-04-24 15:42           ` Oleg Nesterov
2013-04-24 19:33             ` Andrew Morton
2013-04-25 14:49               ` Oleg Nesterov
2013-04-25 15:41                 ` Sergey Dyasly
2013-04-25 16:22                   ` Oleg Nesterov
2013-05-02 17:20                     ` Oleg Nesterov
2013-05-27 15:49                       ` Sergey Dyasly
2013-05-27 16:16                         ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).