public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
@ 2015-08-25  7:56 Wanpeng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2015-08-25  7:56 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel, Wanpeng Li

[   15.273708] ------------[ cut here ]------------
[   15.274097] WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
[   15.274857] Modules linked in:
[   15.275101] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
[   15.275674]  00000000 00000000 d21f1d24 c19228b2 00000000 d21f1d58 c1056a3b c1ba00e4
[   15.276084]  00000000 0000000d c1ba17d8 00000484 c10838be 00000484 c10838be d21e5000
[   15.276084]  d2121900 d21e5158 d21f1d68 c1056b12 00000009 00000000 d21f1d7c c10838be
[   15.276084] Call Trace:
[   15.276084]  [<c19228b2>] dump_stack+0x4b/0x75
[   15.276084]  [<c1056a3b>] warn_slowpath_common+0x8b/0xc0
[   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c1056b12>] warn_slowpath_null+0x22/0x30
[   15.276084]  [<c10838be>] do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c110154c>] cpuset_cpus_allowed_fallback+0x7c/0x170
[   15.276084]  [<c11014d0>] ? cpuset_cpus_allowed+0x180/0x180
[   15.276084]  [<c1083ae1>] select_fallback_rq+0x221/0x280
[   15.276084]  [<c1085073>] migration_call+0xe3/0x250
[   15.276084]  [<c1079e23>] notifier_call_chain+0x53/0x70
[   15.276084]  [<c1079e5e>] __raw_notifier_call_chain+0x1e/0x30
[   15.276084]  [<c1056cc8>] cpu_notify+0x28/0x50
[   15.276084]  [<c191e4d2>] take_cpu_down+0x22/0x40
[   15.276084]  [<c1102895>] multi_cpu_stop+0xd5/0x140
[   15.276084]  [<c11027c0>] ? __stop_cpus+0x80/0x80
[   15.276084]  [<c11025cc>] cpu_stopper_thread+0xbc/0x170
[   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
[   15.276084]  [<c192b6a7>] ? _raw_spin_unlock_irq+0x37/0x50
[   15.276084]  [<c192b655>] ? _raw_spin_unlock_irqrestore+0x55/0x70
[   15.276084]  [<c10a9074>] ? trace_hardirqs_on_caller+0x144/0x1e0
[   15.276084]  [<c11024a5>] ? cpu_stop_should_run+0x35/0x40
[   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
[   15.276084]  [<c192b641>] ? _raw_spin_unlock_irqrestore+0x41/0x70
[   15.276084]  [<c107c944>] smpboot_thread_fn+0x174/0x2f0
[   15.276084]  [<c107c7d0>] ? sort_range+0x30/0x30
[   15.276084]  [<c1078934>] kthread+0xc4/0xe0
[   15.276084]  [<c192c041>] ret_from_kernel_thread+0x21/0x30
[   15.276084]  [<c1078870>] ? kthread_create_on_node+0x180/0x180
[   15.276084] ---[ end trace 15f4c86d404693b0 ]---

After commit 25834c73f93 ("sched: Fix a race between __kthread_bind()
and sched_setaffinity()"), do_set_cpus_allowed() must be called with
p->pi_lock held; however, this is currently not the case in the cpuset
path.

This patch fixes it by holding p->pi_lock in the cpuset path.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/cpuset.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e414ae9..605ed66 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	unsigned long flags;
+
 	rcu_read_lock();
+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
 	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
 	rcu_read_unlock();
 
 	/*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
@ 2015-08-25  7:59 Wanpeng Li
  2015-08-25  8:13 ` Leo Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Wanpeng Li @ 2015-08-25  7:59 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel, Wanpeng Li

[   15.273708] ------------[ cut here ]------------
[   15.274097] WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
[   15.274857] Modules linked in:
[   15.275101] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
[   15.275674]  00000000 00000000 d21f1d24 c19228b2 00000000 d21f1d58 c1056a3b c1ba00e4
[   15.276084]  00000000 0000000d c1ba17d8 00000484 c10838be 00000484 c10838be d21e5000
[   15.276084]  d2121900 d21e5158 d21f1d68 c1056b12 00000009 00000000 d21f1d7c c10838be
[   15.276084] Call Trace:
[   15.276084]  [<c19228b2>] dump_stack+0x4b/0x75
[   15.276084]  [<c1056a3b>] warn_slowpath_common+0x8b/0xc0
[   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c1056b12>] warn_slowpath_null+0x22/0x30
[   15.276084]  [<c10838be>] do_set_cpus_allowed+0x7e/0x80
[   15.276084]  [<c110154c>] cpuset_cpus_allowed_fallback+0x7c/0x170
[   15.276084]  [<c11014d0>] ? cpuset_cpus_allowed+0x180/0x180
[   15.276084]  [<c1083ae1>] select_fallback_rq+0x221/0x280
[   15.276084]  [<c1085073>] migration_call+0xe3/0x250
[   15.276084]  [<c1079e23>] notifier_call_chain+0x53/0x70
[   15.276084]  [<c1079e5e>] __raw_notifier_call_chain+0x1e/0x30
[   15.276084]  [<c1056cc8>] cpu_notify+0x28/0x50
[   15.276084]  [<c191e4d2>] take_cpu_down+0x22/0x40
[   15.276084]  [<c1102895>] multi_cpu_stop+0xd5/0x140
[   15.276084]  [<c11027c0>] ? __stop_cpus+0x80/0x80
[   15.276084]  [<c11025cc>] cpu_stopper_thread+0xbc/0x170
[   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
[   15.276084]  [<c192b6a7>] ? _raw_spin_unlock_irq+0x37/0x50
[   15.276084]  [<c192b655>] ? _raw_spin_unlock_irqrestore+0x55/0x70
[   15.276084]  [<c10a9074>] ? trace_hardirqs_on_caller+0x144/0x1e0
[   15.276084]  [<c11024a5>] ? cpu_stop_should_run+0x35/0x40
[   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
[   15.276084]  [<c192b641>] ? _raw_spin_unlock_irqrestore+0x41/0x70
[   15.276084]  [<c107c944>] smpboot_thread_fn+0x174/0x2f0
[   15.276084]  [<c107c7d0>] ? sort_range+0x30/0x30
[   15.276084]  [<c1078934>] kthread+0xc4/0xe0
[   15.276084]  [<c192c041>] ret_from_kernel_thread+0x21/0x30
[   15.276084]  [<c1078870>] ? kthread_create_on_node+0x180/0x180
[   15.276084] ---[ end trace 15f4c86d404693b0 ]---

After commit 25834c73f93 ("sched: Fix a race between __kthread_bind()
and sched_setaffinity()"), do_set_cpus_allowed() must be called with
p->pi_lock held; however, this is currently not the case in the cpuset
path.

This patch fixes it by holding p->pi_lock in the cpuset path.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/cpuset.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e414ae9..605ed66 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	unsigned long flags;
+
 	rcu_read_lock();
+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
 	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
 	rcu_read_unlock();
 
 	/*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  7:59 [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed() Wanpeng Li
@ 2015-08-25  8:13 ` Leo Yan
  2015-08-25  8:24   ` Wanpeng Li
  2015-08-25 10:05 ` Peter Zijlstra
  2015-08-27 13:47 ` T. Zhou
  2 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2015-08-25  8:13 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

Hi Wanpeng,

On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
> [   15.273708] ------------[ cut here ]------------
> [   15.274097] WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
> [   15.274857] Modules linked in:
> [   15.275101] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
> [   15.275674]  00000000 00000000 d21f1d24 c19228b2 00000000 d21f1d58 c1056a3b c1ba00e4
> [   15.276084]  00000000 0000000d c1ba17d8 00000484 c10838be 00000484 c10838be d21e5000
> [   15.276084]  d2121900 d21e5158 d21f1d68 c1056b12 00000009 00000000 d21f1d7c c10838be
> [   15.276084] Call Trace:
> [   15.276084]  [<c19228b2>] dump_stack+0x4b/0x75
> [   15.276084]  [<c1056a3b>] warn_slowpath_common+0x8b/0xc0
> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c1056b12>] warn_slowpath_null+0x22/0x30
> [   15.276084]  [<c10838be>] do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c110154c>] cpuset_cpus_allowed_fallback+0x7c/0x170
> [   15.276084]  [<c11014d0>] ? cpuset_cpus_allowed+0x180/0x180
> [   15.276084]  [<c1083ae1>] select_fallback_rq+0x221/0x280
> [   15.276084]  [<c1085073>] migration_call+0xe3/0x250
> [   15.276084]  [<c1079e23>] notifier_call_chain+0x53/0x70
> [   15.276084]  [<c1079e5e>] __raw_notifier_call_chain+0x1e/0x30
> [   15.276084]  [<c1056cc8>] cpu_notify+0x28/0x50
> [   15.276084]  [<c191e4d2>] take_cpu_down+0x22/0x40
> [   15.276084]  [<c1102895>] multi_cpu_stop+0xd5/0x140
> [   15.276084]  [<c11027c0>] ? __stop_cpus+0x80/0x80
> [   15.276084]  [<c11025cc>] cpu_stopper_thread+0xbc/0x170
> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
> [   15.276084]  [<c192b6a7>] ? _raw_spin_unlock_irq+0x37/0x50
> [   15.276084]  [<c192b655>] ? _raw_spin_unlock_irqrestore+0x55/0x70
> [   15.276084]  [<c10a9074>] ? trace_hardirqs_on_caller+0x144/0x1e0
> [   15.276084]  [<c11024a5>] ? cpu_stop_should_run+0x35/0x40
> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
> [   15.276084]  [<c192b641>] ? _raw_spin_unlock_irqrestore+0x41/0x70
> [   15.276084]  [<c107c944>] smpboot_thread_fn+0x174/0x2f0
> [   15.276084]  [<c107c7d0>] ? sort_range+0x30/0x30
> [   15.276084]  [<c1078934>] kthread+0xc4/0xe0
> [   15.276084]  [<c192c041>] ret_from_kernel_thread+0x21/0x30
> [   15.276084]  [<c1078870>] ? kthread_create_on_node+0x180/0x180
> [   15.276084] ---[ end trace 15f4c86d404693b0 ]---
> 
> After commit 25834c73f93 ("sched: Fix a race between __kthread_bind()
> and sched_setaffinity()"), do_set_cpus_allowed() must be called with
> p->pi_lock held; however, this is currently not the case in the cpuset
> path.
> 
> This patch fixes it by holding p->pi_lock in the cpuset path.
> 
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/cpuset.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index e414ae9..605ed66 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>  
>  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>  {
> +	unsigned long flags;
> +
>  	rcu_read_lock();
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>  	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);

Just curious: will acquiring the lock twice introduce a deadlock? ;)

Thanks,
Leo Yan

>  	rcu_read_unlock();
>  
>  	/*
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  8:13 ` Leo Yan
@ 2015-08-25  8:24   ` Wanpeng Li
  2015-08-25  8:30     ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Wanpeng Li @ 2015-08-25  8:24 UTC (permalink / raw)
  To: Leo Yan; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On 8/25/15 4:13 PM, Leo Yan wrote:
> Hi Wanpeng,
>
> On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
>> [   15.273708] ------------[ cut here ]------------
>> [   15.274097] WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
>> [   15.274857] Modules linked in:
>> [   15.275101] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
>> [   15.275674]  00000000 00000000 d21f1d24 c19228b2 00000000 d21f1d58 c1056a3b c1ba00e4
>> [   15.276084]  00000000 0000000d c1ba17d8 00000484 c10838be 00000484 c10838be d21e5000
>> [   15.276084]  d2121900 d21e5158 d21f1d68 c1056b12 00000009 00000000 d21f1d7c c10838be
>> [   15.276084] Call Trace:
>> [   15.276084]  [<c19228b2>] dump_stack+0x4b/0x75
>> [   15.276084]  [<c1056a3b>] warn_slowpath_common+0x8b/0xc0
>> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
>> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
>> [   15.276084]  [<c1056b12>] warn_slowpath_null+0x22/0x30
>> [   15.276084]  [<c10838be>] do_set_cpus_allowed+0x7e/0x80
>> [   15.276084]  [<c110154c>] cpuset_cpus_allowed_fallback+0x7c/0x170
>> [   15.276084]  [<c11014d0>] ? cpuset_cpus_allowed+0x180/0x180
>> [   15.276084]  [<c1083ae1>] select_fallback_rq+0x221/0x280
>> [   15.276084]  [<c1085073>] migration_call+0xe3/0x250
>> [   15.276084]  [<c1079e23>] notifier_call_chain+0x53/0x70
>> [   15.276084]  [<c1079e5e>] __raw_notifier_call_chain+0x1e/0x30
>> [   15.276084]  [<c1056cc8>] cpu_notify+0x28/0x50
>> [   15.276084]  [<c191e4d2>] take_cpu_down+0x22/0x40
>> [   15.276084]  [<c1102895>] multi_cpu_stop+0xd5/0x140
>> [   15.276084]  [<c11027c0>] ? __stop_cpus+0x80/0x80
>> [   15.276084]  [<c11025cc>] cpu_stopper_thread+0xbc/0x170
>> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
>> [   15.276084]  [<c192b6a7>] ? _raw_spin_unlock_irq+0x37/0x50
>> [   15.276084]  [<c192b655>] ? _raw_spin_unlock_irqrestore+0x55/0x70
>> [   15.276084]  [<c10a9074>] ? trace_hardirqs_on_caller+0x144/0x1e0
>> [   15.276084]  [<c11024a5>] ? cpu_stop_should_run+0x35/0x40
>> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
>> [   15.276084]  [<c192b641>] ? _raw_spin_unlock_irqrestore+0x41/0x70
>> [   15.276084]  [<c107c944>] smpboot_thread_fn+0x174/0x2f0
>> [   15.276084]  [<c107c7d0>] ? sort_range+0x30/0x30
>> [   15.276084]  [<c1078934>] kthread+0xc4/0xe0
>> [   15.276084]  [<c192c041>] ret_from_kernel_thread+0x21/0x30
>> [   15.276084]  [<c1078870>] ? kthread_create_on_node+0x180/0x180
>> [   15.276084] ---[ end trace 15f4c86d404693b0 ]---
>>
>> After commit 25834c73f93 ("sched: Fix a race between __kthread_bind()
>> and sched_setaffinity()"), do_set_cpus_allowed() must be called with
>> p->pi_lock held; however, this is currently not the case in the cpuset
>> path.
>>
>> This patch fixes it by holding p->pi_lock in the cpuset path.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>   kernel/cpuset.c |    4 ++++
>>   1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
>> index e414ae9..605ed66 100644
>> --- a/kernel/cpuset.c
>> +++ b/kernel/cpuset.c
>> @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>>   
>>   void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>>   {
>> +	unsigned long flags;
>> +
>>   	rcu_read_lock();
>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>>   	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> Just curious: will acquiring the lock twice introduce a deadlock? ;)

Could you point out where the lock is acquired twice?

Regards,
Wanpeng Li

>
> Thanks,
> Leo Yan
>
>>   	rcu_read_unlock();
>>   
>>   	/*
>> -- 
>> 1.7.1
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  8:24   ` Wanpeng Li
@ 2015-08-25  8:30     ` Ingo Molnar
  2015-08-25  8:38       ` Wanpeng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2015-08-25  8:30 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Leo Yan, Peter Zijlstra, linux-kernel


* Wanpeng Li <wanpeng.li@hotmail.com> wrote:

> >>--- a/kernel/cpuset.c
> >>+++ b/kernel/cpuset.c
> >>@@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> >>  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> >>  {
> >>+	unsigned long flags;
> >>+
> >>  	rcu_read_lock();
> >>+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> >>  	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
> >>+	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> >Just curious: will acquiring the lock twice introduce a deadlock? ;)
> 
> Could you point out where the lock is acquired twice?

In the code you quote?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  8:30     ` Ingo Molnar
@ 2015-08-25  8:38       ` Wanpeng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2015-08-25  8:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Leo Yan, Peter Zijlstra, linux-kernel

On 8/25/15 4:30 PM, Ingo Molnar wrote:
> * Wanpeng Li <wanpeng.li@hotmail.com> wrote:
>
>>>> --- a/kernel/cpuset.c
>>>> +++ b/kernel/cpuset.c
>>>> @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>>>>   void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>>>>   {
>>>> +	unsigned long flags;
>>>> +
>>>>   	rcu_read_lock();
>>>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>>>>   	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
>>>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>>> Just curious: will acquiring the lock twice introduce a deadlock? ;)
>> Could you point out where the lock is acquired twice?
> In the code you quote?

Shame on me, sorry for sending out the wrong version. I'll fix it soon in v2.

Regards,
Wanpeng Li

>
> Thanks,
>
> 	Ingo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  7:59 [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed() Wanpeng Li
  2015-08-25  8:13 ` Leo Yan
@ 2015-08-25 10:05 ` Peter Zijlstra
  2015-08-25 10:10   ` Peter Zijlstra
  2015-08-25 10:18   ` Wanpeng Li
  2015-08-27 13:47 ` T. Zhou
  2 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2015-08-25 10:05 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, linux-kernel

On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
> +++ b/kernel/cpuset.c
> @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>  
>  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>  {
> +	unsigned long flags;
> +
>  	rcu_read_lock();
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>  	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>  	rcu_read_unlock();

Aside from the double lock thing that was already pointed out, I think
this is wrong, because the select_task_rq() call can already have
pi_lock held.

Taking it again would result in a deadlock.

Consider for instance:

try_to_wake_up()
  raw_spin_lock_irqsave(->pi_lock)
  select_task_rq()
    select_fallback_rq()
      cpuset_cpus_allowed_fallback()
        raw_spin_lock_irqsave(->pi_lock)


The problem is with the migration path and should be fixed there.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25 10:05 ` Peter Zijlstra
@ 2015-08-25 10:10   ` Peter Zijlstra
  2015-08-25 10:32     ` Peter Zijlstra
  2015-08-25 10:18   ` Wanpeng Li
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2015-08-25 10:10 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, linux-kernel

On Tue, Aug 25, 2015 at 12:05:27PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
> > +++ b/kernel/cpuset.c
> > @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> >  
> >  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> >  {
> > +	unsigned long flags;
> > +
> >  	rcu_read_lock();
> > +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> >  	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
> > +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> >  	rcu_read_unlock();
> 
> Aside from the double lock thing that was already pointed out, I think
> this is wrong, because the select_task_rq() call can already have
> pi_lock held.
> 
> Taking it again would result in a deadlock.
> 
> Consider for instance:
> 
> try_to_wake_up()
>   raw_spin_lock_irqsave(->pi_lock)
>   select_task_rq()
> >     select_fallback_rq()
>       cpuset_cpus_allowed_fallback()
>         raw_spin_lock_irqsave(->pi_lock)
> 
> 
> The problem is with the migration path and should be fixed there.

Another problem: migration_call() will have rq->lock held, so you're
proposing to acquire pi_lock while holding rq->lock, which is an
inversion of the regular nesting order.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25 10:05 ` Peter Zijlstra
  2015-08-25 10:10   ` Peter Zijlstra
@ 2015-08-25 10:18   ` Wanpeng Li
  1 sibling, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2015-08-25 10:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On 8/25/15 6:05 PM, Peter Zijlstra wrote:
> On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
>> +++ b/kernel/cpuset.c
>> @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>>   
>>   void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>>   {
>> +	unsigned long flags;
>> +
>>   	rcu_read_lock();
>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>>   	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
>> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
>>   	rcu_read_unlock();
> Aside from the double lock thing that was already pointed out, I think
> this is wrong, because the select_task_rq() call can already have
> pi_lock held.
>
> Taking it again would result in a deadlock.
>
> Consider for instance:
>
> try_to_wake_up()
>    raw_spin_lock_irqsave(->pi_lock)
>    select_task_rq()
>      select_fallback_rq()
>        cpuset_cpus_allowed_fallback()
>          raw_spin_lock_irqsave(->pi_lock)
>
>
> The problem is with the migration path and should be fixed there.

Indeed, it should be fixed in the migration path. I will try to work it
out and post a patch. :)

Regards,
Wanpeng Li


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25 10:10   ` Peter Zijlstra
@ 2015-08-25 10:32     ` Peter Zijlstra
       [not found]       ` <BLU437-SMTP7261DF7A82716F49242C7B80610@phx.gbl>
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2015-08-25 10:32 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, linux-kernel

On Tue, Aug 25, 2015 at 12:10:32PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 25, 2015 at 12:05:27PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
> > > +++ b/kernel/cpuset.c
> > > @@ -2376,8 +2376,12 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> > >  
> > >  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> > >  {
> > > +	unsigned long flags;
> > > +
> > >  	rcu_read_lock();
> > > +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> > >  	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
> > > +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> > >  	rcu_read_unlock();
> > 
> > Aside from the double lock thing that was already pointed out, I think
> > this is wrong, because the select_task_rq() call can already have
> > pi_lock held.
> > 
> > Taking it again would result in a deadlock.
> > 
> > Consider for instance:
> > 
> > try_to_wake_up()
> >   raw_spin_lock_irqsave(->pi_lock)
> >   select_task_rq()
> > >     select_fallback_rq()
> >       cpuset_cpus_allowed_fallback()
> >         raw_spin_lock_irqsave(->pi_lock)
> > 
> > 
> > The problem is with the migration path and should be fixed there.
> 
> Another problem: migration_call() will have rq->lock held, so you're
> proposing to acquire pi_lock while holding rq->lock, which is an
> inversion of the regular nesting order.
> 

So possibly, maybe (I'm still too wrecked to say for sure), something
like this would work:

	WARN_ON(debug_locks && (lockdep_is_held(&p->pi_lock) ||
				(p->on_rq && lockdep_is_held(&rq->lock))));

Instead of those two separate lockdep asserts.

Please consider carefully.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-25  7:59 [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed() Wanpeng Li
  2015-08-25  8:13 ` Leo Yan
  2015-08-25 10:05 ` Peter Zijlstra
@ 2015-08-27 13:47 ` T. Zhou
  2 siblings, 0 replies; 13+ messages in thread
From: T. Zhou @ 2015-08-27 13:47 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

Hi,

On Tue, Aug 25, 2015 at 03:59:54PM +0800, Wanpeng Li wrote:
> [   15.273708] ------------[ cut here ]------------
> [   15.274097] WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
> [   15.274857] Modules linked in:
> [   15.275101] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
> [   15.275674]  00000000 00000000 d21f1d24 c19228b2 00000000 d21f1d58 c1056a3b c1ba00e4
> [   15.276084]  00000000 0000000d c1ba17d8 00000484 c10838be 00000484 c10838be d21e5000
> [   15.276084]  d2121900 d21e5158 d21f1d68 c1056b12 00000009 00000000 d21f1d7c c10838be
> [   15.276084] Call Trace:
> [   15.276084]  [<c19228b2>] dump_stack+0x4b/0x75
> [   15.276084]  [<c1056a3b>] warn_slowpath_common+0x8b/0xc0
> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c10838be>] ? do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c1056b12>] warn_slowpath_null+0x22/0x30
> [   15.276084]  [<c10838be>] do_set_cpus_allowed+0x7e/0x80
> [   15.276084]  [<c110154c>] cpuset_cpus_allowed_fallback+0x7c/0x170
> [   15.276084]  [<c11014d0>] ? cpuset_cpus_allowed+0x180/0x180
> [   15.276084]  [<c1083ae1>] select_fallback_rq+0x221/0x280
> [   15.276084]  [<c1085073>] migration_call+0xe3/0x250
> [   15.276084]  [<c1079e23>] notifier_call_chain+0x53/0x70
> [   15.276084]  [<c1079e5e>] __raw_notifier_call_chain+0x1e/0x30
> [   15.276084]  [<c1056cc8>] cpu_notify+0x28/0x50
> [   15.276084]  [<c191e4d2>] take_cpu_down+0x22/0x40
> [   15.276084]  [<c1102895>] multi_cpu_stop+0xd5/0x140
> [   15.276084]  [<c11027c0>] ? __stop_cpus+0x80/0x80
> [   15.276084]  [<c11025cc>] cpu_stopper_thread+0xbc/0x170
> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
> [   15.276084]  [<c192b6a7>] ? _raw_spin_unlock_irq+0x37/0x50
> [   15.276084]  [<c192b655>] ? _raw_spin_unlock_irqrestore+0x55/0x70
> [   15.276084]  [<c10a9074>] ? trace_hardirqs_on_caller+0x144/0x1e0
> [   15.276084]  [<c11024a5>] ? cpu_stop_should_run+0x35/0x40
> [   15.276084]  [<c1085ec9>] ? preempt_count_sub+0x9/0x50
> [   15.276084]  [<c192b641>] ? _raw_spin_unlock_irqrestore+0x41/0x70
> [   15.276084]  [<c107c944>] smpboot_thread_fn+0x174/0x2f0
> [   15.276084]  [<c107c7d0>] ? sort_range+0x30/0x30
> [   15.276084]  [<c1078934>] kthread+0xc4/0xe0
> [   15.276084]  [<c192c041>] ret_from_kernel_thread+0x21/0x30
> [   15.276084]  [<c1078870>] ? kthread_create_on_node+0x180/0x180
> [   15.276084] ---[ end trace 15f4c86d404693b0 ]---
> 

No experiment from me, just a guess at this path:

    take_cpu_down() 
      cpu_notify(CPU_DYING)
        migration_call()

In migration_call() there is a CPU_DYING case; add these:

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    raw_spin_lock(&rq->lock);
    ...
    raw_spin_unlock(&rq->lock);
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);

no inverted p->pi_lock and rq->lock order (from Peter's review)
no locking of p->pi_lock twice (from Peter's review)

And in do_set_cpus_allowed(), add the following, deleting the two separate asserts:

	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
				      (p->on_rq && lockdep_is_held(&task_rq(p)->lock))));
    (from Peter's suggestion)

like what is used in set_task_cpu()

there may be a better or more correct solution :)

thanks,
-- 
Tao

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
       [not found]       ` <BLU437-SMTP7261DF7A82716F49242C7B80610@phx.gbl>
@ 2015-08-27 22:18         ` Peter Zijlstra
  2015-08-28  1:49           ` Wanpeng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2015-08-27 22:18 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, linux-kernel

On Tue, Aug 25, 2015 at 07:47:44PM +0800, Wanpeng Li wrote:
> On 8/25/15 6:32 PM, Peter Zijlstra wrote:

> >So possibly, maybe (I'm still too wrecked to say for sure), something
> >like this would work:
> >
> >	WARN_ON(debug_locks && (lockdep_is_held(&p->pi_lock) ||
> >				(p->on_rq && lockdep_is_held(&rq->lock))));
> >
> >Instead of those two separate lockdep asserts.
> >
> >Please consider carefully.

So the normal rules for changing task_struct::cpus_allowed are holding
both pi_lock and rq->lock, such that holding either stabilizes the mask.

This is so that wakeup can happen without rq->lock and load-balance
without pi_lock.

From this we already get the relaxation that we can omit acquiring
rq->lock if the task is not on the rq, because in that case
load-balancing will not apply to it.

** these are the rules currently tested in do_set_cpus_allowed() **

Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
unconditionally acquires both locks, we could get away with holding just
rq->lock when on_rq for modification because that'd still exclude
__set_cpus_allowed_ptr(), it would also work against
__kthread_bind_mask() because that assumes !on_rq.

That said, this is all somewhat fragile.

> Commit (5e16bbc2f: sched: Streamline the task migration locking a little)
> won't hold the pi_lock in migrate_tasks() path any more, actually pi_lock
> was still not held when call select_fallback_rq() and it was held in
> __migrate_task() before  the commit. Then commit (25834c73f93: sched: Fix a
> race between __kthread_bind() and sched_setaffinity()) add a
> lockdep_assert_held() in do_set_cpus_allowed(), the bug is triggered. How
> about something like below:
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5186,6 +5186,15 @@ static void migrate_tasks(struct rq *dead_rq)
>                 BUG_ON(!next);
>                 next->sched_class->put_prev_task(rq, next);
> 
> +               raw_spin_unlock(&rq->lock);
> +               raw_spin_lock(&next->pi_lock);
> +               raw_spin_lock(&rq->lock);
> +               if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
> +                       raw_spin_unlock(&rq->lock);
> +                       raw_spin_unlock(&next->pi_lock);
> +                       continue;
> +               }

Yeah, that's quite disgusting.. also you'll trip over the lockdep_pin if
you were to actually run this.

Now, I don't think dropping rq->lock is quite as disastrous as it
usually is because !cpu_active at this point, which means load-balance
will not interfere, but that too is somewhat fragile.


So we end up with a choice of two fragile options.. let me ponder that a
wee bit more.


* Re: [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed()
  2015-08-27 22:18         ` Peter Zijlstra
@ 2015-08-28  1:49           ` Wanpeng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2015-08-28  1:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On 8/28/15 6:18 AM, Peter Zijlstra wrote:
> On Tue, Aug 25, 2015 at 07:47:44PM +0800, Wanpeng Li wrote:
>> On 8/25/15 6:32 PM, Peter Zijlstra wrote:
>>> So Possibly, Maybe (I'm still too wrecked to say for sure), something
>>> like this would work:
>>>
>>> 	WARN_ON(debug_locks && (lockdep_is_held(&p->pi_lock) ||
>>> 				(p->on_rq && lockdep_is_held(&rq->lock))));
>>>
>>> Instead of those two separate lockdep asserts.
>>>
>>> Please consider carefully.
> So the normal rules for changing task_struct::cpus_allowed are holding
> both pi_lock and rq->lock, such that holding either stabilizes the mask.
>
> This is so that wakeup can happen without rq->lock and load-balance
> without pi_lock.
>
>  From this we already get the relaxation that we can omit acquiring
> rq->lock if the task is not on the rq, because in that case
> load-balancing will not apply to it.
>
> ** these are the rules currently tested in do_set_cpus_allowed() **
>
> Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
> unconditionally acquires both locks, we could get away with holding just
> rq->lock when on_rq for modification because that'd still exclude
> __set_cpus_allowed_ptr(), it would also work against
> __kthread_bind_mask() because that assumes !on_rq.
>
> That said, this is all somewhat fragile.
>
>> Commit 5e16bbc2f ("sched: Streamline the task migration locking a little")
>> no longer holds the pi_lock in the migrate_tasks() path; actually, pi_lock
>> was already not held when calling select_fallback_rq(), but it was held in
>> __migrate_task() before that commit. Then commit 25834c73f93 ("sched: Fix a
>> race between __kthread_bind() and sched_setaffinity()") added a
>> lockdep_assert_held() to do_set_cpus_allowed(), so the bug is triggered. How
>> about something like below:
>>
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5186,6 +5186,15 @@ static void migrate_tasks(struct rq *dead_rq)
>>                  BUG_ON(!next);
>>                  next->sched_class->put_prev_task(rq, next);
>>
>> +               raw_spin_unlock(&rq->lock);
>> +               raw_spin_lock(&next->pi_lock);
>> +               raw_spin_lock(&rq->lock);
>> +               if (!(task_rq(next) == rq && task_on_rq_queued(next))) {
>> +                       raw_spin_unlock(&rq->lock);
>> +                       raw_spin_unlock(&next->pi_lock);
>> +                       continue;
>> +               }
> Yeah, that's quite disgusting.. also you'll trip over the lockdep_pin if
> you were to actually run this.

Indeed. I will handle lockdep_pin in this code if you choose the
second fragile option. :-)

Regards,
Wanpeng Li

>
> Now, I don't think dropping rq->lock is quite as disastrous as it
> usually is because !cpu_active at this point, which means load-balance
> will not interfere, but that too is somewhat fragile.
>
>
> So we end up with a choice of two fragile options.. let me ponder that a
> wee bit more.



Thread overview: 13+ messages
2015-08-25  7:59 [PATCH] sched: fix tsk->pi_lock isn't held when do_set_cpus_allowed() Wanpeng Li
2015-08-25  8:13 ` Leo Yan
2015-08-25  8:24   ` Wanpeng Li
2015-08-25  8:30     ` Ingo Molnar
2015-08-25  8:38       ` Wanpeng Li
2015-08-25 10:05 ` Peter Zijlstra
2015-08-25 10:10   ` Peter Zijlstra
2015-08-25 10:32     ` Peter Zijlstra
     [not found]       ` <BLU437-SMTP7261DF7A82716F49242C7B80610@phx.gbl>
2015-08-27 22:18         ` Peter Zijlstra
2015-08-28  1:49           ` Wanpeng Li
2015-08-25 10:18   ` Wanpeng Li
2015-08-27 13:47 ` T. Zhou
