[RFC][PATCH] sched: Fix a deadlock of cpu-hotplug

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
@ 2012-10-24  9:25 Huacai Chen
  2012-10-24  9:38 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Huacai Chen @ 2012-10-24  9:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Fuxin Zhang, Huacai Chen

We found poweroff sometimes fails on our computers, so we have the
lock debug options configured. Then, when we do poweroff or take a
cpu down via cpu-hotplug, kernel complain as below. To resove this,
we modify sched_ttwu_pending(), disable the local irq when acquire
rq->lock.

[   83.066406] =================================
[   83.066406] [ INFO: inconsistent lock state ]
[   83.066406] 3.5.0-3.lemote #428 Not tainted
[   83.066406] ---------------------------------
[   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   83.066406]  (&rq->lock){?.-.-.}, at: [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
[   83.066406] {IN-HARDIRQ-W} state was registered at:
[   83.066406]   [<ffffffff8027c9ac>] __lock_acquire+0x80c/0x1cc0
[   83.066406]   [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
[   83.066406]   [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
[   83.066406]   [<ffffffff8025a2fc>] scheduler_tick+0x48/0x178
[   83.066406]   [<ffffffff8023b334>] update_process_times+0x54/0x70
[   83.066406]   [<ffffffff80277568>] tick_handle_periodic+0x2c/0x9c
[   83.066406]   [<ffffffff8020a818>] c0_compare_interrupt+0x8c/0x94
[   83.066406]   [<ffffffff8029ec8c>] handle_irq_event_percpu+0x7c/0x248
[   83.066406]   [<ffffffff802a2774>] handle_percpu_irq+0x8c/0xc0
[   83.066406]   [<ffffffff8029e2c8>] generic_handle_irq+0x48/0x58
[   83.066406]   [<ffffffff80205c04>] do_IRQ+0x18/0x24
[   83.066406]   [<ffffffff802016e4>] mach_irq_dispatch+0xe4/0x124
[   83.066406]   [<ffffffff80203ca0>] ret_from_irq+0x0/0x4
[   83.066406]   [<ffffffff8022d114>] console_unlock+0x3e8/0x4c0
[   83.066406]   [<ffffffff811ff0d0>] con_init+0x370/0x398
[   83.066406]   [<ffffffff811fe3e0>] console_init+0x34/0x50
[   83.066406]   [<ffffffff811e4844>] start_kernel+0x2f8/0x4e0
[   83.066406] irq event stamp: 971
[   83.066406] hardirqs last  enabled at (971): [<ffffffff8021c384>] local_flush_tlb_all+0x134/0x17c
[   83.066406] hardirqs last disabled at (970): [<ffffffff8021c298>] local_flush_tlb_all+0x48/0x17c
[   83.066406] softirqs last  enabled at (0): [<ffffffff802298a4>] copy_process+0x510/0x117c
[   83.066406] softirqs last disabled at (0): [<          (null)>] (null)
[   83.066406]
[   83.066406] other info that might help us debug this:
[   83.066406]  Possible unsafe locking scenario:
[   83.066406]
[   83.066406]        CPU0
[   83.066406]        ----
[   83.066406]   lock(&rq->lock);
[   83.066406]   <Interrupt>
[   83.066406]     lock(&rq->lock);
[   83.066406]
[   83.066406]  *** DEADLOCK ***
[   83.066406]
[   83.066406] no locks held by migration/1/7.
[   83.066406]
[   83.066406] stack backtrace:
[   83.066406] Call Trace:
[   83.066406] [<ffffffff80747544>] dump_stack+0x8/0x34
[   83.066406] [<ffffffff8027ba04>] print_usage_bug+0x2ec/0x314
[   83.066406] [<ffffffff8027be28>] mark_lock+0x3fc/0x774
[   83.066406] [<ffffffff8027ca48>] __lock_acquire+0x8a8/0x1cc0
[   83.066406] [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
[   83.066406] [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
[   83.066406] [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
[   83.066406] [<ffffffff80745ff4>] migration_call+0x10c/0x2e0
[   83.066406] [<ffffffff80253110>] notifier_call_chain+0x44/0x94
[   83.066406] [<ffffffff8022eae0>] __cpu_notify+0x30/0x5c
[   83.066406] [<ffffffff8072b598>] take_cpu_down+0x5c/0x70
[   83.066406] [<ffffffff80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
[   83.066406] [<ffffffff802997cc>] cpu_stopper_thread+0x110/0x1ac
[   83.066406] [<ffffffff8024c940>] kthread+0x88/0x90
[   83.066406] [<ffffffff80205ee4>] kernel_thread_helper+0x10/0x18

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 kernel/sched/core.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36e2666..703754a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
+	unsigned long flags;
 	struct task_struct *p;
 
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock_irqsave(&rq->lock, flags);
 
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
@@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
 		ttwu_do_activate(rq, p, 0);
 	}
 
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 
 void scheduler_ipi(void)
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
  2012-10-24  9:25 Huacai Chen
@ 2012-10-24  9:38 ` Peter Zijlstra
  2012-10-25  3:32   ` Michael Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2012-10-24  9:38 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ingo Molnar, linux-kernel, Fuxin Zhang, Thomas Gleixner,
	Srivatsa S. Bhat, Tejun Heo

On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
> We found poweroff sometimes fails on our computers, so we have the
> lock debug options configured. Then, when we do poweroff or take a
> cpu down via cpu-hotplug, kernel complain as below. To resove this,
> we modify sched_ttwu_pending(), disable the local irq when acquire
> rq->lock.
> 
> [   83.066406] =================================
> [   83.066406] [ INFO: inconsistent lock state ]
> [   83.066406] 3.5.0-3.lemote #428 Not tainted
> [   83.066406] ---------------------------------
> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   83.066406]  (&rq->lock){?.-.-.}, at: [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
> [   83.066406] {IN-HARDIRQ-W} state was registered at:
> [   83.066406]   [<ffffffff8027c9ac>] __lock_acquire+0x80c/0x1cc0
> [   83.066406]   [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
> [   83.066406]   [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
> [   83.066406]   [<ffffffff8025a2fc>] scheduler_tick+0x48/0x178
> [   83.066406]   [<ffffffff8023b334>] update_process_times+0x54/0x70
> [   83.066406]   [<ffffffff80277568>] tick_handle_periodic+0x2c/0x9c
> [   83.066406]   [<ffffffff8020a818>] c0_compare_interrupt+0x8c/0x94
> [   83.066406]   [<ffffffff8029ec8c>] handle_irq_event_percpu+0x7c/0x248
> [   83.066406]   [<ffffffff802a2774>] handle_percpu_irq+0x8c/0xc0
> [   83.066406]   [<ffffffff8029e2c8>] generic_handle_irq+0x48/0x58
> [   83.066406]   [<ffffffff80205c04>] do_IRQ+0x18/0x24
> [   83.066406]   [<ffffffff802016e4>] mach_irq_dispatch+0xe4/0x124
> [   83.066406]   [<ffffffff80203ca0>] ret_from_irq+0x0/0x4
> [   83.066406]   [<ffffffff8022d114>] console_unlock+0x3e8/0x4c0
> [   83.066406]   [<ffffffff811ff0d0>] con_init+0x370/0x398
> [   83.066406]   [<ffffffff811fe3e0>] console_init+0x34/0x50
> [   83.066406]   [<ffffffff811e4844>] start_kernel+0x2f8/0x4e0
> [   83.066406] irq event stamp: 971
> [   83.066406] hardirqs last  enabled at (971): [<ffffffff8021c384>] local_flush_tlb_all+0x134/0x17c
> [   83.066406] hardirqs last disabled at (970): [<ffffffff8021c298>] local_flush_tlb_all+0x48/0x17c
> [   83.066406] softirqs last  enabled at (0): [<ffffffff802298a4>] copy_process+0x510/0x117c
> [   83.066406] softirqs last disabled at (0): [<          (null)>] (null)
> [   83.066406]
> [   83.066406] other info that might help us debug this:
> [   83.066406]  Possible unsafe locking scenario:
> [   83.066406]
> [   83.066406]        CPU0
> [   83.066406]        ----
> [   83.066406]   lock(&rq->lock);
> [   83.066406]   <Interrupt>
> [   83.066406]     lock(&rq->lock);
> [   83.066406]
> [   83.066406]  *** DEADLOCK ***
> [   83.066406]
> [   83.066406] no locks held by migration/1/7.
> [   83.066406]
> [   83.066406] stack backtrace:
> [   83.066406] Call Trace:
> [   83.066406] [<ffffffff80747544>] dump_stack+0x8/0x34
> [   83.066406] [<ffffffff8027ba04>] print_usage_bug+0x2ec/0x314
> [   83.066406] [<ffffffff8027be28>] mark_lock+0x3fc/0x774
> [   83.066406] [<ffffffff8027ca48>] __lock_acquire+0x8a8/0x1cc0
> [   83.066406] [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
> [   83.066406] [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
> [   83.066406] [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
> [   83.066406] [<ffffffff80745ff4>] migration_call+0x10c/0x2e0
> [   83.066406] [<ffffffff80253110>] notifier_call_chain+0x44/0x94
> [   83.066406] [<ffffffff8022eae0>] __cpu_notify+0x30/0x5c
> [   83.066406] [<ffffffff8072b598>] take_cpu_down+0x5c/0x70
> [   83.066406] [<ffffffff80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
> [   83.066406] [<ffffffff802997cc>] cpu_stopper_thread+0x110/0x1ac
> [   83.066406] [<ffffffff8024c940>] kthread+0x88/0x90
> [   83.066406] [<ffffffff80205ee4>] kernel_thread_helper+0x10/0x18

Weird, that's from a CPU_DYING call, I thought those were with IRQs
disabled. 

Look at how __stop_machine() calls the function with IRQs disabled for !
stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
disabled interrupts, so how do we end up calling take_cpu_down() with
IRQs enabled?

That simply doesn't make any sense.

> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  kernel/sched/core.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 36e2666..703754a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>  {
>  	struct rq *rq = this_rq();
>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
> +	unsigned long flags;
>  	struct task_struct *p;
>  
> -	raw_spin_lock(&rq->lock);
> +	raw_spin_lock_irqsave(&rq->lock, flags);
>  
>  	while (llist) {
>  		p = llist_entry(llist, struct task_struct, wake_entry);
> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>  		ttwu_do_activate(rq, p, 0);
>  	}
>  
> -	raw_spin_unlock(&rq->lock);
> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }
>  
>  void scheduler_ipi(void)


That's wrong though, you add the cost to the common case instead of the
hardly ever ran hotplug case.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
@ 2012-10-24 12:34 陈华才
  2012-10-24 13:45 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: 陈华才 @ 2012-10-24 12:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Fuxin Zhang, Thomas Gleixner,
	Srivatsa S. Bhat, Tejun Heo


> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
>> We found poweroff sometimes fails on our computers, so we have the
>> lock debug options configured. Then, when we do poweroff or take a
>> cpu down via cpu-hotplug, kernel complain as below. To resove this,
>> we modify sched_ttwu_pending(), disable the local irq when acquire
>> rq->lock.
>>
>> [   83.066406] =================================
>> [   83.066406] [ INFO: inconsistent lock state ]
>> [   83.066406] 3.5.0-3.lemote #428 Not tainted
>> [   83.066406] ---------------------------------
>> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [   83.066406]  (&rq->lock){?.-.-.}, at: [<ffffffff802585ac>]
>> sched_ttwu_pending+0x64/0x98
>> [   83.066406] {IN-HARDIRQ-W} state was registered at:
>> [   83.066406]   [<ffffffff8027c9ac>] __lock_acquire+0x80c/0x1cc0
>> [   83.066406]   [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406]   [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406]   [<ffffffff8025a2fc>] scheduler_tick+0x48/0x178
>> [   83.066406]   [<ffffffff8023b334>] update_process_times+0x54/0x70
>> [   83.066406]   [<ffffffff80277568>] tick_handle_periodic+0x2c/0x9c
>> [   83.066406]   [<ffffffff8020a818>] c0_compare_interrupt+0x8c/0x94
>> [   83.066406]   [<ffffffff8029ec8c>] handle_irq_event_percpu+0x7c/0x248
>> [   83.066406]   [<ffffffff802a2774>] handle_percpu_irq+0x8c/0xc0
>> [   83.066406]   [<ffffffff8029e2c8>] generic_handle_irq+0x48/0x58
>> [   83.066406]   [<ffffffff80205c04>] do_IRQ+0x18/0x24
>> [   83.066406]   [<ffffffff802016e4>] mach_irq_dispatch+0xe4/0x124
>> [   83.066406]   [<ffffffff80203ca0>] ret_from_irq+0x0/0x4
>> [   83.066406]   [<ffffffff8022d114>] console_unlock+0x3e8/0x4c0
>> [   83.066406]   [<ffffffff811ff0d0>] con_init+0x370/0x398
>> [   83.066406]   [<ffffffff811fe3e0>] console_init+0x34/0x50
>> [   83.066406]   [<ffffffff811e4844>] start_kernel+0x2f8/0x4e0
>> [   83.066406] irq event stamp: 971
>> [   83.066406] hardirqs last  enabled at (971): [<ffffffff8021c384>]
>> local_flush_tlb_all+0x134/0x17c
>> [   83.066406] hardirqs last disabled at (970): [<ffffffff8021c298>]
>> local_flush_tlb_all+0x48/0x17c
>> [   83.066406] softirqs last  enabled at (0): [<ffffffff802298a4>]
>> copy_process+0x510/0x117c
>> [   83.066406] softirqs last disabled at (0): [<          (null)>]
>> (null)
>> [   83.066406]
>> [   83.066406] other info that might help us debug this:
>> [   83.066406]  Possible unsafe locking scenario:
>> [   83.066406]
>> [   83.066406]        CPU0
>> [   83.066406]        ----
>> [   83.066406]   lock(&rq->lock);
>> [   83.066406]   <Interrupt>
>> [   83.066406]     lock(&rq->lock);
>> [   83.066406]
>> [   83.066406]  *** DEADLOCK ***
>> [   83.066406]
>> [   83.066406] no locks held by migration/1/7.
>> [   83.066406]
>> [   83.066406] stack backtrace:
>> [   83.066406] Call Trace:
>> [   83.066406] [<ffffffff80747544>] dump_stack+0x8/0x34
>> [   83.066406] [<ffffffff8027ba04>] print_usage_bug+0x2ec/0x314
>> [   83.066406] [<ffffffff8027be28>] mark_lock+0x3fc/0x774
>> [   83.066406] [<ffffffff8027ca48>] __lock_acquire+0x8a8/0x1cc0
>> [   83.066406] [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406] [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406] [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
>> [   83.066406] [<ffffffff80745ff4>] migration_call+0x10c/0x2e0
>> [   83.066406] [<ffffffff80253110>] notifier_call_chain+0x44/0x94
>> [   83.066406] [<ffffffff8022eae0>] __cpu_notify+0x30/0x5c
>> [   83.066406] [<ffffffff8072b598>] take_cpu_down+0x5c/0x70
>> [   83.066406] [<ffffffff80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
>> [   83.066406] [<ffffffff802997cc>] cpu_stopper_thread+0x110/0x1ac
>> [   83.066406] [<ffffffff8024c940>] kthread+0x88/0x90
>> [   83.066406] [<ffffffff80205ee4>] kernel_thread_helper+0x10/0x18
>
> Weird, that's from a CPU_DYING call, I thought those were with IRQs
> disabled.
>
> Look at how __stop_machine() calls the function with IRQs disabled for !
> stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
> disabled interrupts, so how do we end up calling take_cpu_down() with
> IRQs enabled?
>
> That simply doesn't make any sense.
I see, this is an arch-specific bug, sorry for my carelessness and thank
you for your tips.

>
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  kernel/sched/core.c |    5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 36e2666..703754a 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>>  {
>>  	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>> +	unsigned long flags;
>>  	struct task_struct *p;
>>
>> -	raw_spin_lock(&rq->lock);
>> +	raw_spin_lock_irqsave(&rq->lock, flags);
>>
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>>
>> -	raw_spin_unlock(&rq->lock);
>> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
>>  }
>>
>>  void scheduler_ipi(void)
>
>
> That's wrong though, you add the cost to the common case instead of the
> hardly ever ran hotplug case.
>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
@ 2012-10-24 13:12 陈华才
  0 siblings, 0 replies; 7+ messages in thread
From: 陈华才 @ 2012-10-24 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Fuxin Zhang, Thomas Gleixner,
	Srivatsa S. Bhat, Tejun Heo


> On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote:
>> I see, this is an arch-specific bug, sorry for my carelessness and thank
>> you for your tips.
>
> What arch are you using? And what exactly did the arch do wrong? Most of
> the code involved seems to be common code.
>
> Going by c0_compare_interrupt, this is some MIPS device.
>
Yes, I'm use MIPS, In a place which local_irq_save()/local_irq_restore()
should be used, I use local_irq_disable()/local_irq_enable() by mistake.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
  2012-10-24 12:34 [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug 陈华才
@ 2012-10-24 13:45 ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2012-10-24 13:45 UTC (permalink / raw)
  To: 陈华才
  Cc: Ingo Molnar, linux-kernel, Fuxin Zhang, Thomas Gleixner,
	Srivatsa S. Bhat, Tejun Heo

On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote:
> I see, this is an arch-specific bug, sorry for my carelessness and thank
> you for your tips. 

What arch are you using? And what exactly did the arch do wrong? Most of
the code involved seems to be common code.

Going by c0_compare_interrupt, this is some MIPS device.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
  2012-10-24  9:38 ` Peter Zijlstra
@ 2012-10-25  3:32   ` Michael Wang
  2012-10-25  6:13     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Wang @ 2012-10-25  3:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huacai Chen, Ingo Molnar, linux-kernel, Fuxin Zhang,
	Thomas Gleixner, Srivatsa S. Bhat, Tejun Heo

On 10/24/2012 05:38 PM, Peter Zijlstra wrote:
> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
>> We found poweroff sometimes fails on our computers, so we have the
>> lock debug options configured. Then, when we do poweroff or take a
>> cpu down via cpu-hotplug, kernel complain as below. To resove this,
>> we modify sched_ttwu_pending(), disable the local irq when acquire
>> rq->lock.
>>
>> [   83.066406] =================================
>> [   83.066406] [ INFO: inconsistent lock state ]
>> [   83.066406] 3.5.0-3.lemote #428 Not tainted
>> [   83.066406] ---------------------------------
>> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [   83.066406]  (&rq->lock){?.-.-.}, at: [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
>> [   83.066406] {IN-HARDIRQ-W} state was registered at:
>> [   83.066406]   [<ffffffff8027c9ac>] __lock_acquire+0x80c/0x1cc0
>> [   83.066406]   [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406]   [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406]   [<ffffffff8025a2fc>] scheduler_tick+0x48/0x178
>> [   83.066406]   [<ffffffff8023b334>] update_process_times+0x54/0x70
>> [   83.066406]   [<ffffffff80277568>] tick_handle_periodic+0x2c/0x9c
>> [   83.066406]   [<ffffffff8020a818>] c0_compare_interrupt+0x8c/0x94
>> [   83.066406]   [<ffffffff8029ec8c>] handle_irq_event_percpu+0x7c/0x248
>> [   83.066406]   [<ffffffff802a2774>] handle_percpu_irq+0x8c/0xc0
>> [   83.066406]   [<ffffffff8029e2c8>] generic_handle_irq+0x48/0x58
>> [   83.066406]   [<ffffffff80205c04>] do_IRQ+0x18/0x24
>> [   83.066406]   [<ffffffff802016e4>] mach_irq_dispatch+0xe4/0x124
>> [   83.066406]   [<ffffffff80203ca0>] ret_from_irq+0x0/0x4
>> [   83.066406]   [<ffffffff8022d114>] console_unlock+0x3e8/0x4c0
>> [   83.066406]   [<ffffffff811ff0d0>] con_init+0x370/0x398
>> [   83.066406]   [<ffffffff811fe3e0>] console_init+0x34/0x50
>> [   83.066406]   [<ffffffff811e4844>] start_kernel+0x2f8/0x4e0
>> [   83.066406] irq event stamp: 971
>> [   83.066406] hardirqs last  enabled at (971): [<ffffffff8021c384>] local_flush_tlb_all+0x134/0x17c
>> [   83.066406] hardirqs last disabled at (970): [<ffffffff8021c298>] local_flush_tlb_all+0x48/0x17c
>> [   83.066406] softirqs last  enabled at (0): [<ffffffff802298a4>] copy_process+0x510/0x117c
>> [   83.066406] softirqs last disabled at (0): [<          (null)>] (null)
>> [   83.066406]
>> [   83.066406] other info that might help us debug this:
>> [   83.066406]  Possible unsafe locking scenario:
>> [   83.066406]
>> [   83.066406]        CPU0
>> [   83.066406]        ----
>> [   83.066406]   lock(&rq->lock);
>> [   83.066406]   <Interrupt>
>> [   83.066406]     lock(&rq->lock);
>> [   83.066406]
>> [   83.066406]  *** DEADLOCK ***
>> [   83.066406]
>> [   83.066406] no locks held by migration/1/7.
>> [   83.066406]
>> [   83.066406] stack backtrace:
>> [   83.066406] Call Trace:
>> [   83.066406] [<ffffffff80747544>] dump_stack+0x8/0x34
>> [   83.066406] [<ffffffff8027ba04>] print_usage_bug+0x2ec/0x314
>> [   83.066406] [<ffffffff8027be28>] mark_lock+0x3fc/0x774
>> [   83.066406] [<ffffffff8027ca48>] __lock_acquire+0x8a8/0x1cc0
>> [   83.066406] [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406] [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406] [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
>> [   83.066406] [<ffffffff80745ff4>] migration_call+0x10c/0x2e0
>> [   83.066406] [<ffffffff80253110>] notifier_call_chain+0x44/0x94
>> [   83.066406] [<ffffffff8022eae0>] __cpu_notify+0x30/0x5c
>> [   83.066406] [<ffffffff8072b598>] take_cpu_down+0x5c/0x70
>> [   83.066406] [<ffffffff80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
>> [   83.066406] [<ffffffff802997cc>] cpu_stopper_thread+0x110/0x1ac
>> [   83.066406] [<ffffffff8024c940>] kthread+0x88/0x90
>> [   83.066406] [<ffffffff80205ee4>] kernel_thread_helper+0x10/0x18
> 
> Weird, that's from a CPU_DYING call, I thought those were with IRQs
> disabled. 
> 
> Look at how __stop_machine() calls the function with IRQs disabled for !
> stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
> disabled interrupts, so how do we end up calling take_cpu_down() with
> IRQs enabled?

The patch is no doubt wrong...

The discuss in:

https://lkml.org/lkml/2012/7/19/164

Which also faced the issue that the timer interrupt come in after apic
was shut down, I'm not sure whether this could do help to Huacai, just
as a clue...

Regards,
Michael Wang

> 
> That simply doesn't make any sense.
> 
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  kernel/sched/core.c |    5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 36e2666..703754a 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>>  {
>>  	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>> +	unsigned long flags;
>>  	struct task_struct *p;
>>  
>> -	raw_spin_lock(&rq->lock);
>> +	raw_spin_lock_irqsave(&rq->lock, flags);
>>  
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>>  
>> -	raw_spin_unlock(&rq->lock);
>> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
>>  }
>>  
>>  void scheduler_ipi(void)
> 
> 
> That's wrong though, you add the cost to the common case instead of the
> hardly ever ran hotplug case.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
  2012-10-25  3:32   ` Michael Wang
@ 2012-10-25  6:13     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 7+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-25  6:13 UTC (permalink / raw)
  To: Michael Wang
  Cc: Peter Zijlstra, Huacai Chen, Ingo Molnar, linux-kernel,
	Fuxin Zhang, Thomas Gleixner, Tejun Heo

On 10/25/2012 09:02 AM, Michael Wang wrote:
> On 10/24/2012 05:38 PM, Peter Zijlstra wrote:
>> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
>>> We found poweroff sometimes fails on our computers, so we have the
>>> lock debug options configured. Then, when we do poweroff or take a
>>> cpu down via cpu-hotplug, kernel complain as below. To resove this,
>>> we modify sched_ttwu_pending(), disable the local irq when acquire
>>> rq->lock.
>>>
>>> [   83.066406] =================================
>>> [   83.066406] [ INFO: inconsistent lock state ]
>>> [   83.066406] 3.5.0-3.lemote #428 Not tainted
>>> [   83.066406] ---------------------------------
>>> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>>> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>> [   83.066406]  (&rq->lock){?.-.-.}, at: [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
>>> [   83.066406] {IN-HARDIRQ-W} state was registered at:
>>> [   83.066406]   [<ffffffff8027c9ac>] __lock_acquire+0x80c/0x1cc0
>>> [   83.066406]   [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>>> [   83.066406]   [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>>> [   83.066406]   [<ffffffff8025a2fc>] scheduler_tick+0x48/0x178
>>> [   83.066406]   [<ffffffff8023b334>] update_process_times+0x54/0x70
>>> [   83.066406]   [<ffffffff80277568>] tick_handle_periodic+0x2c/0x9c
>>> [   83.066406]   [<ffffffff8020a818>] c0_compare_interrupt+0x8c/0x94
>>> [   83.066406]   [<ffffffff8029ec8c>] handle_irq_event_percpu+0x7c/0x248
>>> [   83.066406]   [<ffffffff802a2774>] handle_percpu_irq+0x8c/0xc0
>>> [   83.066406]   [<ffffffff8029e2c8>] generic_handle_irq+0x48/0x58
>>> [   83.066406]   [<ffffffff80205c04>] do_IRQ+0x18/0x24
>>> [   83.066406]   [<ffffffff802016e4>] mach_irq_dispatch+0xe4/0x124
>>> [   83.066406]   [<ffffffff80203ca0>] ret_from_irq+0x0/0x4
>>> [   83.066406]   [<ffffffff8022d114>] console_unlock+0x3e8/0x4c0
>>> [   83.066406]   [<ffffffff811ff0d0>] con_init+0x370/0x398
>>> [   83.066406]   [<ffffffff811fe3e0>] console_init+0x34/0x50
>>> [   83.066406]   [<ffffffff811e4844>] start_kernel+0x2f8/0x4e0
>>> [   83.066406] irq event stamp: 971
>>> [   83.066406] hardirqs last  enabled at (971): [<ffffffff8021c384>] local_flush_tlb_all+0x134/0x17c
>>> [   83.066406] hardirqs last disabled at (970): [<ffffffff8021c298>] local_flush_tlb_all+0x48/0x17c
>>> [   83.066406] softirqs last  enabled at (0): [<ffffffff802298a4>] copy_process+0x510/0x117c
>>> [   83.066406] softirqs last disabled at (0): [<          (null)>] (null)
>>> [   83.066406]
>>> [   83.066406] other info that might help us debug this:
>>> [   83.066406]  Possible unsafe locking scenario:
>>> [   83.066406]
>>> [   83.066406]        CPU0
>>> [   83.066406]        ----
>>> [   83.066406]   lock(&rq->lock);
>>> [   83.066406]   <Interrupt>
>>> [   83.066406]     lock(&rq->lock);
>>> [   83.066406]
>>> [   83.066406]  *** DEADLOCK ***
>>> [   83.066406]
>>> [   83.066406] no locks held by migration/1/7.
>>> [   83.066406]
>>> [   83.066406] stack backtrace:
>>> [   83.066406] Call Trace:
>>> [   83.066406] [<ffffffff80747544>] dump_stack+0x8/0x34
>>> [   83.066406] [<ffffffff8027ba04>] print_usage_bug+0x2ec/0x314
>>> [   83.066406] [<ffffffff8027be28>] mark_lock+0x3fc/0x774
>>> [   83.066406] [<ffffffff8027ca48>] __lock_acquire+0x8a8/0x1cc0
>>> [   83.066406] [<ffffffff8027e3d0>] lock_acquire+0x60/0x9c
>>> [   83.066406] [<ffffffff8074ba04>] _raw_spin_lock+0x3c/0x50
>>> [   83.066406] [<ffffffff802585ac>] sched_ttwu_pending+0x64/0x98
>>> [   83.066406] [<ffffffff80745ff4>] migration_call+0x10c/0x2e0
>>> [   83.066406] [<ffffffff80253110>] notifier_call_chain+0x44/0x94
>>> [   83.066406] [<ffffffff8022eae0>] __cpu_notify+0x30/0x5c
>>> [   83.066406] [<ffffffff8072b598>] take_cpu_down+0x5c/0x70
>>> [   83.066406] [<ffffffff80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
>>> [   83.066406] [<ffffffff802997cc>] cpu_stopper_thread+0x110/0x1ac
>>> [   83.066406] [<ffffffff8024c940>] kthread+0x88/0x90
>>> [   83.066406] [<ffffffff80205ee4>] kernel_thread_helper+0x10/0x18
>>
>> Weird, that's from a CPU_DYING call, I thought those were with IRQs
>> disabled. 
>>
>> Look at how __stop_machine() calls the function with IRQs disabled for !
>> stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
>> disabled interrupts, so how do we end up calling take_cpu_down() with
>> IRQs enabled?
> 
> The patch is no doubt wrong...
> 
> The discuss in:
> 
> https://lkml.org/lkml/2012/7/19/164
> 
> Which also faced the issue that the timer interrupt come in after apic
> was shut down, I'm not sure whether this could do help to Huacai, just
> as a clue...
>

One interesting thing that I noted in that case was that we noticed that
(stale) interrupt exactly at the call to local_irq_restore() in
stop_machine_cpu_stop().

However, as Peter pointed out, migration_call's CPU_DYING notifier runs
right in the middle of the stop machine dance, much much before the call
to local_irq_restore().. so it doesn't look like a case of a stale interrupt
being recognized.. it looks as if the sequence of local_irq_disable(),
hard_irq_disable() and __cpu_disable() somehow managed to wrongly keep the
interrupts still enabled...

Regards,
Srivatsa S. Bhat

> 
>>
>> That simply doesn't make any sense.
>>
>>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>>> ---
>>>  kernel/sched/core.c |    5 +++--
>>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 36e2666..703754a 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>>>  {
>>>  	struct rq *rq = this_rq();
>>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>>> +	unsigned long flags;
>>>  	struct task_struct *p;
>>>  
>>> -	raw_spin_lock(&rq->lock);
>>> +	raw_spin_lock_irqsave(&rq->lock, flags);
>>>  
>>>  	while (llist) {
>>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>>>  		ttwu_do_activate(rq, p, 0);
>>>  	}
>>>  
>>> -	raw_spin_unlock(&rq->lock);
>>> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
>>>  }
>>>  
>>>  void scheduler_ipi(void)
>>
>>
>> That's wrong though, you add the cost to the common case instead of the
>> hardly ever ran hotplug case.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-10-25  6:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-24 12:34 [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug 陈华才
2012-10-24 13:45 ` Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2012-10-24 13:12 陈华才
2012-10-24  9:25 Huacai Chen
2012-10-24  9:38 ` Peter Zijlstra
2012-10-25  3:32   ` Michael Wang
2012-10-25  6:13     ` Srivatsa S. Bhat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox