linux-arm-kernel.lists.infradead.org archive mirror
* rcu_sched stall while waiting in csd_lock_wait()
@ 2017-08-02  3:31 Pratyush Anand
  2017-08-02  8:08 ` Will Deacon
  2017-08-02  8:14 ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Pratyush Anand @ 2017-08-02  3:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from which it never came out, hence the stall. Any
help/pointer for further debugging would be very helpful. The problem also
reproduces with 4.13.0-rc3.
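
For context, the wait in question is the synchronous completion spin of a
cross-CPU function call. A simplified paraphrase of the v4.13-era kernel/smp.c
logic (not the exact source) looks roughly like this:

/* kernel/smp.c, simplified paraphrase (~v4.13): caller side of a cross-call */
static __always_inline void csd_lock_wait(struct call_single_data *csd)
{
	/* Spin until the target CPU clears CSD_FLAG_LOCK in csd_unlock(). */
	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
}

/*
 * smp_call_function_single(cpu, func, info, wait=1) queues the csd on the
 * target CPU's call_single_queue, sends an IPI and then spins in
 * csd_lock_wait().  If the target CPU never runs the queued function (lost
 * IPI, or a CPU that is itself wedged), the caller spins forever, which is
 * what the RCU stall detector ends up reporting below.
 */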


[11108.758804] task: ffff801782490000 task.stack: ffff801782574000
[11108.764710] PC is at smp_call_function_single+0x128/0x178
[11108.770089] LR is at smp_call_function_single+0x11c/0x178
[11108.775470] pc : [<ffff0000081615ac>] lr : [<ffff0000081615a0>] pstate: 
20400145
[11108.782848] sp : ffff801782577b80
[11108.786147] x29: ffff801782577b80 x28: ffff8017815c3800
[11108.791442] x27: ffff0000087f1000 x26: 000000000000001d
[11108.796737] x25: 0000000000000000 x24: ffff0000081ddc90
[11108.802032] x23: ffff000008c73000 x22: 0000000000000006
[11108.807327] x21: ffff0000081da548 x20: ffff801782577c18
[11108.812623] x19: ffff000008c73000 x18: 0000fffffa011c30
[11108.817918] x17: 0000000000000000 x16: 0000000000000000
[11108.823213] x15: 00000000006ed000 x14: 0000000000620170
[11108.828508] x13: ffffffffffffffff x12: 0000000000000005
[11108.833803] x11: 0000000000000000 x10: 00000000ffffffff
[11108.839098] x9 : 0000fffffa011e40 x8 : 000000000000001d
[11108.844393] x7 : 0000000000000000 x6 : ffff801782577bc8
[11108.849689] x5 : 0000000000000040 x4 : ffff801782577be0
[11108.854984] x3 : ffff801782577be0 x2 : 0000000000000000
[11108.860279] x1 : 0000000000000003 x0 : 0000000000000000
[11108.865574]
[11141.885599] INFO: rcu_sched self-detected stall on CPU
[11141.889777] 	7-...: (5982 ticks this GP) idle=2fb/140000000000001/0 
softirq=604/604 fqs=2995
[11141.898281] 	 (t=6000 jiffies g=11855 c=11854 q=19196)
[11141.903404] Task dump for CPU 7:
[11141.906615] perf            R  running task        0  2357   2329 0x00000202
[11141.913647] Call trace:
[11141.916081] [<ffff000008088770>] dump_backtrace+0x0/0x270
[11141.921460] [<ffff000008088a04>] show_stack+0x24/0x2c
[11141.926496] [<ffff0000080fcd50>] sched_show_task+0xec/0x130
[11141.932050] [<ffff0000080fe7e8>] dump_cpu_task+0x48/0x54
[11141.937347] [<ffff0000081ee314>] rcu_dump_cpu_stacks+0xac/0xf4
[11141.943163] [<ffff000008144b94>] rcu_check_callbacks+0x7a0/0xa84
[11141.949152] [<ffff00000814ae98>] update_process_times+0x34/0x5c
[11141.955055] [<ffff00000815c7ec>] tick_sched_handle.isra.15+0x38/0x70
[11141.961390] [<ffff00000815c86c>] tick_sched_timer+0x48/0x88
[11141.966946] [<ffff00000814b84c>] __hrtimer_run_queues+0x150/0x2e4
[11141.973022] [<ffff00000814c264>] hrtimer_interrupt+0xa0/0x1d4
[11141.978754] [<ffff000008650898>] arch_timer_handler_phys+0x3c/0x48
[11141.984916] [<ffff000008137c40>] handle_percpu_devid_irq+0x98/0x210
[11141.991164] [<ffff0000081321a4>] generic_handle_irq+0x34/0x4c
[11141.996893] [<ffff000008132868>] __handle_domain_irq+0x6c/0xc4
[11142.002710] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[11142.008178] Exception stack(0xffff801782577a50 to 0xffff801782577b80)
[11142.014602] 7a40:                                   0000000000000000 
0000000000000003
[11142.022414] 7a60: 0000000000000000 ffff801782577be0 ffff801782577be0 
0000000000000040
[11142.030226] 7a80: ffff801782577bc8 0000000000000000 000000000000001d 
0000fffffa011e40
[11142.038039] 7aa0: 00000000ffffffff 0000000000000000 0000000000000005 
ffffffffffffffff
[11142.045851] 7ac0: 0000000000620170 00000000006ed000 0000000000000000 
0000000000000000
[11142.053664] 7ae0: 0000fffffa011c30 ffff000008c73000 ffff801782577c18 
ffff0000081da548
[11142.061477] 7b00: 0000000000000006 ffff000008c73000 ffff0000081ddc90 
0000000000000000
[11142.069289] 7b20: 000000000000001d ffff0000087f1000 ffff8017815c3800 
ffff801782577b80
[11142.077101] 7b40: ffff0000081615a0 ffff801782577b80 ffff0000081615ac 
0000000020400145
[11142.084914] 7b60: 0000001b00000019 ffff801782577c18 0001000000000000 
ffff801782577bc8
[11142.092727] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[11142.097588] [<ffff0000081615ac>] smp_call_function_single+0x128/0x178
[11142.104014] [<ffff0000081d8124>] cpu_function_call+0x40/0x64
[11142.109654] [<ffff0000081d837c>] event_function_call+0x100/0x108
[11142.115643] [<ffff0000081d83d0>] _perf_event_disable+0x4c/0x74
[11142.121459] [<ffff0000081d7f78>] perf_event_for_each_child+0x38/0x98
[11142.127796] [<ffff0000081e110c>] perf_ioctl+0xec/0x7b4
[11142.132920] [<ffff00000829e564>] do_vfs_ioctl+0xcc/0x7bc
[11142.138213] [<ffff00000829ece4>] SyS_ioctl+0x90/0xa4
[11142.143161] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
[11142.148717] INFO: rcu_sched detected stalls on CPUs/tasks:
[11142.154187] 	7-...: (5983 ticks this GP) idle=2fb/140000000000000/0 
softirq=604/604 fqs=2995
[11142.162692] 	(detected by 8, t=6027 jiffies, g=11855, c=11854, q=19211)
[11142.169289] Task dump for CPU 7:
[11142.172500] perf            R  running task        0  2357   2329 0x00000202
[11142.179531] Call trace:
[11142.181964] [<ffff0000080857c8>] __switch_to+0x64/0x70
[11142.187088] [<ffff000008c20000>] cpu_number+0x0/0x8


-- 
Regards
Pratyush


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  3:31 rcu_sched stall while waiting in csd_lock_wait() Pratyush Anand
@ 2017-08-02  8:08 ` Will Deacon
  2017-08-02  8:44   ` Marc Zyngier
  2017-08-03  3:55   ` Pratyush Anand
  2017-08-02  8:14 ` Peter Zijlstra
  1 sibling, 2 replies; 10+ messages in thread
From: Will Deacon @ 2017-08-02  8:08 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Pratyush,

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
> I am observing the following rcu_sched stall while executing `perf record -a --
> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
> waiting in csd_lock_wait(), from which it never came out, hence the stall.
> Any help/pointer for further debugging would be very helpful. The problem also
> reproduces with 4.13.0-rc3.

When you say "also", which other kernel(s) show the problem? Is this a
recent regression? Which platform are you running on?

It would be interesting to know what the other CPUs are doing, in particular
the target of the cross-call. Either it crashed spectacularly and didn't
unlock the csd lock, or the IPI somehow wasn't delivered.
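
(For context, the "unlock" happens on the target CPU, in the IPI handler that
drains the call_single_queue. A simplified paraphrase of that path in
kernel/smp.c around v4.13, not the exact source:

/* Target side: run queued cross-call functions and release the callers. */
static void flush_smp_call_function_queue(bool warn_cpu_offline)
{
	struct llist_node *entry;
	struct call_single_data *csd, *csd_next;

	entry = llist_del_all(this_cpu_ptr(&call_single_queue));
	entry = llist_reverse_order(entry);

	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
		smp_call_func_t func = csd->func;
		void *info = csd->info;

		if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
			func(info);
			csd_unlock(csd);	/* releases the caller's csd_lock_wait() spin */
		} else {
			csd_unlock(csd);
			func(info);
		}
	}
}

So a csd that stays locked means the target CPU either never took the IPI, or
never got far enough to drain its queue.)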

Do you see any other splats if you enable lock debugging?

Cheers,

Will

> [11108.758804] task: ffff801782490000 task.stack: ffff801782574000
> [11108.764710] PC is at smp_call_function_single+0x128/0x178
> [11108.770089] LR is at smp_call_function_single+0x11c/0x178
> [11108.775470] pc : [<ffff0000081615ac>] lr : [<ffff0000081615a0>] pstate:
> 20400145
> [11108.782848] sp : ffff801782577b80
> [11108.786147] x29: ffff801782577b80 x28: ffff8017815c3800
> [11108.791442] x27: ffff0000087f1000 x26: 000000000000001d
> [11108.796737] x25: 0000000000000000 x24: ffff0000081ddc90
> [11108.802032] x23: ffff000008c73000 x22: 0000000000000006
> [11108.807327] x21: ffff0000081da548 x20: ffff801782577c18
> [11108.812623] x19: ffff000008c73000 x18: 0000fffffa011c30
> [11108.817918] x17: 0000000000000000 x16: 0000000000000000
> [11108.823213] x15: 00000000006ed000 x14: 0000000000620170
> [11108.828508] x13: ffffffffffffffff x12: 0000000000000005
> [11108.833803] x11: 0000000000000000 x10: 00000000ffffffff
> [11108.839098] x9 : 0000fffffa011e40 x8 : 000000000000001d
> [11108.844393] x7 : 0000000000000000 x6 : ffff801782577bc8
> [11108.849689] x5 : 0000000000000040 x4 : ffff801782577be0
> [11108.854984] x3 : ffff801782577be0 x2 : 0000000000000000
> [11108.860279] x1 : 0000000000000003 x0 : 0000000000000000
> [11108.865574]
> [11141.885599] INFO: rcu_sched self-detected stall on CPU
> [11141.889777] 	7-...: (5982 ticks this GP) idle=2fb/140000000000001/0
> softirq=604/604 fqs=2995
> [11141.898281] 	 (t=6000 jiffies g=11855 c=11854 q=19196)
> [11141.903404] Task dump for CPU 7:
> [11141.906615] perf            R  running task        0  2357   2329 0x00000202
> [11141.913647] Call trace:
> [11141.916081] [<ffff000008088770>] dump_backtrace+0x0/0x270
> [11141.921460] [<ffff000008088a04>] show_stack+0x24/0x2c
> [11141.926496] [<ffff0000080fcd50>] sched_show_task+0xec/0x130
> [11141.932050] [<ffff0000080fe7e8>] dump_cpu_task+0x48/0x54
> [11141.937347] [<ffff0000081ee314>] rcu_dump_cpu_stacks+0xac/0xf4
> [11141.943163] [<ffff000008144b94>] rcu_check_callbacks+0x7a0/0xa84
> [11141.949152] [<ffff00000814ae98>] update_process_times+0x34/0x5c
> [11141.955055] [<ffff00000815c7ec>] tick_sched_handle.isra.15+0x38/0x70
> [11141.961390] [<ffff00000815c86c>] tick_sched_timer+0x48/0x88
> [11141.966946] [<ffff00000814b84c>] __hrtimer_run_queues+0x150/0x2e4
> [11141.973022] [<ffff00000814c264>] hrtimer_interrupt+0xa0/0x1d4
> [11141.978754] [<ffff000008650898>] arch_timer_handler_phys+0x3c/0x48
> [11141.984916] [<ffff000008137c40>] handle_percpu_devid_irq+0x98/0x210
> [11141.991164] [<ffff0000081321a4>] generic_handle_irq+0x34/0x4c
> [11141.996893] [<ffff000008132868>] __handle_domain_irq+0x6c/0xc4
> [11142.002710] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [11142.008178] Exception stack(0xffff801782577a50 to 0xffff801782577b80)
> [11142.014602] 7a40:                                   0000000000000000
> 0000000000000003
> [11142.022414] 7a60: 0000000000000000 ffff801782577be0 ffff801782577be0
> 0000000000000040
> [11142.030226] 7a80: ffff801782577bc8 0000000000000000 000000000000001d
> 0000fffffa011e40
> [11142.038039] 7aa0: 00000000ffffffff 0000000000000000 0000000000000005
> ffffffffffffffff
> [11142.045851] 7ac0: 0000000000620170 00000000006ed000 0000000000000000
> 0000000000000000
> [11142.053664] 7ae0: 0000fffffa011c30 ffff000008c73000 ffff801782577c18
> ffff0000081da548
> [11142.061477] 7b00: 0000000000000006 ffff000008c73000 ffff0000081ddc90
> 0000000000000000
> [11142.069289] 7b20: 000000000000001d ffff0000087f1000 ffff8017815c3800
> ffff801782577b80
> [11142.077101] 7b40: ffff0000081615a0 ffff801782577b80 ffff0000081615ac
> 0000000020400145
> [11142.084914] 7b60: 0000001b00000019 ffff801782577c18 0001000000000000
> ffff801782577bc8
> [11142.092727] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [11142.097588] [<ffff0000081615ac>] smp_call_function_single+0x128/0x178
> [11142.104014] [<ffff0000081d8124>] cpu_function_call+0x40/0x64
> [11142.109654] [<ffff0000081d837c>] event_function_call+0x100/0x108
> [11142.115643] [<ffff0000081d83d0>] _perf_event_disable+0x4c/0x74
> [11142.121459] [<ffff0000081d7f78>] perf_event_for_each_child+0x38/0x98
> [11142.127796] [<ffff0000081e110c>] perf_ioctl+0xec/0x7b4
> [11142.132920] [<ffff00000829e564>] do_vfs_ioctl+0xcc/0x7bc
> [11142.138213] [<ffff00000829ece4>] SyS_ioctl+0x90/0xa4
> [11142.143161] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
> [11142.148717] INFO: rcu_sched detected stalls on CPUs/tasks:
> [11142.154187] 	7-...: (5983 ticks this GP) idle=2fb/140000000000000/0
> softirq=604/604 fqs=2995
> [11142.162692] 	(detected by 8, t=6027 jiffies, g=11855, c=11854, q=19211)
> [11142.169289] Task dump for CPU 7:
> [11142.172500] perf            R  running task        0  2357   2329 0x00000202
> [11142.179531] Call trace:
> [11142.181964] [<ffff0000080857c8>] __switch_to+0x64/0x70
> [11142.187088] [<ffff000008c20000>] cpu_number+0x0/0x8
> 
> 
> -- 
> Regards
> Pratyush


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  3:31 rcu_sched stall while waiting in csd_lock_wait() Pratyush Anand
  2017-08-02  8:08 ` Will Deacon
@ 2017-08-02  8:14 ` Peter Zijlstra
  2017-08-03  3:56   ` Pratyush Anand
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-08-02  8:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
> Hi,
> 
> I am observing the following rcu_sched stall while executing `perf record -a --
> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
> waiting in csd_lock_wait(), from which it never came out, hence the stall.
> Any help/pointer for further debugging would be very helpful. The problem also
> reproduces with 4.13.0-rc3.

I'm sitting on this patch:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=15377ef4fe0c86eb7fa1099575b2e86357d99e42

please give that a spin.


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  8:08 ` Will Deacon
@ 2017-08-02  8:44   ` Marc Zyngier
  2017-08-03  3:56     ` Pratyush Anand
  2017-08-03  3:55   ` Pratyush Anand
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2017-08-02  8:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/08/17 09:08, Will Deacon wrote:
> Hi Pratyush,
> 
> On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
>> I am observing the following rcu_sched stall while executing `perf record -a --
>> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
>> waiting in csd_lock_wait(), from which it never came out, hence the stall.
>> Any help/pointer for further debugging would be very helpful. The problem also
>> reproduces with 4.13.0-rc3.
> 
> When you say "also", which other kernel(s) show the problem? Is this a
> recent regression? Which platform are you running on?
> 
> It would be interesting to know what the other CPUs are doing, in particular
> the target of the cross-call. Either it crashed spectacularly and didn't
> unlock the csd lock, or the IPI somehow wasn't delivered.
> 
> Do you see any other splats if you enable lock debugging?

Also, is that in a guest, or bare metal? If that's a guest, what's the
host's kernel version?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  8:08 ` Will Deacon
  2017-08-02  8:44   ` Marc Zyngier
@ 2017-08-03  3:55   ` Pratyush Anand
  2017-08-03  7:19     ` Marc Zyngier
  2017-08-03 12:54     ` Paul E. McKenney
  1 sibling, 2 replies; 10+ messages in thread
From: Pratyush Anand @ 2017-08-03  3:55 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Wednesday 02 August 2017 01:38 PM, Will Deacon wrote:
> Hi Pratyush,
> 
> On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
>> I am observing the following rcu_sched stall while executing `perf record -a --
>> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
>> waiting in csd_lock_wait(), from which it never came out, hence the stall.
>> Any help/pointer for further debugging would be very helpful. The problem also
>> reproduces with 4.13.0-rc3.
> When you say "also", which other kernel(s) show the problem? Is this a
> recent regression? Which platform are you running on?

Other than 4.13.0-rc3, I had tested with a 4.11-based kernel (vanilla 4.11 plus
some 4.12 patches backported), and the log I attached was from that kernel;
that's why I mentioned that it reproduces with vanilla upstream as well.

> 
> It would be interesting to know what the other CPUs are doing, in particular
> the target of the cross-call. Either it crashed spectacularly and didn't
> unlock the csd lock, or the IPI somehow wasn't delivered.
> 
> Do you see any other splats if you enable lock debugging?

It was the same.

Following is the log from 4.13.0-rc3 plus the patch Peter pointed to:

[  173.649589] perf: interrupt took too long (4952 > 4902), lowering 
kernel.perf_event_max_sample_rate to 40300
[  201.340926] INFO: rcu_sched self-detected stall on CPU
[  201.345115] 	9-...: (6499 ticks this GP) idle=e1a/140000000000001/0 
softirq=334/334 fqs=3250
[  201.353617] 	 (t=6500 jiffies g=313 c=312 q=428)
[  201.358220] Task dump for CPU 9:
[  201.361431] perf            R  running task        0  1888   1864 0x00000202
[  201.368462] Call trace:
[  201.370897] [<ffff000008089498>] dump_backtrace+0x0/0x28c
[  201.376276] [<ffff000008089748>] show_stack+0x24/0x2c
[  201.381312] [<ffff00000810c5d0>] sched_show_task+0x19c/0x26c
[  201.386952] [<ffff00000810df28>] dump_cpu_task+0x48/0x54
[  201.392250] [<ffff00000816c564>] rcu_dump_cpu_stacks+0xac/0xf4
[  201.398063] [<ffff00000816b714>] rcu_check_callbacks+0x908/0xc90
[  201.404053] [<ffff000008172fe0>] update_process_times+0x34/0x5c
[  201.409957] [<ffff00000818697c>] tick_sched_handle.isra.16+0x4c/0x70
[  201.416292] [<ffff0000081869e8>] tick_sched_timer+0x48/0x88
[  201.421847] [<ffff000008173c60>] __hrtimer_run_queues+0x17c/0x604
[  201.427924] [<ffff000008174b90>] hrtimer_interrupt+0xa4/0x1e8
[  201.433656] [<ffff0000086d9d34>] arch_timer_handler_phys+0x3c/0x48
[  201.439818] [<ffff00000815b718>] handle_percpu_devid_irq+0xdc/0x42c
[  201.446069] [<ffff00000815569c>] generic_handle_irq+0x34/0x4c
[  201.451796] [<ffff000008155dec>] __handle_domain_irq+0x6c/0xc4
[  201.457611] [<ffff0000080816c4>] gic_handle_irq+0xa0/0x1b0
[  201.463080] Exception stack(0xffff8016df013a40 to 0xffff8016df013b70)
[  201.469504] 3a40: 0000000000000000 0000000000000003 0000000000000000 
ffff8016df013bd0
[  201.477316] 3a60: ffff8016df013bd0 0000000000080000 ffff8016df013bb8 
ffff0000082113c8
[  201.485129] 3a80: 0000000000000000 0000ffffe507a9f0 00000000ffffffff 
0000000000000000
[  201.492941] 3aa0: 0000000000000005 ffffffffffffffff 002f547d23157399 
00003a2a9f82ac9c
[  201.500754] 3ac0: 0000000000000000 0000000000000000 0000ffffe507a7e0 
ffff000008f5b000
[  201.508566] 3ae0: ffff8016df013c08 ffff000008213fcc 0000000000000013 
ffff8017616a7800
[  201.516379] 3b00: ffff000008f5b000 ffff0000082179d4 0000000000000000 
ffff0000088c1000
[  201.524191] 3b20: ffff8017616a7800 ffff8016df013b70 ffff00000818bf04 
ffff8016df013b70
[  201.532004] 3b40: ffff00000818bf28 0000000020400145 ffff000008213fcc 
0000000000000013
[  201.539816] 3b60: 0001000000000000 ffff8016df013bb8
[  201.544677] [<ffff000008082eb8>] el1_irq+0xb8/0x140
[  201.549539] [<ffff00000818bf28>] smp_call_function_single+0x160/0x184
[  201.555965] [<ffff000008211128>] cpu_function_call+0x40/0x64
[  201.561605] [<ffff0000082113a0>] event_function_call+0x120/0x128
[  201.567594] [<ffff0000082113ec>] _perf_event_disable+0x44/0x64
[  201.573410] [<ffff000008210afc>] perf_event_for_each_child+0x3c/0x84
[  201.579747] [<ffff00000821bcfc>] perf_ioctl+0x21c/0x9a4
[  201.584957] [<ffff0000082eec6c>] do_vfs_ioctl+0xcc/0x874
[  201.590250] [<ffff0000082ef4a4>] sys_ioctl+0x90/0xa4
[  201.595198] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
[  239.003035] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 
9-... } 6592 jiffies s: 1149 root: 0x1/.
[  239.012199] blocking rcu_node structures: l=1:0-14:0x200/.
[  239.017695] Task dump for CPU 9:
[  239.020880] perf            R  running task        0  1888   1864 0x00000202
[  239.027929] Call trace:
[  239.030346] [<ffff000008085c9c>] __switch_to+0x64/0x70
[  239.035484] [<ffff000008235a00>] free_pcppages_bulk+0x43c/0x640
[  262.304244] perf: interrupt took too long (6221 > 6190), lowering 
kernel.perf_event_max_sample_rate to 32100
[  367.009704] INFO: task kworker/15:2:1187 blocked for more than 120 seconds.
[  367.015713]       Tainted: G        W       4.13.0-rc3+ #2
[  367.021200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[  367.028994] kworker/15:2    D    0  1187      2 0x00000000
[  367.034481] Workqueue: usb_hub_wq hub_event
[  367.038629] Call trace:
[  367.041077] [<ffff000008085c9c>] __switch_to+0x64/0x70
[  367.046184] [<ffff0000088a3680>] __schedule+0x410/0xcbc
[  367.051403] [<ffff0000088a3f6c>] schedule+0x40/0xa4
[  367.056253] [<ffff00000816a578>] 
_synchronize_rcu_expedited.constprop.64+0x370/0x3e8
[  367.063990] [<ffff00000816a66c>] synchronize_sched_expedited+0x7c/0xf0
[  367.070501] [<ffff00000816aa0c>] synchronize_rcu_expedited+0x14/0x1c
[  367.076825] [<ffff00000825ec24>] bdi_unregister+0x78/0x200
[  367.082307] [<ffff00000841fbac>] del_gendisk+0x130/0x240
[  367.087592] [<ffff000003de0688>] sr_remove+0x30/0x90 [sr_mod]
[  367.093333] [<ffff00000859901c>] device_release_driver_internal+0x15c/0x1dc
[  367.100275] [<ffff0000085990c4>] device_release_driver+0x28/0x34
[  367.106251] [<ffff00000859794c>] bus_remove_device+0xe4/0x14c
[  367.111993] [<ffff000008593890>] device_del+0x1dc/0x308
[  367.117191] [<ffff0000085e121c>] __scsi_remove_device+0xfc/0x128
[  367.123192] [<ffff0000085de894>] scsi_forget_host+0x74/0x78
[  367.128778] [<ffff0000085d0cf8>] scsi_remove_host+0x7c/0x120
[  367.134396] [<ffff000000ad3720>] usb_stor_disconnect+0x58/0xc8 [usb_storage]
[  367.141421] [<ffff000008636fd8>] usb_unbind_interface+0x7c/0x268
[  367.147397] [<ffff00000859901c>] device_release_driver_internal+0x15c/0x1dc
[  367.154355] [<ffff0000085990c4>] device_release_driver+0x28/0x34
[  367.160344] [<ffff00000859794c>] bus_remove_device+0xe4/0x14c
[  367.166060] [<ffff000008593890>] device_del+0x1dc/0x308
[  367.171282] [<ffff0000086344fc>] usb_disable_device+0xc4/0x2ac
[  367.177084] [<ffff000008629de4>] usb_disconnect+0x98/0x248
[  367.182566] [<ffff000008629ff8>] hub_quiesce+0x64/0xac
[  367.187674] [<ffff00000862bfc0>] hub_event+0x110/0xa8c
[  367.192810] [<ffff0000080f2a64>] process_one_work+0x228/0x6a8
[  367.198568] [<ffff0000080f2f44>] worker_thread+0x60/0x3bc
[  367.203922] [<ffff0000080faa40>] kthread+0x114/0x140
[  367.208898] [<ffff0000080833d0>] ret_from_fork+0x10/0x40
[  367.214171] INFO: task kworker/15:3:1890 blocked for more than 120 seconds.
[  367.221108]       Tainted: G        W       4.13.0-rc3+ #2
[  367.226563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[  367.234388] kworker/15:3    D    0  1890      2 0x00000200
[  367.239918] Workqueue: events wait_rcu_exp_gp
[  367.244184] Call trace:
[  367.246615] [<ffff000008085c9c>] __switch_to+0x64/0x70
[  367.251751] [<ffff0000088a3680>] __schedule+0x410/0xcbc
[  367.256945] [<ffff0000088a3f6c>] schedule+0x40/0xa4
[  367.261820] [<ffff0000088a936c>] schedule_timeout+0x1e8/0x454
[  367.267536] [<ffff000008165da0>] rcu_exp_wait_wake+0xb8/0x5b0
[  367.273278] [<ffff00000816a1f8>] wait_rcu_exp_gp+0x3c/0x4c
[  367.278776] [<ffff0000080f2a64>] process_one_work+0x228/0x6a8
[  367.284475] [<ffff0000080f2f44>] worker_thread+0x60/0x3bc
[  367.289931] [<ffff0000080faa40>] kthread+0x114/0x140
[  367.294793] [<ffff0000080833d0>] ret_from_fork+0x10/0x40
[  367.300100] INFO: lockdep is turned off.

-- 
Regards
Pratyush


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  8:14 ` Peter Zijlstra
@ 2017-08-03  3:56   ` Pratyush Anand
  2017-08-03  8:32     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Pratyush Anand @ 2017-08-03  3:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Peter,

On Wednesday 02 August 2017 01:44 PM, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
>> Hi,
>>
>> I am observing the following rcu_sched stall while executing `perf record -a --
>> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
>> waiting in csd_lock_wait(), from which it never came out, hence the stall.
>> Any help/pointer for further debugging would be very helpful. The problem also
>> reproduces with 4.13.0-rc3.
> 
> I'm sitting on this patch:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=15377ef4fe0c86eb7fa1099575b2e86357d99e42
> 
> please give that a spin.

Thanks for the pointer.
Tried, but it did not help.

-- 
Regards
Pratyush


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-02  8:44   ` Marc Zyngier
@ 2017-08-03  3:56     ` Pratyush Anand
  0 siblings, 0 replies; 10+ messages in thread
From: Pratyush Anand @ 2017-08-03  3:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On Wednesday 02 August 2017 02:14 PM, Marc Zyngier wrote:
> On 02/08/17 09:08, Will Deacon wrote:
>> Hi Pratyush,
>>
>> On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
>>> I am observing the following rcu_sched stall while executing `perf record -a --
>>> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
>>> waiting in csd_lock_wait(), from which it never came out, hence the stall.
>>> Any help/pointer for further debugging would be very helpful. The problem also
>>> reproduces with 4.13.0-rc3.
>>
>> When you say "also", which other kernel(s) show the problem? Is this a
>> recent regression? Which platform are you running on?
>>
>> It would be interesting to know what the other CPUs are doing, in particular
>> the target of the cross-call. Either it crashed spectacularly and didn't
>> unlock the csd lock, or the IPI somehow wasn't delivered.
>>
>> Do you see any other splats if you enable lock debugging?
> 
> Also, is that in a guest, or bare metal? If that's a guest, what's the
> host's kernel version?

It's a host.
-- 
Regards
Pratyush


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-03  3:55   ` Pratyush Anand
@ 2017-08-03  7:19     ` Marc Zyngier
  2017-08-03 12:54     ` Paul E. McKenney
  1 sibling, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2017-08-03  7:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 3 Aug 2017 09:25:40 +0530
Pratyush Anand <panand@redhat.com> wrote:

> Hi Will,
> 
> On Wednesday 02 August 2017 01:38 PM, Will Deacon wrote:
> > Hi Pratyush,
> > 
> > On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:  
> >> I am observing the following rcu_sched stall while executing `perf record -a --
> >> sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
> >> waiting in csd_lock_wait(), from which it never came out, hence the stall.
> >> Any help/pointer for further debugging would be very helpful. The problem also
> >> reproduces with 4.13.0-rc3.
> > When you say "also", which other kernel(s) show the problem? Is this a
> > recent regression? Which platform are you running on?  
> 
> Other than 4.13.0-rc3, I had tested with a 4.11-based kernel (vanilla 4.11 plus
> some 4.12 patches backported), and the log I attached was from that kernel;
> that's why I mentioned that it reproduces with vanilla upstream as well.
> 
> > 
> > It would be interesting to know what the other CPUs are doing, in particular
> > the target of the cross-call. Either it crashed spectacularly and didn't
> > unlock the csd lock, or the IPI somehow wasn't delivered.
> > 
> > Do you see any other splats if you enable lock debugging?  
> 
> It was the same.
> 
> Following is the log from 4.13.0-rc3 plus the patch Peter pointed to:
> 
> [  173.649589] perf: interrupt took too long (4952 > 4902), lowering 
> kernel.perf_event_max_sample_rate to 40300
> [  201.340926] INFO: rcu_sched self-detected stall on CPU
> [  201.345115] 	9-...: (6499 ticks this GP) idle=e1a/140000000000001/0 
> softirq=334/334 fqs=3250
> [  201.353617] 	 (t=6500 jiffies g=313 c=312 q=428)
> [  201.358220] Task dump for CPU 9:
> [  201.361431] perf            R  running task        0  1888   1864 0x00000202
> [  201.368462] Call trace:
> [  201.370897] [<ffff000008089498>] dump_backtrace+0x0/0x28c
> [  201.376276] [<ffff000008089748>] show_stack+0x24/0x2c
> [  201.381312] [<ffff00000810c5d0>] sched_show_task+0x19c/0x26c
> [  201.386952] [<ffff00000810df28>] dump_cpu_task+0x48/0x54
> [  201.392250] [<ffff00000816c564>] rcu_dump_cpu_stacks+0xac/0xf4
> [  201.398063] [<ffff00000816b714>] rcu_check_callbacks+0x908/0xc90
> [  201.404053] [<ffff000008172fe0>] update_process_times+0x34/0x5c
> [  201.409957] [<ffff00000818697c>] tick_sched_handle.isra.16+0x4c/0x70
> [  201.416292] [<ffff0000081869e8>] tick_sched_timer+0x48/0x88
> [  201.421847] [<ffff000008173c60>] __hrtimer_run_queues+0x17c/0x604
> [  201.427924] [<ffff000008174b90>] hrtimer_interrupt+0xa4/0x1e8
> [  201.433656] [<ffff0000086d9d34>] arch_timer_handler_phys+0x3c/0x48
> [  201.439818] [<ffff00000815b718>] handle_percpu_devid_irq+0xdc/0x42c
> [  201.446069] [<ffff00000815569c>] generic_handle_irq+0x34/0x4c
> [  201.451796] [<ffff000008155dec>] __handle_domain_irq+0x6c/0xc4
> [  201.457611] [<ffff0000080816c4>] gic_handle_irq+0xa0/0x1b0
> [  201.463080] Exception stack(0xffff8016df013a40 to 0xffff8016df013b70)
> [  201.469504] 3a40: 0000000000000000 0000000000000003 0000000000000000 
> ffff8016df013bd0
> [  201.477316] 3a60: ffff8016df013bd0 0000000000080000 ffff8016df013bb8 
> ffff0000082113c8
> [  201.485129] 3a80: 0000000000000000 0000ffffe507a9f0 00000000ffffffff 
> 0000000000000000
> [  201.492941] 3aa0: 0000000000000005 ffffffffffffffff 002f547d23157399 
> 00003a2a9f82ac9c
> [  201.500754] 3ac0: 0000000000000000 0000000000000000 0000ffffe507a7e0 
> ffff000008f5b000
> [  201.508566] 3ae0: ffff8016df013c08 ffff000008213fcc 0000000000000013 
> ffff8017616a7800
> [  201.516379] 3b00: ffff000008f5b000 ffff0000082179d4 0000000000000000 
> ffff0000088c1000
> [  201.524191] 3b20: ffff8017616a7800 ffff8016df013b70 ffff00000818bf04 
> ffff8016df013b70
> [  201.532004] 3b40: ffff00000818bf28 0000000020400145 ffff000008213fcc 
> 0000000000000013
> [  201.539816] 3b60: 0001000000000000 ffff8016df013bb8
> [  201.544677] [<ffff000008082eb8>] el1_irq+0xb8/0x140
> [  201.549539] [<ffff00000818bf28>] smp_call_function_single+0x160/0x184

[...]

We really need to get to the bottom of what is happening at that point.
Either the targeted CPU has not received the IPI, or it has locked up
before replying to the source CPU. You probably want to instrument the
SMP code and find out what that CPU is doing, as there is little in
that trace that tells us what is happening.
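
One way to do that instrumentation (sketched here purely as an illustration --
the function name, timeout and printout below are made up, this is not
existing kernel code) is to bound the wait and dump the target CPU once it
takes too long:

/* Illustrative debug hack for kernel/smp.c -- not upstream code. */
static void csd_lock_wait_debug(struct call_single_data *csd, int target_cpu)
{
	unsigned long timeout = jiffies + 5 * HZ;	/* arbitrary 5s budget */

	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK) {
		if (time_after(jiffies, timeout)) {
			pr_warn("csd: CPU%d has not handled the IPI for 5s\n",
				target_cpu);
			dump_cpu_task(target_cpu);	/* what is that CPU doing? */
			timeout = jiffies + 5 * HZ;
		}
		cpu_relax();
	}
}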

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-03  3:56   ` Pratyush Anand
@ 2017-08-03  8:32     ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-08-03  8:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 03, 2017 at 09:26:06AM +0530, Pratyush Anand wrote:
> Hi Peter,
> 
> On Wednesday 02 August 2017 01:44 PM, Peter Zijlstra wrote:
> > On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
> > > Hi,
> > > 
> > > I am observing the following rcu_sched stall while executing `perf record -a --
> > > sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
> > > waiting in csd_lock_wait(), from which it never came out, hence the stall.
> > > Any help/pointer for further debugging would be very helpful. The problem also
> > > reproduces with 4.13.0-rc3.
> > 
> > I'm sitting on this patch:
> > 
> >    https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=15377ef4fe0c86eb7fa1099575b2e86357d99e42
> > 
> > please give that a spin.
> 
> Thanks for the pointer.

Seems like my brain misfired in any case. That's the OSQ lock, not the
CSD one ;-) Let me go stare at your traces with hopefully more
wakefulness.


* rcu_sched stall while waiting in csd_lock_wait()
  2017-08-03  3:55   ` Pratyush Anand
  2017-08-03  7:19     ` Marc Zyngier
@ 2017-08-03 12:54     ` Paul E. McKenney
  1 sibling, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2017-08-03 12:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 03, 2017 at 09:25:40AM +0530, Pratyush Anand wrote:
> Hi Will,
> 
> On Wednesday 02 August 2017 01:38 PM, Will Deacon wrote:
> >Hi Pratyush,
> >
> >On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:
> >>I am observing the following rcu_sched stall while executing `perf record -a --
> >>sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
> >>waiting in csd_lock_wait(), from which it never came out, hence the stall.
> >>Any help/pointer for further debugging would be very helpful. The problem also
> >>reproduces with 4.13.0-rc3.
> >When you say "also", which other kernel(s) show the problem? Is this a
> >recent regression? Which platform are you running on?
> 
> Other than 4.13.0-rc3, I had tested with a 4.11-based kernel (vanilla 4.11
> plus some 4.12 patches backported), and the log I attached was from that
> kernel; that's why I mentioned that it reproduces with vanilla upstream as well.
> 
> >
> >It would be interesting to know what the other CPUs are doing, in particular
> >the target of the cross-call. Either it crashed spectacularly and didn't
> >unlock the csd lock, or the IPI somehow wasn't delivered.
> >
> >Do you see any other splats if you enable lock debugging?
> 
> It was the same.
> 
> Following is the log from 4.13.0-rc3 plus the patch Peter pointed to:
> 
> [  173.649589] perf: interrupt took too long (4952 > 4902), lowering
> kernel.perf_event_max_sample_rate to 40300
> [  201.340926] INFO: rcu_sched self-detected stall on CPU
> [  201.345115] 	9-...: (6499 ticks this GP)
> idle=e1a/140000000000001/0 softirq=334/334 fqs=3250
> [  201.353617] 	 (t=6500 jiffies g=313 c=312 q=428)
> [  201.358220] Task dump for CPU 9:
> [  201.361431] perf            R  running task        0  1888   1864 0x00000202
> [  201.368462] Call trace:
> [  201.370897] [<ffff000008089498>] dump_backtrace+0x0/0x28c
> [  201.376276] [<ffff000008089748>] show_stack+0x24/0x2c
> [  201.381312] [<ffff00000810c5d0>] sched_show_task+0x19c/0x26c
> [  201.386952] [<ffff00000810df28>] dump_cpu_task+0x48/0x54
> [  201.392250] [<ffff00000816c564>] rcu_dump_cpu_stacks+0xac/0xf4
> [  201.398063] [<ffff00000816b714>] rcu_check_callbacks+0x908/0xc90
> [  201.404053] [<ffff000008172fe0>] update_process_times+0x34/0x5c
> [  201.409957] [<ffff00000818697c>] tick_sched_handle.isra.16+0x4c/0x70
> [  201.416292] [<ffff0000081869e8>] tick_sched_timer+0x48/0x88
> [  201.421847] [<ffff000008173c60>] __hrtimer_run_queues+0x17c/0x604
> [  201.427924] [<ffff000008174b90>] hrtimer_interrupt+0xa4/0x1e8
> [  201.433656] [<ffff0000086d9d34>] arch_timer_handler_phys+0x3c/0x48
> [  201.439818] [<ffff00000815b718>] handle_percpu_devid_irq+0xdc/0x42c
> [  201.446069] [<ffff00000815569c>] generic_handle_irq+0x34/0x4c
> [  201.451796] [<ffff000008155dec>] __handle_domain_irq+0x6c/0xc4
> [  201.457611] [<ffff0000080816c4>] gic_handle_irq+0xa0/0x1b0
> [  201.463080] Exception stack(0xffff8016df013a40 to 0xffff8016df013b70)
> [  201.469504] 3a40: 0000000000000000 0000000000000003
> 0000000000000000 ffff8016df013bd0
> [  201.477316] 3a60: ffff8016df013bd0 0000000000080000
> ffff8016df013bb8 ffff0000082113c8
> [  201.485129] 3a80: 0000000000000000 0000ffffe507a9f0
> 00000000ffffffff 0000000000000000
> [  201.492941] 3aa0: 0000000000000005 ffffffffffffffff
> 002f547d23157399 00003a2a9f82ac9c
> [  201.500754] 3ac0: 0000000000000000 0000000000000000
> 0000ffffe507a7e0 ffff000008f5b000
> [  201.508566] 3ae0: ffff8016df013c08 ffff000008213fcc
> 0000000000000013 ffff8017616a7800
> [  201.516379] 3b00: ffff000008f5b000 ffff0000082179d4
> 0000000000000000 ffff0000088c1000
> [  201.524191] 3b20: ffff8017616a7800 ffff8016df013b70
> ffff00000818bf04 ffff8016df013b70
> [  201.532004] 3b40: ffff00000818bf28 0000000020400145
> ffff000008213fcc 0000000000000013
> [  201.539816] 3b60: 0001000000000000 ffff8016df013bb8
> [  201.544677] [<ffff000008082eb8>] el1_irq+0xb8/0x140
> [  201.549539] [<ffff00000818bf28>] smp_call_function_single+0x160/0x184
> [  201.555965] [<ffff000008211128>] cpu_function_call+0x40/0x64
> [  201.561605] [<ffff0000082113a0>] event_function_call+0x120/0x128
> [  201.567594] [<ffff0000082113ec>] _perf_event_disable+0x44/0x64
> [  201.573410] [<ffff000008210afc>] perf_event_for_each_child+0x3c/0x84

How many events are involved here?  If there is a huge number, then
the RCU CPU stall warning is expected behavior.

If this is the case, sticking a cond_resched_rcu_qs() in the loop
should fix this.
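
To make that concrete, here is a sketch of where such a call could go, based
on a paraphrase of perf_event_for_each_child(); illustrative only, not a
tested patch:

/* kernel/events/core.c, paraphrased; the added line is the suggestion above */
static void perf_event_for_each_child(struct perf_event *event,
				      void (*func)(struct perf_event *))
{
	struct perf_event *child;

	WARN_ON_ONCE(event->ctx->parent_ctx);

	mutex_lock(&event->child_mutex);
	func(event);
	list_for_each_entry(child, &event->child_list, child_list) {
		func(child);
		cond_resched_rcu_qs();	/* report a quiescent state per iteration */
	}
	mutex_unlock(&event->child_mutex);
}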

							Thanx, Paul

> [  201.579747] [<ffff00000821bcfc>] perf_ioctl+0x21c/0x9a4
> [  201.584957] [<ffff0000082eec6c>] do_vfs_ioctl+0xcc/0x874
> [  201.590250] [<ffff0000082ef4a4>] sys_ioctl+0x90/0xa4
> [  201.595198] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
> [  239.003035] INFO: rcu_sched detected expedited stalls on
> CPUs/tasks: { 9-... } 6592 jiffies s: 1149 root: 0x1/.
> [  239.012199] blocking rcu_node structures: l=1:0-14:0x200/.
> [  239.017695] Task dump for CPU 9:
> [  239.020880] perf            R  running task        0  1888   1864 0x00000202
> [  239.027929] Call trace:
> [  239.030346] [<ffff000008085c9c>] __switch_to+0x64/0x70
> [  239.035484] [<ffff000008235a00>] free_pcppages_bulk+0x43c/0x640
> [  262.304244] perf: interrupt took too long (6221 > 6190), lowering
> kernel.perf_event_max_sample_rate to 32100
> [  367.009704] INFO: task kworker/15:2:1187 blocked for more than 120 seconds.
> [  367.015713]       Tainted: G        W       4.13.0-rc3+ #2
> [  367.021200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  367.028994] kworker/15:2    D    0  1187      2 0x00000000
> [  367.034481] Workqueue: usb_hub_wq hub_event
> [  367.038629] Call trace:
> [  367.041077] [<ffff000008085c9c>] __switch_to+0x64/0x70
> [  367.046184] [<ffff0000088a3680>] __schedule+0x410/0xcbc
> [  367.051403] [<ffff0000088a3f6c>] schedule+0x40/0xa4
> [  367.056253] [<ffff00000816a578>]
> _synchronize_rcu_expedited.constprop.64+0x370/0x3e8
> [  367.063990] [<ffff00000816a66c>] synchronize_sched_expedited+0x7c/0xf0
> [  367.070501] [<ffff00000816aa0c>] synchronize_rcu_expedited+0x14/0x1c
> [  367.076825] [<ffff00000825ec24>] bdi_unregister+0x78/0x200
> [  367.082307] [<ffff00000841fbac>] del_gendisk+0x130/0x240
> [  367.087592] [<ffff000003de0688>] sr_remove+0x30/0x90 [sr_mod]
> [  367.093333] [<ffff00000859901c>] device_release_driver_internal+0x15c/0x1dc
> [  367.100275] [<ffff0000085990c4>] device_release_driver+0x28/0x34
> [  367.106251] [<ffff00000859794c>] bus_remove_device+0xe4/0x14c
> [  367.111993] [<ffff000008593890>] device_del+0x1dc/0x308
> [  367.117191] [<ffff0000085e121c>] __scsi_remove_device+0xfc/0x128
> [  367.123192] [<ffff0000085de894>] scsi_forget_host+0x74/0x78
> [  367.128778] [<ffff0000085d0cf8>] scsi_remove_host+0x7c/0x120
> [  367.134396] [<ffff000000ad3720>] usb_stor_disconnect+0x58/0xc8 [usb_storage]
> [  367.141421] [<ffff000008636fd8>] usb_unbind_interface+0x7c/0x268
> [  367.147397] [<ffff00000859901c>] device_release_driver_internal+0x15c/0x1dc
> [  367.154355] [<ffff0000085990c4>] device_release_driver+0x28/0x34
> [  367.160344] [<ffff00000859794c>] bus_remove_device+0xe4/0x14c
> [  367.166060] [<ffff000008593890>] device_del+0x1dc/0x308
> [  367.171282] [<ffff0000086344fc>] usb_disable_device+0xc4/0x2ac
> [  367.177084] [<ffff000008629de4>] usb_disconnect+0x98/0x248
> [  367.182566] [<ffff000008629ff8>] hub_quiesce+0x64/0xac
> [  367.187674] [<ffff00000862bfc0>] hub_event+0x110/0xa8c
> [  367.192810] [<ffff0000080f2a64>] process_one_work+0x228/0x6a8
> [  367.198568] [<ffff0000080f2f44>] worker_thread+0x60/0x3bc
> [  367.203922] [<ffff0000080faa40>] kthread+0x114/0x140
> [  367.208898] [<ffff0000080833d0>] ret_from_fork+0x10/0x40
> [  367.214171] INFO: task kworker/15:3:1890 blocked for more than 120 seconds.
> [  367.221108]       Tainted: G        W       4.13.0-rc3+ #2
> [  367.226563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  367.234388] kworker/15:3    D    0  1890      2 0x00000200
> [  367.239918] Workqueue: events wait_rcu_exp_gp
> [  367.244184] Call trace:
> [  367.246615] [<ffff000008085c9c>] __switch_to+0x64/0x70
> [  367.251751] [<ffff0000088a3680>] __schedule+0x410/0xcbc
> [  367.256945] [<ffff0000088a3f6c>] schedule+0x40/0xa4
> [  367.261820] [<ffff0000088a936c>] schedule_timeout+0x1e8/0x454
> [  367.267536] [<ffff000008165da0>] rcu_exp_wait_wake+0xb8/0x5b0
> [  367.273278] [<ffff00000816a1f8>] wait_rcu_exp_gp+0x3c/0x4c
> [  367.278776] [<ffff0000080f2a64>] process_one_work+0x228/0x6a8
> [  367.284475] [<ffff0000080f2f44>] worker_thread+0x60/0x3bc
> [  367.289931] [<ffff0000080faa40>] kthread+0x114/0x140
> [  367.294793] [<ffff0000080833d0>] ret_from_fork+0x10/0x40
> [  367.300100] INFO: lockdep is turned off.
> 
> -- 
> Regards
> Pratyush
> 


end of thread

Thread overview: 10+ messages
2017-08-02  3:31 rcu_sched stall while waiting in csd_lock_wait() Pratyush Anand
2017-08-02  8:08 ` Will Deacon
2017-08-02  8:44   ` Marc Zyngier
2017-08-03  3:56     ` Pratyush Anand
2017-08-03  3:55   ` Pratyush Anand
2017-08-03  7:19     ` Marc Zyngier
2017-08-03 12:54     ` Paul E. McKenney
2017-08-02  8:14 ` Peter Zijlstra
2017-08-03  3:56   ` Pratyush Anand
2017-08-03  8:32     ` Peter Zijlstra
