From: Simon Kirby <sim@hostway.ca>
To: linux-raid@vger.kernel.org
Subject: raid1 lockups on 4.12.x
Date: Mon, 31 Jul 2017 11:36:16 -0700 [thread overview]
Message-ID: <20170731183616.GD22429@hostway.ca> (raw)
Hello,
I recently upgraded two old boxes to 4.12.3 only to find that they
started to lock up several times per day. I upgraded to 4.12.4 and the
problem persisted. Downgrading to 4.11.11 seems to have stopped the
issue.
I captured the following with nmi_watchdog=1 and a serial console while
on 4.12.4.
[ 9672.236351] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md2_raid1:1704]
[ 9672.236351] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack serio_raw evdev e1000
[ 9672.236351] CPU: 0 PID: 1704 Comm: md2_raid1 Not tainted 4.12.4-hw #6
[ 9672.236351] Hardware name: Dell Computer Corporation PowerEdge 750 /0R1479, BIOS A03 10/22/2004
[ 9672.236351] task: f4a1a100 task.stack: f47fa000
[ 9672.236351] EIP: md_check_recovery+0x2d4/0x460
[ 9672.236351] EFLAGS: 00000246 CPU: 0
[ 9672.236351] EAX: 00000000 EBX: f448ec00 ECX: 00000001 EDX: 00000000
[ 9672.236351] ESI: 00000000 EDI: 00000000 EBP: f47fbe38 ESP: f47fbe28
[ 9672.236351] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 9672.236351] CR0: 80050033 CR2: b77c0000 CR3: 31501000 CR4: 000006d0
[ 9672.236351] Call Trace:
[ 9672.236351] raid1d+0x27/0x16f0
[ 9672.236351] ? ktime_get+0x44/0x100
[ 9672.236351] ? lapic_next_event+0x13/0x20
[ 9672.236351] ? clockevents_program_event+0x9d/0x150
[ 9672.236351] ? __hrtimer_run_queues+0x13a/0x210
[ 9672.236351] ? hrtimer_interrupt+0xb0/0x190
[ 9672.236351] ? smp_apic_timer_interrupt+0x33/0x40
[ 9672.236351] ? apic_timer_interrupt+0x32/0x38
[ 9672.236351] ? perf_trace_thermal_power_cpu_limit+0x68/0x120
[ 9672.344374] md_thread+0x116/0x140
[ 9672.344374] ? do_wait_intr+0x70/0x70
[ 9672.344374] kthread+0xe3/0x110
[ 9672.344374] ? md_register_thread+0xc0/0xc0
[ 9672.344374] ? __kthread_init_worker+0x90/0x90
[ 9672.344374] ret_from_fork+0x19/0x24
[ 9672.344374] Code: 68 01 00 00 ef f0 0f ba b3 68 01 00 00 05 72 47 8b b3 08 01 00 00 85 f6 0f 85 d9 fe ff ff f0 80 a3 68 01 00 00 fe b9 01 00 00 00 <ba> 03 00 00 00 c7 04 24 00 00 00 00 b8 c0 a8 bc c1 e8 86 d9 97
later, there were (looping) backtraces such as:
[34552.236353] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [md0_raid1:1684]
[34552.236353] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack serio_raw evdev e1000
[34552.236353] CPU: 0 PID: 1684 Comm: md0_raid1 Tainted: G L 4.12.4-hw #6
[34552.236353] Hardware name: Dell Computer Corporation PowerEdge 750 /0R1479, BIOS A03 10/22/2004
[34552.236353] task: f4a95280 task.stack: f52c0000
[34552.236353] EIP: _raw_spin_unlock_irqrestore+0xa/0x10
[34552.236353] EFLAGS: 00000292 CPU: 0
[34552.236353] EAX: 00000292 EBX: f46a3d08 ECX: 00000001 EDX: 00000292
[34552.236353] ESI: 00000292 EDI: 00000003 EBP: f52c1de4 ESP: f52c1de4
[34552.236353] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[34552.236353] CR0: 80050033 CR2: 08062e00 CR3: 347af000 CR4: 000006d0
[34552.236353] Call Trace:
[34552.236353] __wake_up+0x40/0x50
[34552.236353] md_wakeup_thread+0x2b/0x30
[34552.236353] mddev_unlock+0x74/0xc0
[34552.236353] md_check_recovery+0x1c3/0x460
[34552.324380] raid1d+0x27/0x16f0
[34552.324380] ? ktime_get+0x44/0x100
[34552.324380] ? lapic_next_event+0x13/0x20
[34552.324380] ? clockevents_program_event+0x9d/0x150
[34552.324380] ? __hrtimer_run_queues+0x13a/0x210
[34552.344366] ? smp_apic_timer_interrupt+0x33/0x40
[34552.344366] ? apic_timer_interrupt+0x32/0x38
[34552.344366] ? perf_trace_thermal_power_cpu_limit+0xb/0x120
[34552.344366] md_thread+0x116/0x140
[34552.344366] ? do_wait_intr+0x70/0x70
[34552.344366] kthread+0xe3/0x110
[34552.344366] ? md_register_thread+0xc0/0xc0
[34552.344366] ? __kthread_init_worker+0x90/0x90
[34552.344366] ret_from_fork+0x19/0x24
[34552.344366] Code: 00 f0 0f b1 0a 85 c0 75 08 b0 01 5d c3 8d 74 26 00 31 c0 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 c6 00 00 89 d0 50 9d <8d> 74 26 00 5d c3 55 89 c1 89 e5 b8 00 ff ff ff f0 0f c1 01 89
[34567.020353] INFO: rcu_sched self-detected stall on CPU
[34567.020361] 0-...: (104347 ticks this GP) idle=d52/140000000000001/0 softirq=615378/615378 fqs=52025
[34567.020361] (t=105006 jiffies g=349461 c=349460 q=85)
[34567.020361] NMI backtrace for cpu 0
[34567.020361] CPU: 0 PID: 1684 Comm: md0_raid1 Tainted: G L 4.12.4-hw #6
[34567.020361] Hardware name: Dell Computer Corporation PowerEdge 750 /0R1479, BIOS A03 10/22/2004
[34567.020361] Call Trace:
[34567.020361] dump_stack+0x58/0x78
[34567.020361] nmi_cpu_backtrace+0xa6/0xb0
[34567.020361] nmi_trigger_cpumask_backtrace+0x92/0xd0
[34567.020361] ? irq_force_complete_move+0xd0/0xd0
[34567.020361] arch_trigger_cpumask_backtrace+0x10/0x20
[34567.020361] rcu_dump_cpu_stacks+0x64/0xa7
[34567.020361] rcu_check_callbacks+0x6af/0x710
[34567.020361] ? account_process_tick+0x54/0x130
[34567.020361] update_process_times+0x2a/0x60
[34567.020361] tick_sched_handle.isra.11+0x30/0x60
[34567.020361] tick_sched_timer+0x3b/0x80
[34567.020361] __hrtimer_run_queues+0xb0/0x210
[34567.020361] ? tick_nohz_handler+0xa0/0xa0
[34567.020361] hrtimer_interrupt+0x95/0x190
[34567.020361] local_apic_timer_interrupt+0x2a/0x50
[34567.020361] smp_apic_timer_interrupt+0x2e/0x40
[34567.020361] apic_timer_interrupt+0x32/0x38
[34567.020361] EIP: _raw_spin_unlock_irqrestore+0xa/0x10
[34567.020361] EFLAGS: 00000292 CPU: 0
[34567.020361] EAX: 00000292 EBX: f448e5f0 ECX: 00000001 EDX: 00000292
[34567.020361] ESI: 00000292 EDI: 00000003 EBP: f52c1e00 ESP: f52c1e00
[34567.020361] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[34567.020361] __wake_up+0x40/0x50
[34567.020361] md_check_recovery+0x1bc/0x460
[34567.020361] raid1d+0x27/0x16f0
[34567.020361] ? ktime_get+0x44/0x100
[34567.020361] ? lapic_next_event+0x13/0x20
[34567.020361] ? clockevents_program_event+0x9d/0x150
[34567.020361] ? __hrtimer_run_queues+0x13a/0x210
[34567.020361] ? hrtimer_interrupt+0xb0/0x190
[34567.020361] ? hrtimer_interrupt+0xb0/0x190
[34567.020361] ? smp_apic_timer_interrupt+0x33/0x40
[34567.020361] ? apic_timer_interrupt+0x32/0x38
[34567.020361] md_thread+0x116/0x140
[34567.020361] ? do_wait_intr+0x70/0x70
[34567.020361] kthread+0xe3/0x110
[34567.020361] ? md_register_thread+0xc0/0xc0
[34567.020361] ? __kthread_init_worker+0x90/0x90
[34567.020361] ret_from_fork+0x19/0x24
[34592.236352] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md0_raid1:1684]
[34592.236352] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack serio_raw evdev e1000
[34592.236352] CPU: 0 PID: 1684 Comm: md0_raid1 Tainted: G L 4.12.4-hw #6
[34592.236352] Hardware name: Dell Computer Corporation PowerEdge 750 /0R1479, BIOS A03 10/22/2004
[34592.236352] task: f4a95280 task.stack: f52c0000
[34592.236352] EIP: _raw_spin_unlock_irqrestore+0xa/0x10
[34592.236352] EFLAGS: 00000292 CPU: 0
[34592.236352] EAX: 00000292 EBX: c1bca8c0 ECX: 00000001 EDX: 00000292
[34592.236352] ESI: 00000292 EDI: 00000003 EBP: f52c1e00 ESP: f52c1e00
[34592.236352] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[34592.236352] CR0: 80050033 CR2: 08062e00 CR3: 347af000 CR4: 000006d0
[34592.236352] Call Trace:
[34592.236352] __wake_up+0x40/0x50
[34592.236352] md_check_recovery+0x2ea/0x460
[34592.236352] raid1d+0x27/0x16f0
[34592.236352] ? ktime_get+0x44/0x100
[34592.236352] ? lapic_next_event+0x13/0x20
[34592.236352] ? clockevents_program_event+0x9d/0x150
[34592.236352] ? __hrtimer_run_queues+0x13a/0x210
[34592.236352] ? tick_program_event+0x41/0x80
[34592.236352] ? hrtimer_interrupt+0xb0/0x190
[34592.344366] ? irq_exit+0x58/0xb0
[34592.344366] ? smp_apic_timer_interrupt+0x33/0x40
[34592.344366] ? apic_timer_interrupt+0x32/0x38
[34592.356384] ? perf_trace_thermal_power_cpu_limit+0xb/0x120
[34592.356384] md_thread+0x116/0x140
[34592.356384] ? do_wait_intr+0x70/0x70
[34592.356384] kthread+0xe3/0x110
[34592.356384] ? md_register_thread+0xc0/0xc0
[34592.356384] ? __kthread_init_worker+0x90/0x90
[34592.356384] ret_from_fork+0x19/0x24
[34592.356384] Code: 00 f0 0f b1 0a 85 c0 75 08 b0 01 5d c3 8d 74 26 00 31 c0 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 c6 00 00 89 d0 50 9d <8d> 74 26 00 5d c3 55 89 c1 89 e5 b8 00 ff ff ff f0 0f c1 01 89
Config here: http://0x.ca/sim/ref/4.12.4-32/config
Simon-
next reply other threads:[~2017-07-31 18:36 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-31 18:36 Simon Kirby [this message]
2017-08-14 7:08 ` raid1 lockups on 4.12.x Simon Kirby
2017-08-14 8:40 ` Jack Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170731183616.GD22429@hostway.ca \
--to=sim@hostway.ca \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.