* Hung RAID5 array with discard
@ 2014-12-18 3:08 Terry Hardie
  2015-03-04 21:47 ` Terry Hardie
  0 siblings, 1 reply; 7+ messages in thread
From: Terry Hardie @ 2014-12-18 3:08 UTC (permalink / raw)
  To: linux-raid

Hi,

I am testing 3 SSDs (1TB Crucial M550 with DRZAT; I tested that they do return zeros after discard) with RAID5 and discard. I create the array with a 64k chunk size, and it starts to sync. During its initial reconstruction, I run mkfs.ext4, which starts "Discarding device blocks". After a short period (I believe when the mkfs reaches the point where the reconstruction is at), all IO to the disks freezes, and mkfs does not advance. iostat shows 2 of the 3 drives at 100% utilization with no data read or written. After 2 minutes, I get the hung task dump. Most CPUs are idle; here are the few that are not, which look like a deadlock to me:

Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154399] INFO: rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 3, t=285032 jiffies, g=1160, c=1159, q=0)

Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154902] NMI backtrace for cpu 4
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154904] CPU: 4 PID: 2146 Comm: md3_raid5 Tainted: G W IOX 3.13.0-43-generic #72~precise1
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154905] Hardware name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154906] task: ffff88202594c800 ti: ffff8810245a0000 task.ti: ffff8810245a0000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154907] RIP: 0010:[<ffffffff817644c1>] [<ffffffff817644c1>] _raw_spin_lock_irqsave+0x41/0x60
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154910] RSP: 0018:ffff8810245a1cc8 EFLAGS: 00000006
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154911] RAX: 0000000000002ec5 RBX: ffff882028a6ec00 RCX: 0000000000007b78
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154912] RDX: 0000000000000202 RSI: 0000000000007b78 RDI: ffff882028a6ec10
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154913] RBP: ffff8810245a1cc8 R08: 0000000000007b76 R09: ffff882023629170
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154914] R10: 0000000000000000 R11: ffff882028a6ec00 R12: ffff882028a6ee68
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154915] R13: 0000000000000003 R14: 0000000000000002 R15: ffff882028a6ec10
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154916] FS: 0000000000000000(0000) GS:ffff88103fc80000(0000) knlGS:0000000000000000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154917] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154918] CR2: 00007f208c2d0000 CR3: 0000000001c0d000 CR4: 00000000001407e0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154919] Stack:
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154920] ffff8810245a1d18 ffffffffa0149890 0000000000000002 ffff882028a6ee88
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154923] ffff882028a6ee68 ffff882028a6ec00 0000000000000008 ffff882028a6ee68
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154926] 0000000000000000 ffff882028a6ee50 ffff8810245a1d98 ffffffffa015212f
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154929] Call Trace:
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154933] [<ffffffffa0149890>] release_inactive_stripe_list+0x50/0x160 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154937] [<ffffffffa015212f>] handle_active_stripes.isra.38+0x7f/0x190 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154940] [<ffffffffa0152758>] raid5d+0x198/0x2f0 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154942] [<ffffffff815d30a7>] md_thread+0x117/0x150
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154945] [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154947] [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154949] [<ffffffff8108fb59>] kthread+0xc9/0xe0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154952] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154954] [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154956] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154957] Code: 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 75 05 48 89 d0 5d c3 83 e1 fe 0f b7 f1 b8 00 80 00 00 44 0f b7 07 <66> 44 39 c1 74 e6 f3 90 83 e8 01 75 ef 0f 1f 80 00 00 00 00 eb

Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155001] NMI backtrace for cpu 5
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155003] CPU: 5 PID: 2147 Comm: md3_resync Tainted: G W IOX 3.13.0-43-generic #72~precise1
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155004] Hardware name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155005] task: ffff88202594b000 ti: ffff8810274a0000 task.ti: ffff8810274a0000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155006] RIP: 0010:[<ffffffffa01483b7>] [<ffffffffa01483b7>] __find_stripe+0x57/0xa0 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155010] RSP: 0018:ffff8810274a1b68 EFLAGS: 00000006
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155011] RAX: ffff882027092da0 RBX: 0000000000a30c10 RCX: 0000000000000001
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155012] RDX: 0000000000000c10 RSI: 0000000000a30c10 RDI: ffff882028a6ec00
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155013] RBP: ffff8810274a1b88 R08: 0000000000000000 R09: 0000000000000000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155014] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155015] R13: ffff882028a6ec00 R14: 0000000000000000 R15: ffff882028a6eda8
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155016] FS: 0000000000000000(0000) GS:ffff88103fca0000(0000) knlGS:0000000000000000
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155018] CR2: 00000000006e1dc8 CR3: 0000000001c0d000 CR4: 00000000001407e0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155019] Stack:
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155020] ffff8810274a1ba8 ffff882028a6ec00 000000007b767b00 ffff882028a6ec10
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155023] ffff8810274a1c28 ffffffffa0150555 ffff882023773b50 ffff882028a6eda8
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155026] 0000000200000001 ffff882028a6ec08 0000000000000000 0000000000a30c10
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155029] Call Trace:
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155033] [<ffffffffa0150555>] get_active_stripe+0x115/0x3e0 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155036] [<ffffffffa014aea8>] ? release_stripe+0x68/0x100 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155040] [<ffffffffa0154f3b>] sync_request+0x11b/0x2a0 [raid456]
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155042] [<ffffffff815d5ccf>] md_do_sync+0x84f/0xdb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155046] [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155048] [<ffffffff815d30a7>] md_thread+0x117/0x150
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155050] [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155052] [<ffffffff8108fb59>] kthread+0xc9/0xe0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155054] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155057] [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155059] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155060] Code: e2 f8 0f 00 00 48 8b 04 02 48 85 c0 75 25 f6 05 29 25 01 00 04 75 3e 31 c0 48 83 c4 08 5b 41 5c 41 5d 5d c3 66 44 39 60 30 74 ee <48> 8b 00 48 85 c0 74 db 48 39 58 38 75 f2 eb e9 48 89 f2 48 c7

Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670082] INFO: task mkfs.ext4:2235 blocked for more than 120 seconds.
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670109] Tainted: G W IOX 3.13.0-43-generic #72~precise1
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670130] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670155] mkfs.ext4 D ffff881024fe39e0 0 2235 2080 0x00000000
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670158] ffff882026eafa68 0000000000000082 ffff88103fc73480 ffff882026eaffd8
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670162] 0000000000013480 0000000000013480 ffff8820293e8000 ffff88202208b000
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670165] ffff882026eafa78 ffff882028a6ec00 ffff882028a6ed98 ffff882028a6ec0c
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670169] Call Trace:
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670175] [<ffffffff81760ae9>] schedule+0x29/0x70
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670181] [<ffffffffa01506e3>] get_active_stripe+0x2a3/0x3e0 [raid456]
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670185] [<ffffffff8134c152>] ? blk_check_plugged+0x72/0xb0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670189] [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670193] [<ffffffffa0155e44>] make_discard_request+0x108/0x12c4 [raid456]
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670196] [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670201] [<ffffffffa0155c91>] make_request+0x581/0x590 [raid456]
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670204] [<ffffffff8109cfd6>] ? ttwu_do_activate.constprop.82+0x66/0x70
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670207] [<ffffffff8109d097>] ? ttwu_queue+0xb7/0xd0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670210] [<ffffffff8109f950>] ? try_to_wake_up+0x190/0x210
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670212] [<ffffffff815d2c53>] md_make_request+0xd3/0x230
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670216] [<ffffffff8115b085>] ? mempool_alloc_slab+0x15/0x20
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670219] [<ffffffff8134ceb7>] generic_make_request.part.62+0x77/0xb0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670222] [<ffffffff8134d428>] generic_make_request+0x68/0x70
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670225] [<ffffffff8134d4a8>] submit_bio+0x78/0x160
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670228] [<ffffffff81202f80>] ? bio_alloc_bioset+0xa0/0x1d0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670232] [<ffffffff813578c0>] blkdev_issue_discard+0x1f0/0x2a0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670235] [<ffffffff8135c1f4>] blkdev_ioctl+0x354/0x810
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670238] [<ffffffff8101361d>] ? __switch_to+0x16d/0x4d0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670241] [<ffffffff81204370>] block_ioctl+0x40/0x50
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670244] [<ffffffff811dd5c5>] do_vfs_ioctl+0x75/0x2c0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670247] [<ffffffff817606be>] ? __schedule+0x38e/0x700
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670249] [<ffffffff811dd8a1>] SyS_ioctl+0x91/0xb0
Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670252] [<ffffffff8176d66d>] system_call_fastpath+0x1a/0x1f

If I do the mkfs.ext4 after the initial reconstruction is done, it gets all the way through. I don't want to put this system into production, since this condition could show up again if the array ever needs to reconstruct while in service.

This is a test system in a lab, so I'd be happy to try some tests.

Terry

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Hung RAID5 array with discard
  2014-12-18 3:08 Hung RAID5 array with discard Terry Hardie
@ 2015-03-04 21:47 ` Terry Hardie
  2015-03-23 2:57 ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Terry Hardie @ 2015-03-04 21:47 UTC (permalink / raw)
  To: linux-raid

Well, I'm disappointed no one responded to this. This basically means Linux RAID 4/5/6 with discard is fundamentally broken, and no one wants to acknowledge it.

I hope someone finds this post while I still have my lab available, so I can help them troubleshoot this issue.

I tried this again today on 3.13.0-44-generic (Ubuntu) and was easily able to reproduce it.

On Wed, Dec 17, 2014 at 7:08 PM, Terry Hardie <thardie@instartlogic.com> wrote: > Hi, > > I am testing 3 SSDs (1TB Crucial M550 with DRZAT, and I tested they do > return zeros after discard) with RAID5 and discard. I create the array > with a 64k chunk size, and it starts to sync. During it's initial > reconstruction, I do a mkfs.ext4, which starts to do the "Discarding > device blocks". After a short period (I believe when the mkfs reaches > the point where the reconstruction is at, all IO to the disks freezes, > and mkfs does not advance. iostat shows 2 of the 3 drives at 100% > utilization with no data read or written. After 2 minutes, I get the > hung task dump.
Most CPUs are idle, and here are a few which are not, > which look like a deadlock to me: > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154399] INFO: > rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 3, > t=285032 jiffies, g=1160, c=1159, q=0) > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154902] NMI > backtrace for cpu 4 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154904] CPU: 4 PID: > 2146 Comm: md3_raid5 Tainted: G W IOX 3.13.0-43-generic > #72~precise1 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154905] Hardware > name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154906] task: > ffff88202594c800 ti: ffff8810245a0000 task.ti: ffff8810245a0000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154907] RIP: > 0010:[<ffffffff817644c1>] [<ffffffff817644c1>] > _raw_spin_lock_irqsave+0x41/0x60 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154910] RSP: > 0018:ffff8810245a1cc8 EFLAGS: 00000006 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154911] RAX: > 0000000000002ec5 RBX: ffff882028a6ec00 RCX: 0000000000007b78 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154912] RDX: > 0000000000000202 RSI: 0000000000007b78 RDI: ffff882028a6ec10 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154913] RBP: > ffff8810245a1cc8 R08: 0000000000007b76 R09: ffff882023629170 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154914] R10: > 0000000000000000 R11: ffff882028a6ec00 R12: ffff882028a6ee68 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154915] R13: > 0000000000000003 R14: 0000000000000002 R15: ffff882028a6ec10 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154916] FS: > 0000000000000000(0000) GS:ffff88103fc80000(0000) > knlGS:0000000000000000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154917] CS: 0010 > DS: 0000 ES: 0000 CR0: 0000000080050033 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154918] CR2: > 
00007f208c2d0000 CR3: 0000000001c0d000 CR4: 00000000001407e0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154919] Stack: > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154920] > ffff8810245a1d18 ffffffffa0149890 0000000000000002 ffff882028a6ee88 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154923] > ffff882028a6ee68 ffff882028a6ec00 0000000000000008 ffff882028a6ee68 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154926] > 0000000000000000 ffff882028a6ee50 ffff8810245a1d98 ffffffffa015212f > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154929] Call Trace: > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154933] > [<ffffffffa0149890>] release_inactive_stripe_list+0x50/0x160 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154937] > [<ffffffffa015212f>] handle_active_stripes.isra.38+0x7f/0x190 > [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154940] > [<ffffffffa0152758>] raid5d+0x198/0x2f0 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154942] > [<ffffffff815d30a7>] md_thread+0x117/0x150 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154945] > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154947] > [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154949] > [<ffffffff8108fb59>] kthread+0xc9/0xe0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154952] > [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154954] > [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154956] > [<ffffffff8108fa90>] ? 
flush_kthread_worker+0xb0/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154957] Code: 1f 44 > 00 00 b8 00 00 02 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 75 05 48 89 > d0 5d c3 83 e1 fe 0f b7 f1 b8 00 80 00 00 44 0f b7 07 <66> 44 39 c1 74 > e6 f3 90 83 e8 01 75 ef 0f 1f 80 00 00 00 00 eb > > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155001] NMI > backtrace for cpu 5 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155003] CPU: 5 PID: > 2147 Comm: md3_resync Tainted: G W IOX 3.13.0-43-generic > #72~precise1 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155004] Hardware > name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155005] task: > ffff88202594b000 ti: ffff8810274a0000 task.ti: ffff8810274a0000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155006] RIP: > 0010:[<ffffffffa01483b7>] [<ffffffffa01483b7>] > __find_stripe+0x57/0xa0 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155010] RSP: > 0018:ffff8810274a1b68 EFLAGS: 00000006 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155011] RAX: > ffff882027092da0 RBX: 0000000000a30c10 RCX: 0000000000000001 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155012] RDX: > 0000000000000c10 RSI: 0000000000a30c10 RDI: ffff882028a6ec00 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155013] RBP: > ffff8810274a1b88 R08: 0000000000000000 R09: 0000000000000000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155014] R10: > 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155015] R13: > ffff882028a6ec00 R14: 0000000000000000 R15: ffff882028a6eda8 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155016] FS: > 0000000000000000(0000) GS:ffff88103fca0000(0000) > knlGS:0000000000000000 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155017] CS: 0010 > DS: 0000 ES: 0000 CR0: 0000000080050033 > Dec 18 00:57:41 unassigned-hostname 
kernel: [ 1606.155018] CR2: > 00000000006e1dc8 CR3: 0000000001c0d000 CR4: 00000000001407e0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155019] Stack: > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155020] > ffff8810274a1ba8 ffff882028a6ec00 000000007b767b00 ffff882028a6ec10 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155023] > ffff8810274a1c28 ffffffffa0150555 ffff882023773b50 ffff882028a6eda8 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155026] > 0000000200000001 ffff882028a6ec08 0000000000000000 0000000000a30c10 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155029] Call Trace: > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155033] > [<ffffffffa0150555>] get_active_stripe+0x115/0x3e0 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155036] > [<ffffffffa014aea8>] ? release_stripe+0x68/0x100 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155040] > [<ffffffffa0154f3b>] sync_request+0x11b/0x2a0 [raid456] > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155042] > [<ffffffff815d5ccf>] md_do_sync+0x84f/0xdb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155046] > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155048] > [<ffffffff815d30a7>] md_thread+0x117/0x150 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155050] > [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155052] > [<ffffffff8108fb59>] kthread+0xc9/0xe0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155054] > [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155057] > [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155059] > [<ffffffff8108fa90>] ? 
flush_kthread_worker+0xb0/0xb0 > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155060] Code: e2 f8 > 0f 00 00 48 8b 04 02 48 85 c0 75 25 f6 05 29 25 01 00 04 75 3e 31 c0 > 48 83 c4 08 5b 41 5c 41 5d 5d c3 66 44 39 60 30 74 ee <48> 8b 00 48 85 > c0 74 db 48 39 58 38 75 f2 eb e9 48 89 f2 48 c7 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670082] INFO: task > mkfs.ext4:2235 blocked for more than 120 seconds. > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670109] > Tainted: G W IOX 3.13.0-43-generic #72~precise1 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670130] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670155] mkfs.ext4 > D ffff881024fe39e0 0 2235 2080 0x00000000 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670158] > ffff882026eafa68 0000000000000082 ffff88103fc73480 ffff882026eaffd8 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670162] > 0000000000013480 0000000000013480 ffff8820293e8000 ffff88202208b000 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670165] > ffff882026eafa78 ffff882028a6ec00 ffff882028a6ed98 ffff882028a6ec0c > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670169] Call Trace: > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670175] > [<ffffffff81760ae9>] schedule+0x29/0x70 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670181] > [<ffffffffa01506e3>] get_active_stripe+0x2a3/0x3e0 [raid456] > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670185] > [<ffffffff8134c152>] ? blk_check_plugged+0x72/0xb0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670189] > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670193] > [<ffffffffa0155e44>] make_discard_request+0x108/0x12c4 [raid456] > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670196] > [<ffffffff810affe0>] ? 
__wake_up_sync+0x20/0x20 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670201] > [<ffffffffa0155c91>] make_request+0x581/0x590 [raid456] > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670204] > [<ffffffff8109cfd6>] ? ttwu_do_activate.constprop.82+0x66/0x70 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670207] > [<ffffffff8109d097>] ? ttwu_queue+0xb7/0xd0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670210] > [<ffffffff8109f950>] ? try_to_wake_up+0x190/0x210 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670212] > [<ffffffff815d2c53>] md_make_request+0xd3/0x230 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670216] > [<ffffffff8115b085>] ? mempool_alloc_slab+0x15/0x20 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670219] > [<ffffffff8134ceb7>] generic_make_request.part.62+0x77/0xb0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670222] > [<ffffffff8134d428>] generic_make_request+0x68/0x70 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670225] > [<ffffffff8134d4a8>] submit_bio+0x78/0x160 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670228] > [<ffffffff81202f80>] ? bio_alloc_bioset+0xa0/0x1d0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670232] > [<ffffffff813578c0>] blkdev_issue_discard+0x1f0/0x2a0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670235] > [<ffffffff8135c1f4>] blkdev_ioctl+0x354/0x810 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670238] > [<ffffffff8101361d>] ? __switch_to+0x16d/0x4d0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670241] > [<ffffffff81204370>] block_ioctl+0x40/0x50 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670244] > [<ffffffff811dd5c5>] do_vfs_ioctl+0x75/0x2c0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670247] > [<ffffffff817606be>] ? 
__schedule+0x38e/0x700 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670249] > [<ffffffff811dd8a1>] SyS_ioctl+0x91/0xb0 > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670252] > [<ffffffff8176d66d>] system_call_fastpath+0x1a/0x1f > > > > > If I do the mkfs.ext4 after the initial reconstruction is done, is > gets all the way through. I don't want to put this system into > production, since this could mean this condition could show up in the > future if the array needs to reconstruct again at a future point while > in service. > > This is a test system in a lab, so I'd be happy to try some tests. > > Terry ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Hung RAID5 array with discard
  2015-03-04 21:47 ` Terry Hardie
@ 2015-03-23 2:57 ` NeilBrown
  2015-10-22 16:07 ` Peter Kieser
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2015-03-23 2:57 UTC (permalink / raw)
  To: Terry Hardie; +Cc: linux-raid

On Wed, 4 Mar 2015 13:47:08 -0800 Terry Hardie <thardie@instartlogic.com> wrote:

> Well, I'm dissapointed no one responded to this. This basically means
> linux RAID 4/5/6 and discard is fundamentally broken, and no one wants
> to acknowledge it.

It might just mean that no-one noticed your email, or that they were busy, or were just about to leave on Christmas holidays or ...... If you don't get a response, resending after a reasonable period (a couple of weeks) is perfectly acceptable.

>
> I hope someone finds this post while I still have my lab available and
> I can help them troubleshoot this issue.
>
> I tried this again today on 3.13.0-44-generic (Ubuntu) and was easily
> able to reproduce it.

Can you try with a more recent kernel? 3.13.0 is over a year old and there is at least one raid5 bugfix that went into the 3.13-stable series. If you can reproduce with 3.19, I'll definitely look into it.

Thanks for the report,
NeilBrown

> > On Wed, Dec 17, 2014 at 7:08 PM, Terry Hardie <thardie@instartlogic.com> wrote: > > Hi, > > > > I am testing 3 SSDs (1TB Crucial M550 with DRZAT, and I tested they do > > return zeros after discard) with RAID5 and discard. I create the array > > with a 64k chunk size, and it starts to sync. During it's initial > > reconstruction, I do a mkfs.ext4, which starts to do the "Discarding > > device blocks". After a short period (I believe when the mkfs reaches > > the point where the reconstruction is at, all IO to the disks freezes, > > and mkfs does not advance. iostat shows 2 of the 3 drives at 100% > > utilization with no data read or written. After 2 minutes, I get the > > hung task dump.
Most CPUs are idle, and here are a few which are not, > > which look like a deadlock to me: > > > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154399] INFO: > > rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 3, > > t=285032 jiffies, g=1160, c=1159, q=0) > > > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154902] NMI > > backtrace for cpu 4 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154904] CPU: 4 PID: > > 2146 Comm: md3_raid5 Tainted: G W IOX 3.13.0-43-generic > > #72~precise1 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154905] Hardware > > name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154906] task: > > ffff88202594c800 ti: ffff8810245a0000 task.ti: ffff8810245a0000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154907] RIP: > > 0010:[<ffffffff817644c1>] [<ffffffff817644c1>] > > _raw_spin_lock_irqsave+0x41/0x60 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154910] RSP: > > 0018:ffff8810245a1cc8 EFLAGS: 00000006 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154911] RAX: > > 0000000000002ec5 RBX: ffff882028a6ec00 RCX: 0000000000007b78 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154912] RDX: > > 0000000000000202 RSI: 0000000000007b78 RDI: ffff882028a6ec10 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154913] RBP: > > ffff8810245a1cc8 R08: 0000000000007b76 R09: ffff882023629170 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154914] R10: > > 0000000000000000 R11: ffff882028a6ec00 R12: ffff882028a6ee68 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154915] R13: > > 0000000000000003 R14: 0000000000000002 R15: ffff882028a6ec10 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154916] FS: > > 0000000000000000(0000) GS:ffff88103fc80000(0000) > > knlGS:0000000000000000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154917] CS: 0010 > > DS: 0000 ES: 0000 CR0: 0000000080050033 > > 
Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154918] CR2: > > 00007f208c2d0000 CR3: 0000000001c0d000 CR4: 00000000001407e0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154919] Stack: > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154920] > > ffff8810245a1d18 ffffffffa0149890 0000000000000002 ffff882028a6ee88 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154923] > > ffff882028a6ee68 ffff882028a6ec00 0000000000000008 ffff882028a6ee68 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154926] > > 0000000000000000 ffff882028a6ee50 ffff8810245a1d98 ffffffffa015212f > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154929] Call Trace: > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154933] > > [<ffffffffa0149890>] release_inactive_stripe_list+0x50/0x160 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154937] > > [<ffffffffa015212f>] handle_active_stripes.isra.38+0x7f/0x190 > > [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154940] > > [<ffffffffa0152758>] raid5d+0x198/0x2f0 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154942] > > [<ffffffff815d30a7>] md_thread+0x117/0x150 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154945] > > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154947] > > [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154949] > > [<ffffffff8108fb59>] kthread+0xc9/0xe0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154952] > > [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154954] > > [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154956] > > [<ffffffff8108fa90>] ? 
flush_kthread_worker+0xb0/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.154957] Code: 1f 44 > > 00 00 b8 00 00 02 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 75 05 48 89 > > d0 5d c3 83 e1 fe 0f b7 f1 b8 00 80 00 00 44 0f b7 07 <66> 44 39 c1 74 > > e6 f3 90 83 e8 01 75 ef 0f 1f 80 00 00 00 00 eb > > > > > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155001] NMI > > backtrace for cpu 5 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155003] CPU: 5 PID: > > 2147 Comm: md3_resync Tainted: G W IOX 3.13.0-43-generic > > #72~precise1 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155004] Hardware > > name: Supermicro SYS-2028TP-HC1R/X10DRT-P, BIOS 1.0a 08/28/2014 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155005] task: > > ffff88202594b000 ti: ffff8810274a0000 task.ti: ffff8810274a0000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155006] RIP: > > 0010:[<ffffffffa01483b7>] [<ffffffffa01483b7>] > > __find_stripe+0x57/0xa0 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155010] RSP: > > 0018:ffff8810274a1b68 EFLAGS: 00000006 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155011] RAX: > > ffff882027092da0 RBX: 0000000000a30c10 RCX: 0000000000000001 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155012] RDX: > > 0000000000000c10 RSI: 0000000000a30c10 RDI: ffff882028a6ec00 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155013] RBP: > > ffff8810274a1b88 R08: 0000000000000000 R09: 0000000000000000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155014] R10: > > 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155015] R13: > > ffff882028a6ec00 R14: 0000000000000000 R15: ffff882028a6eda8 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155016] FS: > > 0000000000000000(0000) GS:ffff88103fca0000(0000) > > knlGS:0000000000000000 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155017] CS: 0010 > > DS: 0000 
ES: 0000 CR0: 0000000080050033 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155018] CR2: > > 00000000006e1dc8 CR3: 0000000001c0d000 CR4: 00000000001407e0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155019] Stack: > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155020] > > ffff8810274a1ba8 ffff882028a6ec00 000000007b767b00 ffff882028a6ec10 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155023] > > ffff8810274a1c28 ffffffffa0150555 ffff882023773b50 ffff882028a6eda8 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155026] > > 0000000200000001 ffff882028a6ec08 0000000000000000 0000000000a30c10 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155029] Call Trace: > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155033] > > [<ffffffffa0150555>] get_active_stripe+0x115/0x3e0 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155036] > > [<ffffffffa014aea8>] ? release_stripe+0x68/0x100 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155040] > > [<ffffffffa0154f3b>] sync_request+0x11b/0x2a0 [raid456] > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155042] > > [<ffffffff815d5ccf>] md_do_sync+0x84f/0xdb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155046] > > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155048] > > [<ffffffff815d30a7>] md_thread+0x117/0x150 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155050] > > [<ffffffff815d2f90>] ? md_rdev_init+0x110/0x110 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155052] > > [<ffffffff8108fb59>] kthread+0xc9/0xe0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155054] > > [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155057] > > [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155059] > > [<ffffffff8108fa90>] ? 
flush_kthread_worker+0xb0/0xb0 > > Dec 18 00:57:41 unassigned-hostname kernel: [ 1606.155060] Code: e2 f8 > > 0f 00 00 48 8b 04 02 48 85 c0 75 25 f6 05 29 25 01 00 04 75 3e 31 c0 > > 48 83 c4 08 5b 41 5c 41 5d 5d c3 66 44 39 60 30 74 ee <48> 8b 00 48 85 > > c0 74 db 48 39 58 38 75 f2 eb e9 48 89 f2 48 c7 > > > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670082] INFO: task > > mkfs.ext4:2235 blocked for more than 120 seconds. > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670109] > > Tainted: G W IOX 3.13.0-43-generic #72~precise1 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670130] "echo 0 > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670155] mkfs.ext4 > > D ffff881024fe39e0 0 2235 2080 0x00000000 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670158] > > ffff882026eafa68 0000000000000082 ffff88103fc73480 ffff882026eaffd8 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670162] > > 0000000000013480 0000000000013480 ffff8820293e8000 ffff88202208b000 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670165] > > ffff882026eafa78 ffff882028a6ec00 ffff882028a6ed98 ffff882028a6ec0c > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670169] Call Trace: > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670175] > > [<ffffffff81760ae9>] schedule+0x29/0x70 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670181] > > [<ffffffffa01506e3>] get_active_stripe+0x2a3/0x3e0 [raid456] > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670185] > > [<ffffffff8134c152>] ? blk_check_plugged+0x72/0xb0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670189] > > [<ffffffff810affe0>] ? __wake_up_sync+0x20/0x20 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670193] > > [<ffffffffa0155e44>] make_discard_request+0x108/0x12c4 [raid456] > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670196] > > [<ffffffff810affe0>] ? 
__wake_up_sync+0x20/0x20 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670201] > > [<ffffffffa0155c91>] make_request+0x581/0x590 [raid456] > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670204] > > [<ffffffff8109cfd6>] ? ttwu_do_activate.constprop.82+0x66/0x70 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670207] > > [<ffffffff8109d097>] ? ttwu_queue+0xb7/0xd0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670210] > > [<ffffffff8109f950>] ? try_to_wake_up+0x190/0x210 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670212] > > [<ffffffff815d2c53>] md_make_request+0xd3/0x230 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670216] > > [<ffffffff8115b085>] ? mempool_alloc_slab+0x15/0x20 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670219] > > [<ffffffff8134ceb7>] generic_make_request.part.62+0x77/0xb0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670222] > > [<ffffffff8134d428>] generic_make_request+0x68/0x70 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670225] > > [<ffffffff8134d4a8>] submit_bio+0x78/0x160 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670228] > > [<ffffffff81202f80>] ? bio_alloc_bioset+0xa0/0x1d0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670232] > > [<ffffffff813578c0>] blkdev_issue_discard+0x1f0/0x2a0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670235] > > [<ffffffff8135c1f4>] blkdev_ioctl+0x354/0x810 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670238] > > [<ffffffff8101361d>] ? __switch_to+0x16d/0x4d0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670241] > > [<ffffffff81204370>] block_ioctl+0x40/0x50 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670244] > > [<ffffffff811dd5c5>] do_vfs_ioctl+0x75/0x2c0 > > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670247] > > [<ffffffff817606be>] ? 
__schedule+0x38e/0x700
> > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670249]
> > [<ffffffff811dd8a1>] SyS_ioctl+0x91/0xb0
> > Dec 18 00:58:57 unassigned-hostname kernel: [ 1682.670252]
> > [<ffffffff8176d66d>] system_call_fastpath+0x1a/0x1f
> >
> > If I do the mkfs.ext4 after the initial reconstruction is done, it
> > gets all the way through. I don't want to put this system into
> > production, since this could mean this condition could show up in the
> > future if the array needs to reconstruct again at a future point while
> > in service.
> >
> > This is a test system in a lab, so I'd be happy to try some tests.
> >
> > Terry
* Re: Hung RAID5 array with discard
  2015-03-23 2:57 ` NeilBrown
@ 2015-10-22 16:07 ` Peter Kieser
  2015-10-23 4:42 ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Kieser @ 2015-10-22 16:07 UTC
  To: NeilBrown, Terry Hardie; +Cc: linux-raid

FYI I ran into this problem in 3.18.22 when forcing RAID5 TRIM support
w/ raid456.devices_handle_discard_safely on Intel DC S3500 SSDs.

-Peter

On 2015-03-22 7:57 PM, NeilBrown wrote:
> On Wed, 4 Mar 2015 13:47:08 -0800 Terry Hardie <thardie@instartlogic.com>
> wrote:
>
>> Well, I'm disappointed no one responded to this. This basically means
>> linux RAID 4/5/6 and discard is fundamentally broken, and no one wants
>> to acknowledge it.
> It might just mean that no-one noticed your email, or that they were busy, or
> were just about to leave on Christmas holidays or ......
>
> If you don't get a response, resending after a reasonable period (couple of
> weeks) is perfectly acceptable.
>> I hope someone finds this post while I still have my lab available and
>> I can help them troubleshoot this issue.
>>
>> I tried this again today on 3.13.0-44-generic (Ubuntu) and was easily
>> able to reproduce it.
> Can you try with a more recent kernel? 3.13.0 is over a year old and there is
> at least one raid5 bugfix that went into the 3.13-stable series.
>
> If you can reproduce with 3.19, I'll definitely look into it.
>
> Thanks for the report,
>
> NeilBrown
* Re: Hung RAID5 array with discard
  2015-10-22 16:07 ` Peter Kieser
@ 2015-10-23 4:42 ` Neil Brown
  2015-10-23 4:57 ` Peter Kieser
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2015-10-23 4:42 UTC
  To: Peter Kieser, Terry Hardie; +Cc: linux-raid

Peter Kieser <peter@kieser.ca> writes:

> FYI I ran into this problem in 3.18.22 when forcing RAID5 TRIM support
> w/ raid456.devices_handle_discard_safely on Intel DC S3500 SSDs.

Can you please be precise about the problem you experienced?
Complete kernel messages are a minimum.
Anything interesting that might have been happening at the time might
help.

Thanks,
NeilBrown
* Re: Hung RAID5 array with discard
  2015-10-23 4:42 ` Neil Brown
@ 2015-10-23 4:57 ` Peter Kieser
  2015-10-23 5:51 ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Kieser @ 2015-10-23 4:57 UTC
  To: Neil Brown, Terry Hardie; +Cc: linux-raid

On 2015-10-22 9:42 PM, Neil Brown wrote:
> Peter Kieser <peter@kieser.ca> writes:
>
>> FYI I ran into this problem in 3.18.22 when forcing RAID5 TRIM support
>> w/ raid456.devices_handle_discard_safely on Intel DC S3500 SSDs.
> Can you please be precise about the problem you experienced.
> Complete kernel messages are a minimum.
> Anything interesting that might have been happening at the time might
> help.
>
> Thanks,
> NeilBrown

Same behaviour as the original poster. I enabled the module knob, then ran
mkfs.ext4, which started "Discarding device blocks", and then the
machine hard locked. No kernel messages.

-Peter
* Re: Hung RAID5 array with discard
  2015-10-23 4:57 ` Peter Kieser
@ 2015-10-23 5:51 ` Neil Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Neil Brown @ 2015-10-23 5:51 UTC
  To: Peter Kieser, Terry Hardie; +Cc: linux-raid

Peter Kieser <peter@kieser.ca> writes:

> On 2015-10-22 9:42 PM, Neil Brown wrote:
>> Peter Kieser <peter@kieser.ca> writes:
>>
>>> FYI I ran into this problem in 3.18.22 when forcing RAID5 TRIM support
>>> w/ raid456.devices_handle_discard_safely on Intel DC S3500 SSDs.
>> Can you please be precise about the problem you experienced.
>> Complete kernel messages are a minimum.
>> Anything interesting that might have been happening at the time might
>> help.
>>
>> Thanks,
>> NeilBrown
>
> Same behaviour as the original poster. Enabled the module knob, then ran
> mkfs.ext4, which starts to do "Discarding device blocks" and then the
> machine hard locked. No kernel messages.
>

Thanks.
The original poster reported a hung-task dump...

When you say "hard locked" - can you still access the machine from another
window/login, or is it totally frozen?
If the former, are there any processes in D state? Maybe mdXX_raid5?
Can you get /proc/$PID/stack of that process?

If totally frozen - do you have a console? Can you alt-sysrq-W?

Or have you since rebooted and the problem isn't repeatable?

Thanks,
NeilBrown
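The information requested above (which processes are in D state, and the /proc/$PID/stack of each) can be gathered in one pass while a second login still works. What follows is only a minimal sketch, assuming a Linux /proc layout and root privileges for reading the stack files; the helper names task_state and blocked_tasks are made up for illustration and are not part of md or any tool mentioned in this thread.

```python
#!/usr/bin/env python3
"""List tasks in uninterruptible sleep (state D) and print their kernel stacks.

Illustrative helper only. Reading /proc/<pid>/stack normally requires root
and a kernel that exposes that file.
"""
import os


def task_state(stat_text):
    """Return (comm, state) parsed from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D.

    Only the numeric top-level entries are scanned; kernel threads such as
    the md raid5 and resync threads each appear there as their own entry.
    """
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = task_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or was unreadable during the scan
        if state == "D":
            yield int(entry), comm


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(f"--- {pid} {comm} ---")
        try:
            with open(f"/proc/{pid}/stack") as f:
                print(f.read(), end="")
        except OSError as err:
            print(f"(cannot read stack: {err})")
```

If a shell is still usable and sysrq is enabled, writing `w` to /proc/sysrq-trigger is the command-line equivalent of alt-sysrq-W: it dumps blocked tasks to the kernel log.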
end of thread, newest message 2015-10-23 5:51 UTC

Thread overview: 7+ messages
2014-12-18 3:08 Hung RAID5 array with discard Terry Hardie
2015-03-04 21:47 ` Terry Hardie
2015-03-23 2:57 ` NeilBrown
2015-10-22 16:07 ` Peter Kieser
2015-10-23 4:42 ` Neil Brown
2015-10-23 4:57 ` Peter Kieser
2015-10-23 5:51 ` Neil Brown