* Raid10 device hangs during resync and heavy I/O.
@ 2010-07-16 18:46 Justin Bronder
  2010-07-16 18:49 ` Justin Bronder
  2010-07-22 18:49 ` Justin Bronder
  0 siblings, 2 replies; 9+ messages in thread
From: Justin Bronder @ 2010-07-16 18:46 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 9135 bytes --]

I've been able to reproduce this across a number of machines with the same
hardware configuration.  During a raid10 resync, it's possible to hang the
device so that any further I/O operations will also block.  This can be
fairly simply done using dd.

Interestingly, this is not reproducible when using a non-partitioned device.
That is, creating the device with --auto=yes and then directly using it
functions as expected.  However, using --auto=yes or --auto=mdp and then
creating a partition across the device will cause the hang.

From all appearances, this is not just slow I/O: days later the same tasks
are still blocked.  The rest of the system continues to function normally,
including other raid devices.

Below I'm going to include the script I'm using to reproduce the hang, the
relevant kernel tracebacks, and /proc/mdstat.  Thanks in advance for any
help resolving this.

=== md10-hang.sh ===
#!/bin/bash

MDP=false
# Pick two unused drives here.
MD_DRIVES="sdc sdd"
if ${MDP}; then
	MD_DEV="md_d99"
else
	MD_DEV="md99"
fi
M="/mnt/mdmount"
SIZE=8192

die () {
	echo
	echo "ERROR: $*"
	echo
	exit 1
}

mkraid() {
	local d
	local drives
	local mdargs="--auto=yes"
	${MDP} && mdargs="--auto=mdp"

	mkdir -p ${M}
	umount -f ${M} &>/dev/null
	mdadm --stop /dev/md_d99 &>/dev/null
	mdadm --stop /dev/md99 &>/dev/null

	for d in ${MD_DRIVES}; do
		sfdisk -uM /dev/${d} <<-EOF
		,${SIZE},83
		,,83
		EOF
		mdadm --zero-superblock /dev/${d}1 &>/dev/null
		drives="${drives} /dev/${d}1"
	done

	mdadm --create /dev/${MD_DEV} \
		--run \
		--force \
		--level=10 \
		--layout=f2 \
		--raid-devices=2 \
		${mdargs} ${drives} || die "mdadm --create failed"

	if ${MDP}; then
		printf ",,83\n" | sfdisk -uM /dev/${MD_DEV}
		mkfs.ext2 -q /dev/${MD_DEV}p1
		mount /dev/${MD_DEV}p1 ${M} || die "Mount failed"
	else
		printf ",,83\n" | sfdisk -uM /dev/${MD_DEV}
		mkfs.ext2 -q /dev/${MD_DEV}p1
		mount /dev/${MD_DEV}p1 ${M} || die "Mount failed"
	fi

	echo "Creating tmp file"
	dd if=/dev/zero of=${M}/tmpfile bs=1M count=4000
}

mkraid

i=1
while [ "$(</sys/block/${MD_DEV}/md/sync_action)" != "idle" ]; do
	echo "Attempt ${i} to cause crash"
	cat /proc/mdstat
	dd if=${M}/tmpfile of=${M}/cpfile bs=1M
	i=$((i+1))
done

=== kernel trace ===

[ 9002.405247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9002.433361]  ffff88025436fc30 0000000000000046 ffff88025436fc10 ffff880254616800
[ 9002.460415]  ffff88025d40dd70 ffff88025d40a3f0 0000000354616800 00000000000de600
[ 9002.487497]  ffff88025436fc10 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9002.514575] Call Trace:
[ 9002.526609]  [<ffffffff81320efb>] raise_barrier+0x167/0x1a3
[ 9002.548139]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9002.571218]  [<ffffffff813238e1>] sync_request+0x57d/0x8a8
[ 9002.592430]  [<ffffffff81320ca5>] ? raid10_unplug+0x24/0x28
[ 9002.613833]  [<ffffffff8132ad63>] ? md_thread+0x0/0xe8
[ 9002.633938]  [<ffffffff8132dab2>] md_do_sync+0x685/0xa9d
[ 9002.654556]  [<ffffffff8132ad63>] ? md_thread+0x0/0xe8
[ 9002.674650]  [<ffffffff8132ae31>] md_thread+0xce/0xe8
[ 9002.694435]  [<ffffffff81034aa6>] ? spin_unlock_irqrestore+0x9/0xb
[ 9002.717583]  [<ffffffff81056cc0>] kthread+0x69/0x71
[ 9002.736753]  [<ffffffff810037e4>] kernel_thread_helper+0x4/0x10
[ 9002.759062]  [<ffffffff81056c57>] ? kthread+0x0/0x71
[ 9002.778478]  [<ffffffff810037e0>] ? kernel_thread_helper+0x0/0x10
[ 9002.801286] INFO: task flush-9:99:5896 blocked for more than 120 seconds.
[ 9002.826287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9002.854475]  ffff88012fa8b870 0000000000000046 ffff88012fa8b850 ffff880254616800
[ 9002.881589]  ffff88025d40ebd0 ffff88025d40a3f0 000000036fb73300 0000000000000001
[ 9002.908691]  ffff88012fa8b850 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9002.935778] Call Trace:
[ 9002.947781]  [<ffffffff81320d5b>] wait_barrier+0xa7/0xe0
[ 9002.968438]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9002.991452]  [<ffffffff8132163e>] make_request+0x121/0x507
[ 9003.012697]  [<ffffffff8132d2aa>] md_make_request+0xc7/0x101
[ 9003.034515]  [<ffffffff811dc817>] generic_make_request+0x1af/0x276
[ 9003.057953]  [<ffffffff811dda3b>] submit_bio+0x9e/0xa7
[ 9003.078197]  [<ffffffff810e950d>] submit_bh+0x11b/0x13f
[ 9003.098648]  [<ffffffff810ebba9>] __block_write_full_page+0x20b/0x310
[ 9003.122755]  [<ffffffff810ec383>] ? end_buffer_async_write+0x0/0x13a
[ 9003.146576]  [<ffffffff810ef5b2>] ? blkdev_get_block+0x0/0x50
[ 9003.168612]  [<ffffffff810ec383>] ? end_buffer_async_write+0x0/0x13a
[ 9003.192521]  [<ffffffff810ef5b2>] ? blkdev_get_block+0x0/0x50
[ 9003.214684]  [<ffffffff810ebd30>] block_write_full_page_endio+0x82/0x8e
[ 9003.239426]  [<ffffffff810ebd4c>] block_write_full_page+0x10/0x12
[ 9003.262539]  [<ffffffff810eea92>] blkdev_writepage+0x13/0x15
[ 9003.284297]  [<ffffffff8109e005>] __writepage+0x12/0x2b
[ 9003.304718]  [<ffffffff8109e46c>] write_cache_pages+0x1fa/0x306
[ 9003.327254]  [<ffffffff8109dff3>] ? __writepage+0x0/0x2b
[ 9003.347976]  [<ffffffff810e9f65>] ? mark_buffer_dirty+0x85/0x89
[ 9003.370488]  [<ffffffff8109e597>] generic_writepages+0x1f/0x25
[ 9003.392661]  [<ffffffff8109e5b9>] do_writepages+0x1c/0x25
[ 9003.413477]  [<ffffffff810e43e0>] writeback_single_inode+0xb0/0x1c7
[ 9003.436962]  [<ffffffff810e4b5a>] writeback_inodes_wb+0x2bf/0x35a
[ 9003.459949]  [<ffffffff810e4d1a>] wb_writeback+0x125/0x1a1
[ 9003.481145]  [<ffffffff810e4f66>] wb_do_writeback+0x138/0x14f
[ 9003.503124]  [<ffffffff810ab3e7>] ? bdi_start_fn+0x0/0xca
[ 9003.524024]  [<ffffffff810e4fa4>] bdi_writeback_task+0x27/0x92
[ 9003.546241]  [<ffffffff810ab44c>] bdi_start_fn+0x65/0xca
[ 9003.566875]  [<ffffffff81056cc0>] kthread+0x69/0x71
[ 9003.586179]  [<ffffffff810037e4>] kernel_thread_helper+0x4/0x10
[ 9003.608532]  [<ffffffff81056c57>] ? kthread+0x0/0x71
[ 9003.627949]  [<ffffffff810037e0>] ? kernel_thread_helper+0x0/0x10
[ 9003.650791] INFO: task dd:5912 blocked for more than 120 seconds.
[ 9003.673621] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9003.701721]  ffff88024c84d7b8 0000000000000082 ffff88024c84d798 ffff880254616800
[ 9003.728793]  ffff88025d6840b0 ffff88025f065640 0000000296364968 0000000000000000
[ 9003.755893]  000000014c84d798 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9003.773323] Call Trace:
[ 9003.773326]  [<ffffffff81320d5b>] wait_barrier+0xa7/0xe0
[ 9003.773328]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9003.773330]  [<ffffffff8132163e>] make_request+0x121/0x507
[ 9003.773332]  [<ffffffff810edbd7>] ? bio_split+0xca/0x183
[ 9003.773334]  [<ffffffff813215d5>] make_request+0xb8/0x507
[ 9003.773337]  [<ffffffff811d780d>] ? __elv_add_request+0xa1/0xaa
[ 9003.773339]  [<ffffffff8132d2aa>] md_make_request+0xc7/0x101
[ 9003.773341]  [<ffffffff811dc817>] generic_make_request+0x1af/0x276
[ 9003.773343]  [<ffffffff810ed885>] ? bio_alloc_bioset+0x70/0xc0
[ 9003.773345]  [<ffffffff811dda3b>] submit_bio+0x9e/0xa7
[ 9003.773347]  [<ffffffff810f0d0b>] mpage_bio_submit+0x22/0x26
[ 9003.773349]  [<ffffffff810f17df>] do_mpage_readpage+0x462/0x54e
[ 9003.773352]  [<ffffffff8109fb21>] ? get_page+0x9/0xf
[ 9003.773354]  [<ffffffff810a004d>] ? __lru_cache_add+0x40/0x58
[ 9003.773357]  [<ffffffff8112c194>] ? ext2_get_block+0x0/0x78a
[ 9003.773359]  [<ffffffff810f1a66>] mpage_readpages+0xc9/0x10f
[ 9003.773361]  [<ffffffff8112c194>] ? ext2_get_block+0x0/0x78a
[ 9003.773363]  [<ffffffff81001d89>] ? __switch_to+0x10e/0x1e1
[ 9003.773366]  [<ffffffff8112b40c>] ext2_readpages+0x1a/0x1c
[ 9003.773368]  [<ffffffff8109f4d0>] __do_page_cache_readahead+0xf6/0x191
[ 9003.773370]  [<ffffffff8109f587>] ra_submit+0x1c/0x20
[ 9003.773372]  [<ffffffff8109f7e3>] ondemand_readahead+0x17b/0x18e
[ 9003.773374]  [<ffffffff8109f870>] page_cache_async_readahead+0x7a/0xa2
[ 9003.773379]  [<ffffffff81098a59>] generic_file_aio_read+0x26e/0x55d
[ 9003.773382]  [<ffffffff810cb32e>] do_sync_read+0xc2/0x106
[ 9003.773384]  [<ffffffff810a009d>] ? lru_cache_add_lru+0x38/0x3d
[ 9003.773387]  [<ffffffff8100338e>] ? apic_timer_interrupt+0xe/0x20
[ 9003.773389]  [<ffffffff810cb980>] vfs_read+0xa4/0xde
[ 9003.773391]  [<ffffffff810cbc02>] sys_read+0x47/0x6d
[ 9003.773393]  [<ffffffff81002a42>] system_call_fastpath+0x16/0x1b

=== /proc/mdstat ===

Personalities : [raid1] [raid10]
md99 : active raid10 sdd1[1] sdc1[0]
      8393856 blocks 64K chunks 2 far-copies [2/2] [UU]
      [=>...................]  resync =  5.4% (455360/8393856) finish=3938.0min speed=33K/sec

md1 : active raid10 sda2[0] sdb2[1]
      976703488 blocks 512K chunks 2 far-copies [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      56128 blocks [2/2] [UU]

unused devices: <none>

-- 
Justin Bronder

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-07-16 18:46 Raid10 device hangs during resync and heavy I/O. Justin Bronder
@ 2010-07-16 18:49 ` Justin Bronder
  2010-07-22 18:49 ` Justin Bronder
  1 sibling, 0 replies; 9+ messages in thread
From: Justin Bronder @ 2010-07-16 18:49 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]

On 16/07/10 14:46 -0400, Justin Bronder wrote:
> I've been able to reproduce this across a number of machines with the same
> hardware configuration.  During a raid10 resync, it's possible to hang the
> device so that any further I/O operations will also block.  This can be
> fairly simply done using dd.
>
> Interestingly, this is not reproducible when using a non-partitioned device.
> That is, creating the device with --auto=yes and then directly using it
> functions as expected.  However, using --auto=yes or --auto=mdp and then
> creating a partition across the device will cause the hang.
>
> From all appearances, this is not just slow I/O: days later the same tasks
> are still blocked.  The rest of the system continues to function normally,
> including other raid devices.
>
> Below I'm going to include the script I'm using to reproduce the hang, the
> relevant kernel tracebacks, and /proc/mdstat.  Thanks in advance for any
> help resolving this.

My apologies, I should have proofread this first.  I've reproduced this
using 2.6.28, 2.6.32 and 2.6.34.1.

-- 
Justin Bronder

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-07-16 18:46 Raid10 device hangs during resync and heavy I/O. Justin Bronder
  2010-07-16 18:49 ` Justin Bronder
@ 2010-07-22 18:49 ` Justin Bronder
  2010-07-23  3:19   ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Justin Bronder @ 2010-07-22 18:49 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 13002 bytes --]

On 16/07/10 14:46 -0400, Justin Bronder wrote:
> I've been able to reproduce this across a number of machines with the same
> hardware configuration.  During a raid10 resync, it's possible to hang the
> device so that any further I/O operations will also block.  This can be
> fairly simply done using dd.
>
> Interestingly, this is not reproducible when using a non-partitioned device.
> That is, creating the device with --auto=yes and then directly using it
> functions as expected.  However, using --auto=yes or --auto=mdp and then
> creating a partition across the device will cause the hang.

I've done some more research that may potentially help.  All of the
following was done with 2.6.34.1.

Still produces the hang:
  - Using cp (may take a bit longer).
  - Using jfs as the filesystem.
  - Dropping RESYNC_DEPTH to 32.
  - Using the offset layout.

Does not produce the hang:
  - Using the near layout.
  - Using dd on the partition directly instead of on a filesystem,
    via something like:
      dd if=/dev/${MD_DEV}p1 of=/dev/${MD_DEV}p1 seek=4001 bs=1M

As the barrier code is very similar, I repeated a number of these tests
using raid1 instead of raid10.  In every case, I was unable to cause the
system to hang.  I focused on the barriers due to the tracebacks in the
previous email.  For the heck of it, I added some tracing (patch below)
where the reason for the hang is fairly obvious.  Of course, how it
happened isn't.

The last bit of the trace before the hang:
<idle>-0 [003] 188.987489: allow_barrier: swapper - w:2 p:1 b:1
<idle>-0 [003] 188.987495: allow_barrier: swapper - w:2 p:0 b:1
<...>-4910 [003] 188.987518: raise_barrier: out: md99_resync - w:2 p:0 b:1
<...>-4910 [003] 188.987733: raise_barrier: in: md99_resync - w:2 p:0 b:1
<...>-4909 [003] 188.988174: lower_barrier: md99_raid10 - w:2 p:0 b:0
<...>-4915 [002] 188.988188: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
<...>-4996 [003] 188.988199: wait_barrier: out: dd - w:0 p:2 b:0
<...>-4915 [002] 188.988203: wait_barrier: in: flush-9:99 - w:0 p:2 b:0
<...>-4915 [002] 188.988204: wait_barrier: out: flush-9:99 - w:0 p:3 b:0
<...>-4915 [002] 188.988208: wait_barrier: in: flush-9:99 - w:0 p:3 b:0
<...>-4915 [002] 188.988209: wait_barrier: out: flush-9:99 - w:0 p:4 b:0
<...>-4996 [003] 188.988211: wait_barrier: in: dd - w:0 p:4 b:0
<...>-4996 [003] 188.988211: wait_barrier: out: dd - w:0 p:5 b:0
<...>-4996 [003] 188.988214: wait_barrier: in: dd - w:0 p:5 b:0
<...>-4996 [003] 188.988214: wait_barrier: out: dd - w:0 p:6 b:0
<...>-4915 [002] 188.988234: wait_barrier: in: flush-9:99 - w:0 p:6 b:0
<...>-4915 [002] 188.988235: wait_barrier: out: flush-9:99 - w:0 p:7 b:0
<...>-4915 [002] 188.988244: wait_barrier: in: flush-9:99 - w:0 p:7 b:0
<...>-4915 [002] 188.988244: wait_barrier: out: flush-9:99 - w:0 p:8 b:0
<...>-4915 [002] 188.988248: wait_barrier: in: flush-9:99 - w:0 p:8 b:0
<...>-4915 [002] 188.988249: wait_barrier: out: flush-9:99 - w:0 p:9 b:0
<...>-4996 [003] 188.988251: wait_barrier: in: dd - w:0 p:9 b:0
<...>-4996 [003] 188.988252: wait_barrier: out: dd - w:0 p:10 b:0
<...>-4996 [003] 188.988261: wait_barrier: in: dd - w:0 p:10 b:0
<...>-4996 [003] 188.988262: wait_barrier: out: dd - w:0 p:11 b:0
<...>-4996 [003] 188.988264: wait_barrier: in: dd - w:0 p:11 b:0
<...>-4996 [003] 188.988264: wait_barrier: out: dd - w:0 p:12 b:0
<...>-4915 [002] 188.988272: wait_barrier: in: flush-9:99 - w:0 p:12 b:0
<...>-4915 [002] 188.988272: wait_barrier: out: flush-9:99 - w:0 p:13 b:0
<...>-4915 [002] 188.988281: wait_barrier: in: flush-9:99 - w:0 p:13 b:0
<...>-4915 [002] 188.988281: wait_barrier: out: flush-9:99 - w:0 p:14 b:0
<...>-4996 [003] 188.988282: wait_barrier: in: dd - w:0 p:14 b:0
<...>-4996 [003] 188.988283: wait_barrier: out: dd - w:0 p:15 b:0
<...>-4915 [002] 188.988285: wait_barrier: in: flush-9:99 - w:0 p:15 b:0
<...>-4915 [002] 188.988286: wait_barrier: out: flush-9:99 - w:0 p:16 b:0
<...>-4996 [003] 188.988290: wait_barrier: in: dd - w:0 p:16 b:0
<...>-4996 [003] 188.988290: wait_barrier: out: dd - w:0 p:17 b:0
<...>-4996 [003] 188.988292: wait_barrier: in: dd - w:0 p:17 b:0
<...>-4996 [003] 188.988293: wait_barrier: out: dd - w:0 p:18 b:0
<...>-4915 [002] 188.988309: wait_barrier: in: flush-9:99 - w:0 p:18 b:0
<...>-4915 [002] 188.988310: wait_barrier: out: flush-9:99 - w:0 p:19 b:0
<...>-4996 [003] 188.988310: wait_barrier: in: dd - w:0 p:19 b:0
<...>-4996 [003] 188.988311: wait_barrier: out: dd - w:0 p:20 b:0
<...>-4996 [003] 188.988317: wait_barrier: in: dd - w:0 p:20 b:0
<...>-4996 [003] 188.988318: wait_barrier: out: dd - w:0 p:21 b:0
<...>-4996 [003] 188.988321: wait_barrier: in: dd - w:0 p:21 b:0
<...>-4996 [003] 188.988321: wait_barrier: out: dd - w:0 p:22 b:0
<...>-4915 [002] 188.988323: allow_barrier: flush-9:99 - w:0 p:21 b:0
<...>-4996 [003] 188.988327: wait_barrier: in: dd - w:0 p:21 b:0
<...>-4996 [003] 188.988328: wait_barrier: out: dd - w:0 p:22 b:0
<...>-4915 [002] 188.988356: wait_barrier: in: flush-9:99 - w:0 p:22 b:0
<...>-4915 [002] 188.988356: wait_barrier: out: flush-9:99 - w:0 p:23 b:0
<...>-4915 [002] 188.988361: wait_barrier: in: flush-9:99 - w:0 p:23 b:0
<...>-4915 [002] 188.988361: wait_barrier: out: flush-9:99 - w:0 p:24 b:0
<...>-4910 [003] 188.988370: raise_barrier: mid: md99_resync - w:0 p:24 b:1
<...>-4915 [002] 188.988378: wait_barrier: in: flush-9:99 - w:0 p:24 b:1
<...>-4996 [003] 188.988638: wait_barrier: in: dd - w:1 p:24 b:1
<idle>-0 [003] 188.988887: allow_barrier: swapper - w:2 p:23 b:1
<idle>-0 [003] 188.988895: allow_barrier: swapper - w:2 p:22 b:1
<idle>-0 [003] 188.988899: allow_barrier: swapper - w:2 p:21 b:1
<idle>-0 [003] 188.988905: allow_barrier: swapper - w:2 p:20 b:1
<idle>-0 [003] 188.988909: allow_barrier: swapper - w:2 p:19 b:1
<idle>-0 [002] 188.996299: allow_barrier: swapper - w:2 p:18 b:1
<idle>-0 [002] 188.996310: allow_barrier: swapper - w:2 p:17 b:1
<idle>-0 [002] 188.996317: allow_barrier: swapper - w:2 p:16 b:1
<idle>-0 [002] 188.996325: allow_barrier: swapper - w:2 p:15 b:1
<idle>-0 [002] 188.996330: allow_barrier: swapper - w:2 p:14 b:1
<idle>-0 [002] 188.996336: allow_barrier: swapper - w:2 p:13 b:1
<idle>-0 [002] 188.996340: allow_barrier: swapper - w:2 p:12 b:1
<idle>-0 [003] 189.004270: allow_barrier: swapper - w:2 p:11 b:1
<idle>-0 [003] 189.004278: allow_barrier: swapper - w:2 p:10 b:1
<idle>-0 [003] 189.004284: allow_barrier: swapper - w:2 p:9 b:1
<idle>-0 [003] 189.004302: allow_barrier: swapper - w:2 p:8 b:1
<idle>-0 [003] 189.004313: allow_barrier: swapper - w:2 p:7 b:1
<idle>-0 [003] 189.004322: allow_barrier: swapper - w:2 p:6 b:1
<idle>-0 [002] 189.004936: allow_barrier: swapper - w:2 p:5 b:1
<idle>-0 [002] 189.004959: allow_barrier: swapper - w:2 p:4 b:1
<idle>-0 [002] 189.004970: allow_barrier: swapper - w:2 p:3 b:1
<idle>-0 [002] 189.004979: allow_barrier: swapper - w:2 p:2 b:1
<idle>-0 [002] 189.004996: allow_barrier: swapper - w:2 p:1 b:1
<idle>-0 [002] 189.005006: allow_barrier: swapper - w:2 p:0 b:1
<...>-4910 [002] 189.005030: raise_barrier: out: md99_resync - w:2 p:0 b:1
<...>-4910 [002] 189.005251: raise_barrier: in: md99_resync - w:2 p:0 b:1
<...>-4909 [002] 189.005676: lower_barrier: md99_raid10 - w:2 p:0 b:0
<...>-4915 [003] 189.005687: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
<...>-4996 [003] 189.005713: wait_barrier: out: dd - w:0 p:2 b:0
<...>-4910 [002] 189.005715: raise_barrier: mid: md99_resync - w:0 p:2 b:1
<...>-4996 [003] 189.005724: wait_barrier: in: dd - w:0 p:2 b:1
<idle>-0 [002] 189.012925: allow_barrier: swapper - w:1 p:1 b:1
<idle>-0 [002] 189.013271: allow_barrier: swapper - w:1 p:0 b:1
<...>-4910 [002] 189.013295: raise_barrier: out: md99_resync - w:1 p:0 b:1
<...>-4910 [002] 189.013414: raise_barrier: in: md99_resync - w:1 p:0 b:1
<...>-4909 [002] 189.021171: lower_barrier: md99_raid10 - w:1 p:0 b:0
<...>-4996 [003] 189.021179: wait_barrier: out: dd - w:0 p:1 b:0
<...>-4910 [002] 189.021182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
<...>-4996 [003] 189.021184: wait_barrier: in: dd - w:0 p:1 b:1
<...>-4915 [003] 218.929283: wait_barrier: in: flush-9:99 - w:1 p:1 b:1

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ad945cc..35ad593 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -22,6 +22,7 @@
 #include <linux/delay.h>
 #include <linux/blkdev.h>
 #include <linux/seq_file.h>
+#include <linux/sched.h>
 #include "md.h"
 #include "raid10.h"
 #include "bitmap.h"
@@ -704,6 +705,9 @@ static void raise_barrier(conf_t *conf, int force)
 	BUG_ON(force && !conf->barrier);

 	spin_lock_irq(&conf->resync_lock);
+	if ( conf->mddev->md_minor == 99 )
+		trace_printk("in: %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
+
 	/* Wait until no block IO is waiting (unless 'force') */
 	wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
 			    conf->resync_lock,
@@ -712,12 +716,18 @@ static void raise_barrier(conf_t *conf, int force)
 	/* block any new IO from starting */
 	conf->barrier++;

+	if ( conf->mddev->md_minor == 99 )
+		trace_printk("mid: %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
+
 	/* Now wait for all pending IO to complete */
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
 			    conf->resync_lock,
 			    raid10_unplug(conf->mddev->queue));

+	if ( conf->mddev->md_minor == 99 )
+		trace_printk("out: %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
+
 	spin_unlock_irq(&conf->resync_lock);
 }

@@ -726,6 +736,8 @@ static void lower_barrier(conf_t *conf)
 	unsigned long flags;
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->barrier--;
+	if ( conf->mddev->md_minor == 99 )
+		trace_printk(" %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -733,7 +745,9 @@ static void lower_barrier(conf_t *conf)
 static void wait_barrier(conf_t *conf)
 {
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if ( conf->mddev->md_minor == 99 )
+		trace_printk("in: %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
+	if (conf->barrier) {
 		conf->nr_waiting++;
 		wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
 				    conf->resync_lock,
@@ -741,6 +755,8 @@ static void wait_barrier(conf_t *conf)
 		conf->nr_waiting--;
 	}
 	conf->nr_pending++;
+	if ( conf->mddev->md_minor == 99 )
+		trace_printk("out: %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
 	spin_unlock_irq(&conf->resync_lock);
 }

@@ -749,6 +765,8 @@ static void allow_barrier(conf_t *conf)
 	unsigned long flags;
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+	if ( conf->mddev->md_minor == 99 )
+		trace_printk(" %s - w:%u p:%u b:%u\n", current->comm, conf->nr_waiting, conf->nr_pending, conf->barrier);
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }

-- 
Justin Bronder

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-07-22 18:49 ` Justin Bronder
@ 2010-07-23  3:19   ` Neil Brown
  2010-07-23 15:47     ` Justin Bronder
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-07-23 3:19 UTC (permalink / raw)
  To: Justin Bronder; +Cc: linux-raid

On Thu, 22 Jul 2010 14:49:33 -0400
Justin Bronder <jsbronder@gentoo.org> wrote:

> On 16/07/10 14:46 -0400, Justin Bronder wrote:
> > I've been able to reproduce this across a number of machines with the same
> > hardware configuration.  During a raid10 resync, it's possible to hang the
> > device so that any further I/O operations will also block.  This can be
> > fairly simply done using dd.
> >
> > Interestingly, this is not reproducible when using a non-partitioned device.
> > That is, creating the device with --auto=yes and then directly using it
> > functions as expected.  However, using --auto=yes or --auto=mdp and then
> > creating a partition across the device will cause the hang.
>
> I've done some more research that may potentially help.  All of the
> following was done with 2.6.34.1.
>
> Still produces the hang:
>   - Using cp (may take a bit longer).
>   - Using jfs as the filesystem.
>   - Dropping RESYNC_DEPTH to 32.
>   - Using the offset layout.
>
> Does not produce the hang:
>   - Using the near layout.
>   - Using dd on the partition directly instead of on a filesystem,
>     via something like:
>       dd if=/dev/${MD_DEV}p1 of=/dev/${MD_DEV}p1 seek=4001 bs=1M
>
> As the barrier code is very similar, I repeated a number of these tests
> using raid1 instead of raid10.  In every case, I was unable to cause the
> system to hang.  I focused on the barriers due to the tracebacks in the
> previous email.  For the heck of it, I added some tracing (patch below)
> where the reason for the hang is fairly obvious.  Of course, how it
> happened isn't.
>
> The last bit of the trace before the hang:

Thanks for doing this!

See below...
<most of the quoted trace cut; it is reproduced in full in the parent message>

> <...>-4910 [002] 189.013295: raise_barrier: out: md99_resync - w:1 p:0 b:1
> <...>-4910 [002] 189.013414: raise_barrier: in: md99_resync - w:1 p:0 b:1
> <...>-4909 [002] 189.021171: lower_barrier: md99_raid10 - w:1 p:0 b:0
> <...>-4996 [003] 189.021179: wait_barrier: out: dd - w:0 p:1 b:0
> <...>-4910 [002] 189.021182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
> <...>-4996 [003] 189.021184: wait_barrier: in: dd - w:0 p:1 b:1
> <...>-4915 [003] 218.929283: wait_barrier: in: flush-9:99 - w:1 p:1 b:1

So the 'dd' process successfully waited for the barrier to be gone at
189.021179, and thus set pending to '1'.  It then submitted the IO request.
We should then see swapper (or possibly some other thread) calling
allow_barrier when the request completes.  But we don't.
A request could possibly take many milliseconds to complete, but it
shouldn't take seconds, and certainly not minutes.

It might be helpful if you could run this again and, in make_request(),
after the call to "wait_barrier()", print out:
  bio->bi_sector, bio->bi_size, bio->bi_rw

I'm guessing that the last request that doesn't seem to complete will be
different from the others in some important way.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread
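[Editorial aside: the printout Neil asks for can be produced with one more trace_printk() just after the wait_barrier() call in make_request(). A sketch against the same raid10.c as the earlier tracing patch; the hunk position, the `md_minor == 99` guard, and the message format are assumptions mirroring that patch (the format matches the `make_request:` lines that appear in the follow-up traces), not Neil's exact wording:]

```c
/* In make_request(), immediately after wait_barrier(conf): */
	wait_barrier(conf);
	if ( conf->mddev->md_minor == 99 )
		trace_printk("%s - sector:%llu sz:%u rw:%lu\n",
			     current->comm,
			     (unsigned long long)bio->bi_sector,
			     bio->bi_size, bio->bi_rw);
```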
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-07-23  3:19 ` Neil Brown
@ 2010-07-23 15:47   ` Justin Bronder
  2010-08-02  2:29     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Bronder @ 2010-07-23 15:47 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7321 bytes --]

On 23/07/10 13:19 +1000, Neil Brown wrote:
> On Thu, 22 Jul 2010 14:49:33 -0400
> Justin Bronder <jsbronder@gentoo.org> wrote:
>
> > On 16/07/10 14:46 -0400, Justin Bronder wrote:
> >
> > I've done some more research that may potentially help.  All of the
> > following was done with 2.6.34.1.
> >
> > Still produces the hang:
> >   - Using cp (may take a bit longer).
> >   - Using jfs as the filesystem.
> >   - Dropping RESYNC_DEPTH to 32.
> >   - Using the offset layout.
> >
> > Does not produce the hang:
> >   - Using the near layout.
> >   - Using dd on the partition directly instead of on a filesystem,
> >     via something like:
> >       dd if=/dev/${MD_DEV}p1 of=/dev/${MD_DEV}p1 seek=4001 bs=1M
> >
> > As the barrier code is very similar, I repeated a number of these tests
> > using raid1 instead of raid10.  In every case, I was unable to cause
> > the system to hang.  I focused on the barriers due to the tracebacks
> > in the previous email.  For the heck of it, I added some tracing
> > (patch below) where the reason for the hang is fairly obvious.  Of
> > course, how it happened isn't.
> >
> > The last bit of the trace before the hang:
>
> Thanks for doing this!
>
> See below...

<previous trace cut>

> So the 'dd' process successfully waited for the barrier to be gone at
> 189.021179, and thus set pending to '1'.  It then submitted the IO request.
> We should then see swapper (or possibly some other thread) calling
> allow_barrier when the request completes.  But we don't.
> A request could possibly take many milliseconds to complete, but it
> shouldn't take seconds, and certainly not minutes.
> > It might be helpful if you could run this again, and in make_request(), after > the call to "wait_barrier()" print out: > bio->bi_sector, bio->bi_size, bio->bi_rw > > I'm guessing that the last request that doesn't seem to complete will be > different from the other in some important way. Nothing stood out to me, but here's the tail end of a couple of different traces. <...>-5047 [002] 207.023784: wait_barrier: in: dd - w:0 p:11 b:0 <...>-5047 [002] 207.023784: wait_barrier: out: dd - w:0 p:12 b:0 <...>-5047 [002] 207.023785: make_request: dd - sector:7472001 sz:40960 rw:0 <...>-4958 [002] 207.023872: raise_barrier: mid: md99_resync - w:0 p:12 b:1 <...>-5047 [002] 207.024689: allow_barrier: dd - w:0 p:11 b:1 <...>-5047 [002] 207.024695: allow_barrier: dd - w:0 p:10 b:1 <...>-5047 [002] 207.024697: allow_barrier: dd - w:0 p:9 b:1 <...>-5047 [002] 207.024710: allow_barrier: dd - w:0 p:8 b:1 <...>-5047 [002] 207.024713: allow_barrier: dd - w:0 p:7 b:1 <...>-5047 [002] 207.026679: wait_barrier: in: dd - w:0 p:7 b:1 <idle>-0 [003] 207.043049: allow_barrier: swapper - w:1 p:6 b:1 <idle>-0 [003] 207.043058: allow_barrier: swapper - w:1 p:5 b:1 <idle>-0 [003] 207.043063: allow_barrier: swapper - w:1 p:4 b:1 <idle>-0 [003] 207.043070: allow_barrier: swapper - w:1 p:3 b:1 <idle>-0 [003] 207.043074: allow_barrier: swapper - w:1 p:2 b:1 <idle>-0 [003] 207.043079: allow_barrier: swapper - w:1 p:1 b:1 <idle>-0 [003] 207.043084: allow_barrier: swapper - w:1 p:0 b:1 <...>-4958 [003] 207.043108: raise_barrier: out: md99_resync - w:1 p:0 b:1 <...>-4958 [003] 207.043150: raise_barrier: in: md99_resync - w:1 p:0 b:1 <...>-4957 [003] 207.051206: lower_barrier: md99_raid10 - w:1 p:0 b:0 <...>-5047 [002] 207.051215: wait_barrier: out: dd - w:0 p:1 b:0 <...>-5047 [002] 207.051216: make_request: dd - sector:7472081 sz:20480 rw:0 <...>-4958 [003] 207.051218: raise_barrier: mid: md99_resync - w:0 p:1 b:1 <...>-5047 [002] 207.051227: wait_barrier: in: dd - w:0 p:1 b:1 <idle>-0 [002] 
207.058929: allow_barrier: swapper - w:1 p:0 b:1 <...>-4958 [003] 207.058938: raise_barrier: out: md99_resync - w:1 p:0 b:1 <...>-4958 [003] 207.059044: raise_barrier: in: md99_resync - w:1 p:0 b:1 <...>-4957 [003] 207.067171: lower_barrier: md99_raid10 - w:1 p:0 b:0 <...>-5047 [002] 207.067179: wait_barrier: out: dd - w:0 p:1 b:0 <...>-5047 [002] 207.067180: make_request: dd - sector:7472121 sz:3584 rw:0 <...>-4958 [003] 207.067182: raise_barrier: mid: md99_resync - w:0 p:1 b:1 <...>-5047 [002] 207.067184: wait_barrier: in: dd - w:0 p:1 b:1 <idle>-0 [000] 463.231730: allow_barrier: swapper - w:2 p:4 b:1 <idle>-0 [000] 463.231739: allow_barrier: swapper - w:2 p:3 b:1 <idle>-0 [000] 463.231746: allow_barrier: swapper - w:2 p:2 b:1 <idle>-0 [000] 463.231765: allow_barrier: swapper - w:2 p:1 b:1 <idle>-0 [000] 463.231774: allow_barrier: swapper - w:2 p:0 b:1 <...>-5004 [000] 463.231792: raise_barrier: out: md99_resync - w:2 p:0 b:1 <...>-5004 [000] 463.232005: raise_barrier: in: md99_resync - w:2 p:0 b:1 <...>-5003 [001] 463.232453: lower_barrier: md99_raid10 - w:2 p:0 b:0 <...>-5009 [000] 463.232463: wait_barrier: out: flush-9:99 - w:1 p:1 b:0 <...>-5009 [000] 463.232464: make_request: flush-9:99 - sector:13931137 sz:61440 rw:1 <...>-5105 [001] 463.232466: wait_barrier: out: dd - w:0 p:2 b:0 <...>-5105 [001] 463.232467: make_request: dd - sector:7204393 sz:40960 rw:0 <...>-5009 [000] 463.232476: wait_barrier: in: flush-9:99 - w:0 p:2 b:0 <...>-5009 [000] 463.232477: wait_barrier: out: flush-9:99 - w:0 p:3 b:0 <...>-5009 [000] 463.232477: make_request: flush-9:99 - sector:13931257 sz:3584 rw:1 <...>-5009 [000] 463.232481: wait_barrier: in: flush-9:99 - w:0 p:3 b:0 <...>-5009 [000] 463.232482: wait_barrier: out: flush-9:99 - w:0 p:4 b:0 <...>-5009 [000] 463.232483: make_request: flush-9:99 - sector:13931264 sz:512 rw:1 <...>-5105 [001] 463.232492: wait_barrier: in: dd - w:0 p:4 b:0 <...>-5105 [001] 463.232493: wait_barrier: out: dd - w:0 p:5 b:0 <...>-5105 [001] 
463.232494: make_request: dd - sector:7204473 sz:3584 rw:0 <...>-5004 [000] 463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1 <...>-5105 [001] 463.232496: wait_barrier: in: dd - w:0 p:5 b:1 <...>-5009 [000] 463.232522: wait_barrier: in: flush-9:99 - w:1 p:5 b:1 <idle>-0 [000] 463.232726: allow_barrier: swapper - w:2 p:4 b:1 <idle>-0 [001] 463.240520: allow_barrier: swapper - w:2 p:3 b:1 <idle>-0 [000] 463.240946: allow_barrier: swapper - w:2 p:2 b:1 <idle>-0 [000] 463.240955: allow_barrier: swapper - w:2 p:1 b:1 Thanks, -- Justin Bronder [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-07-23 15:47     ` Justin Bronder
@ 2010-08-02  2:29       ` Neil Brown
  2010-08-02  2:58         ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-08-02 2:29 UTC (permalink / raw)
  To: Justin Bronder; +Cc: linux-raid

On Fri, 23 Jul 2010 11:47:01 -0400 Justin Bronder <jsbronder@gentoo.org> wrote:

> On 23/07/10 13:19 +1000, Neil Brown wrote:
> > On Thu, 22 Jul 2010 14:49:33 -0400
> >
> > So the 'dd' process successfully waited for the barrier to be gone at
> > 189.021179, and thus set pending to '1'.  It then submitted the IO request.
> > We should then see swapper (or possibly some other thread) calling
> > allow_barrier when the request completes.  But we don't.
> > A request could possibly take many milliseconds to complete, but it
> > shouldn't take seconds and certainly not minutes.
> >
> > It might be helpful if you could run this again, and in make_request(),
> > after the call to "wait_barrier()", print out:
> >   bio->bi_sector, bio->bi_size, bio->bi_rw
> >
> > I'm guessing that the last request that doesn't seem to complete will be
> > different from the others in some important way.
>
> Nothing stood out to me, but here's the tail end of a couple of different
> traces.

Thanks a lot!  Something does stand out for me!....

> <...>-5047 [002]   207.051215: wait_barrier: out: dd - w:0 p:1 b:0
> <...>-5047 [002]   207.051216: make_request: dd - sector:7472081 sz:20480 rw:0
> <...>-4958 [003]   207.051218: raise_barrier: mid: md99_resync - w:0 p:1 b:1
> <...>-5047 [002]   207.051227: wait_barrier: in: dd - w:0 p:1 b:1
> <idle>-0   [002]   207.058929: allow_barrier: swapper - w:1 p:0 b:1
> <...>-4958 [003]   207.058938: raise_barrier: out: md99_resync - w:1 p:0 b:1
> <...>-4958 [003]   207.059044: raise_barrier: in: md99_resync - w:1 p:0 b:1
> <...>-4957 [003]   207.067171: lower_barrier: md99_raid10 - w:1 p:0 b:0
> <...>-5047 [002]   207.067179: wait_barrier: out: dd - w:0 p:1 b:0
> <...>-5047 [002]   207.067180: make_request: dd - sector:7472121 sz:3584 rw:0
> <...>-4958 [003]   207.067182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
> <...>-5047 [002]   207.067184: wait_barrier: in: dd - w:0 p:1 b:1

The last successful IO is only 3584 bytes - 7 sectors.  All the others are
much larger.  It is almost certain that the read needed to cross a chunk
boundary, so some of it goes to one device and some to the next.  It was
probably a 64K read or similar.  The first 5 pages all fit in one device,
and so go through as a 20K read.  The next page doesn't, so it comes down
to md/raid10 as a 1-page read.  raid10 splits it into a 7-sector read and a
1-sector read.  We see the 7-sector read being initiated, but it doesn't
complete for some reason, and the resync barrier gets in the way, so the
1-sector read blocks in wait_barrier.

In the next trace....
> <idle>-0   [000]   463.231730: allow_barrier: swapper - w:2 p:4 b:1
> <idle>-0   [000]   463.231739: allow_barrier: swapper - w:2 p:3 b:1
> <idle>-0   [000]   463.231746: allow_barrier: swapper - w:2 p:2 b:1
> <idle>-0   [000]   463.231765: allow_barrier: swapper - w:2 p:1 b:1
> <idle>-0   [000]   463.231774: allow_barrier: swapper - w:2 p:0 b:1
> <...>-5004 [000]   463.231792: raise_barrier: out: md99_resync - w:2 p:0 b:1
> <...>-5004 [000]   463.232005: raise_barrier: in: md99_resync - w:2 p:0 b:1
> <...>-5003 [001]   463.232453: lower_barrier: md99_raid10 - w:2 p:0 b:0
> <...>-5009 [000]   463.232463: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
> <...>-5009 [000]   463.232464: make_request: flush-9:99 - sector:13931137 sz:61440 rw:1
> <...>-5105 [001]   463.232466: wait_barrier: out: dd - w:0 p:2 b:0
> <...>-5105 [001]   463.232467: make_request: dd - sector:7204393 sz:40960 rw:0
> <...>-5009 [000]   463.232476: wait_barrier: in: flush-9:99 - w:0 p:2 b:0
> <...>-5009 [000]   463.232477: wait_barrier: out: flush-9:99 - w:0 p:3 b:0
> <...>-5009 [000]   463.232477: make_request: flush-9:99 - sector:13931257 sz:3584 rw:1
> <...>-5009 [000]   463.232481: wait_barrier: in: flush-9:99 - w:0 p:3 b:0
> <...>-5009 [000]   463.232482: wait_barrier: out: flush-9:99 - w:0 p:4 b:0
> <...>-5009 [000]   463.232483: make_request: flush-9:99 - sector:13931264 sz:512 rw:1
> <...>-5105 [001]   463.232492: wait_barrier: in: dd - w:0 p:4 b:0
> <...>-5105 [001]   463.232493: wait_barrier: out: dd - w:0 p:5 b:0

We see a similar thing here, with a write being broken into a 15-page
write, then a 7-sector write, then a 1-sector write - and that all works.

> <...>-5105 [001]   463.232494: make_request: dd - sector:7204473 sz:3584 rw:0
> <...>-5004 [000]   463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1
> <...>-5105 [001]   463.232496: wait_barrier: in: dd - w:0 p:5 b:1
> <...>-5009 [000]   463.232522: wait_barrier: in: flush-9:99 - w:1 p:5 b:1
> <idle>-0   [000]   463.232726: allow_barrier: swapper - w:2 p:4 b:1
> <idle>-0   [001]   463.240520: allow_barrier: swapper - w:2 p:3 b:1
> <idle>-0   [000]   463.240946: allow_barrier: swapper - w:2 p:2 b:1
> <idle>-0   [000]   463.240955: allow_barrier: swapper - w:2 p:1 b:1

But again we see a 7-sector read following a larger read, and the 1-sector
read that should follow gets blocked.

So it is somehow related to the need to split one-page requests across
multiple devices, and it could be specific to read requests.

Ahhhh....  I see the problem.  Because a 'generic_make_request' is already
active, the one called by raid10::make_request just queues the request
until the top-level one completes.  This results in a deadlock.

I'll have to ponder a bit to figure out the best way to fix this.

Thanks again for the report and the help tracking down the problem.

NeilBrown
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-08-02  2:29       ` Neil Brown
@ 2010-08-02  2:58         ` Neil Brown
  2010-08-02 20:37           ` Justin Bronder
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-08-02 2:58 UTC (permalink / raw)
  To: Justin Bronder; +Cc: linux-raid

On Mon, 2 Aug 2010 12:29:49 +1000 Neil Brown <neilb@suse.de> wrote:

> Ahhhh....  I see the problem.  Because a 'generic_make_request' is already
> active, the one called by raid10::make_request just queues the request
> until the top-level one completes.  This results in a deadlock.
>
> I'll have to ponder a bit to figure out the best way to fix this.

So, one good strong cup of tea later, I think I have a good solution.

Would you be able to test with this patch and confirm that you cannot
reproduce the hang?

Thanks.

NeilBrown

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
 
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map:
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-08-02  2:58         ` Neil Brown
@ 2010-08-02 20:37           ` Justin Bronder
  2010-08-07 11:22             ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Bronder @ 2010-08-02 20:37 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 02/08/10 12:58 +1000, Neil Brown wrote:
> On Mon, 2 Aug 2010 12:29:49 +1000 Neil Brown <neilb@suse.de> wrote:
>
> > Ahhhh....  I see the problem.  Because a 'generic_make_request' is already
> > active, the one called by raid10::make_request just queues the request
> > until the top-level one completes.  This results in a deadlock.
> >
> > I'll have to ponder a bit to figure out the best way to fix this.
>
> So, one good strong cup of tea later, I think I have a good solution.
>
> Would you be able to test with this patch and confirm that you cannot
> reproduce the hang?

I've been running with this patch on 2.6.34.1 all day and have yet to
cause the hang.  Given it took under 5 minutes earlier, feel free to add:

Tested-by: Justin Bronder <jsbronder@gentoo.org>

I really appreciate you taking care of this.  Thanks.

> Thanks.
>
> NeilBrown

<patch quoted in full, cut>

--
Justin Bronder
* Re: Raid10 device hangs during resync and heavy I/O.
  2010-08-02 20:37           ` Justin Bronder
@ 2010-08-07 11:22             ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2010-08-07 11:22 UTC (permalink / raw)
  To: Justin Bronder; +Cc: linux-raid

On Mon, 2 Aug 2010 16:37:54 -0400 Justin Bronder <jsbronder@gentoo.org> wrote:

> On 02/08/10 12:58 +1000, Neil Brown wrote:
> > So, one good strong cup of tea later, I think I have a good solution.
> >
> > Would you be able to test with this patch and confirm that you cannot
> > reproduce the hang?
>
> I've been running with this patch on 2.6.34.1 all day and have yet to
> cause the hang.  Given it took under 5 minutes earlier, feel free to add:
>
> Tested-by: Justin Bronder <jsbronder@gentoo.org>
>
> I really appreciate you taking care of this.  Thanks.

And thank you for testing.

I've queued this up now and will send it to Linus and -stable shortly.

NeilBrown

<patch quoted in full, cut>
end of thread, other threads:[~2010-08-07 11:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-16 18:46 Raid10 device hangs during resync and heavy I/O. Justin Bronder
2010-07-16 18:49 ` Justin Bronder
2010-07-22 18:49 ` Justin Bronder
2010-07-23  3:19   ` Neil Brown
2010-07-23 15:47     ` Justin Bronder
2010-08-02  2:29       ` Neil Brown
2010-08-02  2:58         ` Neil Brown
2010-08-02 20:37           ` Justin Bronder
2010-08-07 11:22             ` Neil Brown