From mboxrd@z Thu Jan 1 00:00:00 1970 From: BillStuff Subject: Re: Raid5 hang in 3.14.19 Date: Tue, 30 Sep 2014 16:21:32 -0500 Message-ID: <542B1EDC.7060803@sbcglobal.net> References: <5425E9D6.1050102@sbcglobal.net> <20140929122533.3b91a543@notabene.brown> <5428D863.7090409@sbcglobal.net> <20140929140818.1086972e@notabene.brown> <5428DFE1.9080600@sbcglobal.net> <20140930075950.1d1e3865@notabene.brown> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20140930075950.1d1e3865@notabene.brown> Sender: linux-raid-owner@vger.kernel.org To: NeilBrown Cc: linux-raid List-Id: linux-raid.ids On 09/29/2014 04:59 PM, NeilBrown wrote: > On Sun, 28 Sep 2014 23:28:17 -0500 BillStuff > wrote: > >> On 09/28/2014 11:08 PM, NeilBrown wrote: >>> On Sun, 28 Sep 2014 22:56:19 -0500 BillStuff >>> wrote: >>> >>>> On 09/28/2014 09:25 PM, NeilBrown wrote: >>>>> On Fri, 26 Sep 2014 17:33:58 -0500 BillStuff >>>>> wrote: >>>>> >>>>>> Hi Neil, >>>>>> >>>>>> I found something that looks similar to the problem described in >>>>>> "Re: seems like a deadlock in workqueue when md do a flush" from Sept 14th. >>>>>> >>>>>> It's on 3.14.19 with 7 recent patches for fixing raid1 recovery hangs. >>>>>> >>>>>> on this array: >>>>>> md3 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] >>>>>> 104171200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU] >>>>>> bitmap: 1/5 pages [4KB], 2048KB chunk >>>>>> >>>>>> I was running a test doing parallel kernel builds, read/write loops, and >>>>>> disk add / remove / check loops, >>>>>> on both this array and a raid1 array. >>>>>> >>>>>> I was trying to stress test your recent raid1 fixes, which went well, >>>>>> but then after 5 days, >>>>>> the raid5 array hung up with this in dmesg: >>>>> I think this is different to the workqueue problem you mentioned, though as I >>>>> don't know exactly what caused either I cannot be certain. >>>>> >>>>> From the data you provided it looks like everything is waiting on >>>>> get_active_stripe(), or on a process that is waiting on that. >>>>> That seems pretty common whenever anything goes wrong in raid5 :-( >>>>> >>>>> The md3_raid5 task is listed as blocked, but not stack trace is given. >>>>> If the machine is still in the state, then >>>>> >>>>> cat /proc/1698/stack >>>>> >>>>> might be useful. >>>>> (echo t > /proc/sysrq-trigger is always a good idea) >>>> Might this help? I believe the array was doing a "check" when things >>>> hung up. >>> It looks like it was trying to start doing a 'check'. >>> The 'resync' thread hadn't been started yet. >>> What is 'kthreadd' doing? >>> My guess is that it is in try_to_free_pages() waiting for writeout >>> for some xfs file page onto the md array ... which won't progress until >>> the thread gets started. >>> >>> That would suggest that we need an async way to start threads... >>> >>> Thanks, >>> NeilBrown >>> >> I suspect your guess is correct: > Thanks for the confirmation. > > I'm thinking of something like that. Very basic suggestion suggests it > instantly crash. > > If you were to apply this patch and run your test for a week or two, that > would increase my confidence (though of course testing doesn't prove the > absence of bugs....) > > Thanks, > NeilBrown > Neil, It locked up already, but in a different way. In this case it was triggered by a re-add in a fail / remove / add loop. It kinda seems like md3_raid5 tried to "collect" (reap) md3_resync before it got fully started. INFO: task md3_raid5:1698 blocked for more than 120 seconds. Tainted: P O 3.14.19fe-dirty #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. e8371dc4 00000046 c106d92d ea483770 d18a9994 ffd2177f ffffffff e8371dac c17d6700 c17d6700 e9d3bcc0 c231bcc0 00000420 e8371d98 00007400 00000088 00000000 00000000 c33ead90 000003be 00000000 00000005 00000000 0000528b Call Trace: [] ? __enqueue_entity+0x6d/0x80 [] ? enqueue_task_fair+0x2d3/0x660 [] schedule+0x23/0x60 [] schedule_timeout+0x145/0x1c0 [] ? default_wake_function+0x10/0x20 [] ? update_rq_clock.part.92+0x18/0x50 [] ? check_preempt_curr+0x65/0x90 [] ? ttwu_do_wakeup+0x18/0x120 [] wait_for_common+0x9b/0x110 [] ? wake_up_process+0x40/0x40 [] wait_for_completion+0x17/0x20 [] kthread_stop+0x41/0xb0 [] md_unregister_thread+0x31/0x40 [] md_reap_sync_thread+0x19/0x140 [] md_check_recovery+0xe8/0x480 [] raid5d+0x20/0x4c0 [raid456] [] ? try_to_del_timer_sync+0x42/0x60 [] ? del_timer_sync+0x41/0x50 [] ? schedule_timeout+0xfd/0x1c0 [] ? detach_if_pending+0xa0/0xa0 [] ? finish_wait+0x44/0x60 [] md_thread+0xe8/0x100 [] ? __wake_up_sync+0x20/0x20 [] ? md_start_sync+0xb0/0xb0 [] kthread+0xa1/0xc0 [] ret_from_kernel_thread+0x1b/0x28 [] ? kthread_create_on_node+0x110/0x110 INFO: task kworker/0:0:12491 blocked for more than 120 seconds. Tainted: P O 3.14.19fe-dirty #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. e6f2ba58 00000046 00000086 00000086 e6f2ba08 c10795ff f395c035 000030ea c17d6700 c17d6700 d18a9440 e8973cc0 e6f2ba14 c144fa3d 00000000 e6f2ba2c c2054600 f3dbc81b ea5f4c80 c9f64060 c2054600 00000001 c9f64060 e6f2ba5c Call Trace: [] ? __wake_up+0x3f/0x50 [] ? md_wakeup_thread+0x2d/0x30 [] ? raid5_unplug+0xbb/0x120 [raid456] [] ? raid5_unplug+0xbb/0x120 [raid456] [] schedule+0x23/0x60 [] schedule_timeout+0x145/0x1c0 [] ? blk_finish_plug+0x12/0x40 [] ? _xfs_buf_ioapply+0x287/0x300 [xfs] [] wait_for_common+0x9b/0x110 [] ? wake_up_process+0x40/0x40 [] wait_for_completion+0x17/0x20 [] xfs_buf_iowait+0x50/0xb0 [xfs] [] ? _xfs_buf_read+0x37/0x40 [xfs] [] _xfs_buf_read+0x37/0x40 [xfs] [] xfs_buf_read_map+0x85/0xe0 [xfs] [] xfs_trans_read_buf_map+0x1e9/0x3e0 [xfs] [] xfs_alloc_read_agfl+0x97/0xc0 [xfs] [] xfs_alloc_fix_freelist+0x193/0x430 [xfs] [] ? xlog_grant_sub_space.isra.4+0x1e/0x70 [xfs] [] xfs_free_extent+0x8e/0x100 [xfs] [] ? kmem_cache_alloc+0xae/0xf0 [] xfs_bmap_finish+0x13c/0x190 [xfs] [] xfs_itruncate_extents+0x1b2/0x2b0 [xfs] [] xfs_free_eofblocks+0x243/0x2e0 [xfs] [] xfs_inode_free_eofblocks+0x96/0x140 [xfs] [] ? xfs_inode_clear_eofblocks_tag+0x160/0x160 [xfs] [] xfs_inode_ag_walk.isra.7+0x1cd/0x2f0 [xfs] [] ? xfs_inode_clear_eofblocks_tag+0x160/0x160 [xfs] [] ? radix_tree_gang_lookup_tag+0x71/0xb0 [] ? xfs_perag_get_tag+0x2b/0xb0 [xfs] [] ? xfs_inode_ag_iterator_tag+0x3d/0xa0 [xfs] [] xfs_inode_ag_iterator_tag+0x6a/0xa0 [xfs] [] ? xfs_inode_clear_eofblocks_tag+0x160/0x160 [xfs] [] xfs_icache_free_eofblocks+0x2e/0x40 [xfs] [] xfs_eofblocks_worker+0x18/0x30 [xfs] [] process_one_work+0x10c/0x340 [] worker_thread+0x101/0x330 [] ? manage_workers.isra.27+0x250/0x250 [] kthread+0xa1/0xc0 [] ret_from_kernel_thread+0x1b/0x28 [] ? kthread_create_on_node+0x110/0x110 INFO: task xfs_fsr:24262 blocked for more than 120 seconds. Tainted: P O 3.14.19fe-dirty #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. e4d17aec 00000082 00000096 00000096 e4d17a9c c10795ff 00000000 00000000 c17d6700 c17d6700 e9d3b7b0 c70a8000 e4d17aa8 c144fa3d 00000000 e4d17ac0 c2054600 f3dbc81b ea7ace80 d7b74cc0 c2054600 00000002 d7b74cc0 e4d17af0 Call Trace: [] ? __wake_up+0x3f/0x50 [] ? md_wakeup_thread+0x2d/0x30 [] ? raid5_unplug+0xbb/0x120 [raid456] [] ? raid5_unplug+0xbb/0x120 [raid456] [] schedule+0x23/0x60 [] schedule_timeout+0x145/0x1c0 [] ? blk_finish_plug+0x12/0x40 [] ? _xfs_buf_ioapply+0x287/0x300 [xfs] [] wait_for_common+0x9b/0x110 [] ? wake_up_process+0x40/0x40 [] wait_for_completion+0x17/0x20 [] xfs_buf_iowait+0x50/0xb0 [xfs] [] ? _xfs_buf_read+0x37/0x40 [xfs] [] _xfs_buf_read+0x37/0x40 [xfs] [] xfs_buf_read_map+0x85/0xe0 [xfs] [] xfs_trans_read_buf_map+0x179/0x3e0 [xfs] [] xfs_imap_to_bp+0x67/0xe0 [xfs] [] xfs_iread+0x7d/0x3e0 [xfs] [] ? xfs_inode_alloc+0x58/0x1b0 [xfs] [] xfs_iget+0x192/0x580 [xfs] [] ? kmem_free+0x19/0x50 [xfs] [] xfs_bulkstat_one_int+0x86/0x2c0 [xfs] [] xfs_bulkstat_one+0x34/0x40 [xfs] [] ? xfs_internal_inum+0xa0/0xa0 [xfs] [] xfs_bulkstat+0x400/0x870 [xfs] [] xfs_ioc_bulkstat+0xb6/0x160 [xfs] [] ? xfs_bulkstat_one_int+0x2c0/0x2c0 [xfs] [] ? xfs_ioc_swapext+0x160/0x160 [xfs] [] xfs_file_ioctl+0x783/0xa60 [xfs] [] ? __update_cpu_load+0xab/0xd0 [] ? hrtimer_forward+0xa8/0x1b0 [] ? timerqueue_add+0x50/0xb0 [] ? ktime_get+0x53/0xe0 [] ? clockevents_program_event+0x95/0x130 [] ? xfs_ioc_swapext+0x160/0x160 [xfs] [] do_vfs_ioctl+0x2e2/0x4c0 [] ? tick_program_event+0x29/0x30 [] ? hrtimer_interrupt+0x13c/0x2a0 [] ? lockref_put_or_lock+0xb/0x30 [] SyS_ioctl+0x78/0x80 [] syscall_call+0x7/0x7 INFO: task md3_resync:24817 blocked for more than 120 seconds. Tainted: P O 3.14.19fe-dirty #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. c2d95e08 00000046 00000092 00000000 00000020 00000092 1befb1af 000030e0 c17d6700 c17d6700 e9d3c6e0 d18a9950 c2d95ddc c12adbc5 00000000 00000001 00000000 00000001 c2d95dec e95f0000 00000001 c2d95e08 00000292 00000292 Call Trace: [] ? queue_unplugged+0x45/0x90 [] ? prepare_to_wait_event+0x6b/0xd0 [] schedule+0x23/0x60 [] md_do_sync+0x8dd/0x1010 [] ? __wake_up_sync+0x20/0x20 [] ? md_start_sync+0xb0/0xb0 [] md_thread+0xe8/0x100 [] ? __wake_up_locked+0x1f/0x30 [] ? md_start_sync+0xb0/0xb0 [] kthread+0xa1/0xc0 [] ret_from_kernel_thread+0x1b/0x28 [] ? kthread_create_on_node+0x110/0x110 Again, the blocked task info is edited for space; hopefully these are the important ones, but I've got more if it helps. Thanks, Bill