* [GIT PULL] Fix for btrfs/070 checksum error @ 2015-07-08 3:35 Qu Wenruo 2015-07-22 9:28 ` Qu Wenruo 0 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2015-07-08 3:35 UTC (permalink / raw) To: Chris Mason, btrfs Hi Chris, Sorry for the late pull request; this one should have been sent on Monday. :( This patchset is meant to fix an annoying bug triggered by btrfs/070 and some other concurrent scrub + IO load tests. The bug itself is triggered by chance, and it took Zhao Lei quite a long time to trace and debug. Although the previous patchset sent to the mailing list had some problems, they turned out to be a rebasing typo that can be fixed quite easily. We have spent a whole weekend running tests this time to ensure the patch is OK. The fix itself is small and only fixes a long-standing problem. IMHO it is OK for the late-fix merge window. So please merge the following branch: https://github.com/adam900710/linux.git for_chris_4.2_070_fix Thanks, Qu ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-08 3:35 [GIT PULL] Fix for btrfs/070 checksum error Qu Wenruo @ 2015-07-22 9:28 ` Qu Wenruo 2015-07-22 11:58 ` Chris Mason 2015-07-23 20:21 ` Chris Mason 0 siblings, 2 replies; 10+ messages in thread From: Qu Wenruo @ 2015-07-22 9:28 UTC (permalink / raw) To: Chris Mason, btrfs Hi Chris, Is there anything wrong with it? It has been 2 weeks, and it's still not in your for linus branch. Is there anything wrong? Thanks, Qu Qu Wenruo wrote on 2015/07/08 11:35 +0800: > Hi Chris, > > Sorry for the late pull request; this one should have been sent on Monday. :( > > This patchset is meant to fix an annoying bug triggered by btrfs/070 and > some other concurrent scrub + IO load tests. > > The bug itself is triggered by chance, and it took Zhao Lei quite a long > time to trace and debug. > > Although the previous patchset sent to the mailing list had some problems, > they turned out to be a rebasing typo that can be fixed quite easily. > > We have spent a whole weekend running tests this time to ensure the > patch is OK. > > The fix itself is small and only fixes a long-standing problem. > IMHO it is OK for the late-fix merge window. > > So please merge the following branch: > https://github.com/adam900710/linux.git for_chris_4.2_070_fix > > Thanks, > Qu ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-22 9:28 ` Qu Wenruo @ 2015-07-22 11:58 ` Chris Mason 2015-07-23 20:21 ` Chris Mason 1 sibling, 0 replies; 10+ messages in thread From: Chris Mason @ 2015-07-22 11:58 UTC (permalink / raw) To: Qu Wenruo; +Cc: btrfs On Wed, Jul 22, 2015 at 05:28:48PM +0800, Qu Wenruo wrote: > Hi Chris, > > Is there anything wrong with it? > > It has been 2 weeks, and it's still not in your for linus branch. > > Is there anything wrong? Nothing wrong at all, I've got it queued here. Thanks for the resend. -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-22 9:28 ` Qu Wenruo 2015-07-22 11:58 ` Chris Mason @ 2015-07-23 20:21 ` Chris Mason 2015-07-24 0:29 ` Qu Wenruo 1 sibling, 1 reply; 10+ messages in thread From: Chris Mason @ 2015-07-23 20:21 UTC (permalink / raw) To: Qu Wenruo; +Cc: btrfs

On Wed, Jul 22, 2015 at 05:28:48PM +0800, Qu Wenruo wrote:
> Hi Chris,
>
> Is there anything wrong with it?
>
> It has been 2 weeks, and it's still not in your for linus branch.
>
> Is there anything wrong?

I ran this through xfstests again, and got tasks deadlocked during btrfs/061. Looks like scrub is leaking an extent buffer lock?

I'll see what I can find, but posting here in case you've seen it already.

155540 (kworker/u65:6) D
[<ffffffffa06898d0>] btrfs_tree_lock+0xf0/0x260 [btrfs]
[<ffffffffa066a7dc>] lock_extent_buffer_for_io+0x1ac/0x1e0 [btrfs]
[<ffffffffa066adc2>] btree_write_cache_pages+0x242/0x430 [btrfs]
[<ffffffffa064010f>] btree_writepages+0x6f/0x80 [btrfs]
[<ffffffff81172b23>] do_writepages+0x23/0x40
[<ffffffff81208c2d>] __writeback_single_inode+0x7d/0x780
[<ffffffff81209931>] writeback_sb_inodes+0x2b1/0x570
[<ffffffff81209de6>] wb_writeback+0x136/0x760
[<ffffffff8120a520>] wb_do_writeback+0x110/0x440
[<ffffffff8120a8e5>] wb_workfn+0x95/0x440
[<ffffffff81078788>] process_one_work+0x1e8/0x730
[<ffffffff81078dea>] worker_thread+0x11a/0x4d0
[<ffffffff8107e7b9>] kthread+0xe9/0x110
[<ffffffff816b128f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff

240994 (btrfs-transacti) D
[<ffffffffa06470b9>] wait_for_commit+0x59/0x90 [btrfs]
[<ffffffffa0649b94>] btrfs_commit_transaction+0x284/0xd10 [btrfs]
[<ffffffffa0643c46>] transaction_kthread+0x246/0x2a0 [btrfs]
[<ffffffff8107e7b9>] kthread+0xe9/0x110
[<ffffffff816b128f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff

241000 (btrfs) D
[<ffffffffa06898d0>] btrfs_tree_lock+0xf0/0x260 [btrfs]
[<ffffffffa066a7dc>] lock_extent_buffer_for_io+0x1ac/0x1e0 [btrfs]
[<ffffffffa066adc2>] btree_write_cache_pages+0x242/0x430 [btrfs]
[<ffffffffa064010f>] btree_writepages+0x6f/0x80 [btrfs]
[<ffffffff81172b23>] do_writepages+0x23/0x40
[<ffffffff811638a5>] __filemap_fdatawrite_range+0xb5/0x100
[<ffffffff81163c73>] filemap_fdatawrite_range+0x13/0x20
[<ffffffffa0648b24>] btrfs_write_marked_extents+0xf4/0x140 [btrfs]
[<ffffffffa0648bbb>] btrfs_write_and_wait_transaction+0x4b/0x90 [btrfs]
[<ffffffffa064a1c5>] btrfs_commit_transaction+0x8b5/0xd10 [btrfs]
[<ffffffffa06a646e>] relocate_block_group+0x41e/0x5e0 [btrfs]
[<ffffffffa06a67b4>] btrfs_relocate_block_group+0x184/0x2a0 [btrfs]
[<ffffffffa067722a>] btrfs_relocate_chunk+0x7a/0x110 [btrfs]
[<ffffffffa067836c>] btrfs_balance+0x9bc/0x1060 [btrfs]
[<ffffffffa0680ed8>] btrfs_ioctl_balance+0x1c8/0x330 [btrfs]
[<ffffffffa06888f9>] btrfs_ioctl+0x409/0x1150 [btrfs]
[<ffffffff811e9dfa>] do_vfs_ioctl+0x8a/0x570
[<ffffffff811ea372>] SyS_ioctl+0x92/0xa0
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241024 (btrfs) D
[<ffffffffa06aa9de>] __scrub_blocked_if_needed+0x7e/0xc0 [btrfs]
[<ffffffffa06aab22>] scrub_pause_off+0x32/0x70 [btrfs]
[<ffffffffa06b1b73>] scrub_enumerate_chunks+0x4d3/0x5d0 [btrfs]
[<ffffffffa06b1e36>] btrfs_scrub_dev+0x1c6/0x5a0 [btrfs]
[<ffffffffa0686381>] btrfs_ioctl_scrub+0xb1/0x120 [btrfs]
[<ffffffffa0688ef3>] btrfs_ioctl+0xa03/0x1150 [btrfs]
[<ffffffff811e9dfa>] do_vfs_ioctl+0x8a/0x570
[<ffffffff811ea372>] SyS_ioctl+0x92/0xa0
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241002 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241003 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241004 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241005 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241006 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241007 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241008 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241009 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241010 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241012 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241013 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241015 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241016 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241019 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241020 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241021 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

16 hits:
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff
-----

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-23 20:21 ` Chris Mason @ 2015-07-24 0:29 ` Qu Wenruo 2015-07-24 1:57 ` Chris Mason 0 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2015-07-24 0:29 UTC (permalink / raw) To: Chris Mason, btrfs Thanks Chris We will investigate it with highest priority. Thanks, Qu Chris Mason wrote on 2015/07/23 16:21 -0400: > On Wed, Jul 22, 2015 at 05:28:48PM +0800, Qu Wenruo wrote: >> Hi Chris, >> >> Is there anything wrong with it? >> >> It has been 2 weeks, and it's still not in your for linus branch. >> >> Is there anything wrong? > > I ran this through xfstests again, and got tasks deadlocked during > btrfs/061. Looks like scrub is leaking an extent buffer lock? > > I'll see what I can find, but posting here in case you've seen it > already. > > [ stack traces of the blocked tasks trimmed ] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-24 0:29 ` Qu Wenruo @ 2015-07-24 1:57 ` Chris Mason 2015-07-24 2:50 ` Qu Wenruo 2015-07-28 7:10 ` Qu Wenruo 0 siblings, 2 replies; 10+ messages in thread From: Chris Mason @ 2015-07-24 1:57 UTC (permalink / raw) To: Qu Wenruo; +Cc: btrfs On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote: [ deadlock with the 070 patches ] > Thanks Chris > > We will investigate it with highest priority. > > Thanks, > Qu > Thanks! I'm doing a few more runs to make sure the lockup is new with these patches. -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-24 1:57 ` Chris Mason @ 2015-07-24 2:50 ` Qu Wenruo 2015-07-28 7:10 ` Qu Wenruo 1 sibling, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2015-07-24 2:50 UTC (permalink / raw) To: Chris Mason, btrfs Chris Mason wrote on 2015/07/23 21:57 -0400: > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote: > > [ deadlock with the 070 patches ] > >> Thanks Chris >> >> We will investigate it with highest priority. >> >> Thanks, >> Qu >> > > Thanks! I'm doing a few more runs to make sure the lockup is new with > these patches. > > -chris > BTW, is this patchset rebased onto your for-linus-4.2 branch? Also, how reproducible is the lockup in btrfs/061? Thanks, Qu ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-24 1:57 ` Chris Mason 2015-07-24 2:50 ` Qu Wenruo @ 2015-07-28 7:10 ` Qu Wenruo 2015-07-29 8:21 ` Zhao Lei 1 sibling, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2015-07-28 7:10 UTC (permalink / raw) To: Chris Mason, btrfs Chris Mason wrote on 2015/07/23 21:57 -0400: > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote: > > [ deadlock with the 070 patches ] > >> Thanks Chris >> >> We will investigate it with highest priority. >> >> Thanks, >> Qu >> > > Thanks! I'm doing a few more runs to make sure the lockup is new with > these patches. > > -chris > Hi Chris, I'm very sorry that we are unable to fix the lockup in a short time, so it may not fit in the v4.2 merge window. Please ignore this patchset for now. Thanks, Qu ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-28 7:10 ` Qu Wenruo @ 2015-07-29 8:21 ` Zhao Lei 2015-07-29 14:52 ` Chris Mason 0 siblings, 1 reply; 10+ messages in thread From: Zhao Lei @ 2015-07-29 8:21 UTC (permalink / raw) To: 'Chris Mason', 'btrfs'; +Cc: 'Qu Wenruo'

Hi, Chris

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org
> [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 28, 2015 3:11 PM
> To: Chris Mason; btrfs
> Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
>
> Chris Mason wrote on 2015/07/23 21:57 -0400:
> > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> >
> > [ deadlock with the 070 patches ]
> >
> >> Thanks Chris
> >>
> >> We will investigate it with highest priority.
> >>
> >> Thanks,
> >> Qu
> >>
> >
> > Thanks! I'm doing a few more runs to make sure the lockup is new with
> > these patches.
> >
> > -chris
> >
> Hi Chris,
>
> I'm very sorry that we are unable to fix the lockup in a short time, so it may not
> fit in the v4.2 merge window.
>
> Please ignore this patchset for now.
>

Sorry that the investigation took quite a long time; the lockup happens randomly.

We found the reason the processes block:

1: In some cases, this patch causes __btrfs_cow_block()->btrfs_reloc_cow_block() to fail when called from a btrfs_balance operation. (This needs more investigation.)

2: The error handling code in __btrfs_cow_block() does not unlock/free the newly allocated tree block before returning the error.

3: do_relocation(), the caller of __btrfs_cow_block(), has error handling code, but it cannot work in this case either, because the newly allocated eb is not returned.

4: Subsequent code in do_relocation() tries to lock the same eb again, which causes the deadlock.

In short:
do_relocation()
-> __btrfs_cow_block() fails without unlocking the eb *1
...
-> btrfs_search_slot() tries to lock the same eb again
...

*1: this failure is caused by scrub

Because eb locking does not use a normal lock, we can't get information from lockdep in this case.

Things to do:
1: Fix this patch to avoid making __btrfs_cow_block() fail.
2: Fix __btrfs_cow_block() to do enough cleanup in its error handling code.
3: Enhance eb locking to report information that helps debug similar errors.

Thanks
Zhaolei

> Thanks,
> Qu

^ permalink raw reply [flat|nested] 10+ messages in thread
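The lock leak described in points 2 to 4 of the analysis can be sketched in user space. This is only an illustration under stated assumptions, not the btrfs code: `toy_eb`, `toy_reloc_hook()`, `toy_cow_block()`, `later_lock_would_block()` and `run_scenario()` are hypothetical names, and a plain pthread mutex stands in for the extent buffer lock. The sketch uses `pthread_mutex_trylock()` to detect the leaked lock instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer; NOT the kernel's struct. */
struct toy_eb {
    pthread_mutex_t lock;
};

/* Hypothetical stand-in for the relocation hook that fails under scrub. */
static int toy_reloc_hook(int hook_fails)
{
    return hook_fails ? -EIO : 0;
}

/*
 * Toy copy-on-write step. On success the new block is handed back locked
 * through *out. On failure *out is left untouched, so only this function
 * can release the lock; with cleanup_on_error == 0 the lock is leaked.
 */
static int toy_cow_block(struct toy_eb *fresh, struct toy_eb **out,
                         int hook_fails, int cleanup_on_error)
{
    int ret;

    pthread_mutex_lock(&fresh->lock);
    ret = toy_reloc_hook(hook_fails);
    if (ret) {
        if (cleanup_on_error)
            pthread_mutex_unlock(&fresh->lock);
        return ret;
    }
    *out = fresh;
    return 0;
}

/* Returns 1 if a later blocking lock attempt would hang, 0 otherwise. */
static int later_lock_would_block(struct toy_eb *eb)
{
    int rc = pthread_mutex_trylock(&eb->lock);

    if (rc == 0) {
        pthread_mutex_unlock(&eb->lock);
        return 0;
    }
    return rc == EBUSY;
}

/* Runs one scenario and reports whether the block ends up stuck locked. */
int run_scenario(int hook_fails, int cleanup_on_error)
{
    struct toy_eb fresh;
    struct toy_eb *out = NULL;
    int stuck;

    pthread_mutex_init(&fresh.lock, NULL);
    if (toy_cow_block(&fresh, &out, hook_fails, cleanup_on_error) == 0) {
        assert(out == &fresh);
        pthread_mutex_unlock(&out->lock); /* normal caller-side unlock */
    }
    stuck = later_lock_would_block(&fresh);
    if (stuck)
        pthread_mutex_unlock(&fresh.lock); /* release before destroy */
    pthread_mutex_destroy(&fresh.lock);
    return stuck;
}
```

In this toy model, `run_scenario(1, 0)` (hook fails, no cleanup) returns 1, meaning a later locker would block, while `run_scenario(0, 0)` and `run_scenario(1, 1)` return 0. The second to-do item corresponds to the `cleanup_on_error` branch; whether the real fix takes that shape is not stated in the thread.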
* Re: [GIT PULL] Fix for btrfs/070 checksum error 2015-07-29 8:21 ` Zhao Lei @ 2015-07-29 14:52 ` Chris Mason 0 siblings, 0 replies; 10+ messages in thread From: Chris Mason @ 2015-07-29 14:52 UTC (permalink / raw) To: Zhao Lei; +Cc: 'btrfs', 'Qu Wenruo'

On Wed, Jul 29, 2015 at 04:21:33PM +0800, Zhao Lei wrote:
> Hi, Chris
>
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org
> > [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> > Sent: Tuesday, July 28, 2015 3:11 PM
> > To: Chris Mason; btrfs
> > Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
> >
> > Chris Mason wrote on 2015/07/23 21:57 -0400:
> > > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> > >
> > > [ deadlock with the 070 patches ]
> > >
> > >> Thanks Chris
> > >>
> > >> We will investigate it with highest priority.
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >
> > > Thanks! I'm doing a few more runs to make sure the lockup is new with
> > > these patches.
> > >
> > > -chris
> > >
> > Hi Chris,
> >
> > I'm very sorry that we are unable to fix the lockup in a short time, so it may not
> > fit in the v4.2 merge window.
> >
> > Please ignore this patchset for now.
> >
>
> Sorry that the investigation took quite a long time; the lockup
> happens randomly.
>
> We found the reason the processes block:
> 1: In some cases, this patch causes __btrfs_cow_block()->btrfs_reloc_cow_block()
> to fail when called from a btrfs_balance operation. (This needs more investigation.)
>
> 2: The error handling code in __btrfs_cow_block() does not unlock/free the
> newly allocated tree block before returning the error.
>
> 3: do_relocation(), the caller of __btrfs_cow_block(), has error handling
> code, but it cannot work in this case either, because the newly allocated eb is not
> returned.
>
> 4: Subsequent code in do_relocation() tries to lock the same eb again,
> which causes the deadlock.

Excellent, thanks for tracking this down. I agree investigating #1 is the top priority, since it's possible the patches are just making it happen more often.

-chris

^ permalink raw reply [flat|nested] 10+ messages in thread