linux-btrfs.vger.kernel.org archive mirror
* [GIT PULL] Fix for btrfs/070 checksum error
From: Qu Wenruo @ 2015-07-08  3:35 UTC (permalink / raw)
  To: Chris Mason, btrfs

Hi Chris,

Sorry for the late pull request; this one should have been sent on Monday. :(

This patchset fixes an annoying bug triggered by btrfs/070 and
some other concurrent scrub + IO load tests.

The bug is triggered only by chance, and it took Zhao Lei quite a long
time to trace and debug.

Although the previous patchset sent to the mailing list had a problem, it
turned out to be a rebasing typo that was easy to fix.

We spent a whole weekend running tests this time to ensure the
patch is OK.

The fix itself is small and only addresses a long-standing problem.
IMHO it is OK for the late-fix merge window.

So please merge the following branch:
https://github.com/adam900710/linux.git  for_chris_4.2_070_fix

Thanks,
Qu


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Qu Wenruo @ 2015-07-22  9:28 UTC (permalink / raw)
  To: Chris Mason, btrfs

Hi Chris,

Is there anything wrong with it?

It has been 2 weeks, and it's still not in your for-linus branch.

Thanks,
Qu



* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Chris Mason @ 2015-07-22 11:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Wed, Jul 22, 2015 at 05:28:48PM +0800, Qu Wenruo wrote:
> Hi Chris,
> 
> Is there anything wrong with it?
> 
> It has been 2 weeks, and it's still not in your for linus branch.
> 
> Is there anything wrong?

Nothing wrong at all, I've got it queued here.  Thanks for the resend.

-chris


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Chris Mason @ 2015-07-23 20:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Wed, Jul 22, 2015 at 05:28:48PM +0800, Qu Wenruo wrote:
> Hi Chris,
> 
> Is there anything wrong with it?
> 
> It has been 2 weeks, and it's still not in your for linus branch.
> 
> Is there anything wrong?

I ran this through xfstests again, and got tasks deadlocked during
btrfs/061.  Looks like scrub is leaking an extent buffer lock?

I'll see what I can find, but posting here in case you've seen it
already.

155540 (kworker/u65:6) D
[<ffffffffa06898d0>] btrfs_tree_lock+0xf0/0x260 [btrfs]
[<ffffffffa066a7dc>] lock_extent_buffer_for_io+0x1ac/0x1e0 [btrfs]
[<ffffffffa066adc2>] btree_write_cache_pages+0x242/0x430 [btrfs]
[<ffffffffa064010f>] btree_writepages+0x6f/0x80 [btrfs]
[<ffffffff81172b23>] do_writepages+0x23/0x40
[<ffffffff81208c2d>] __writeback_single_inode+0x7d/0x780
[<ffffffff81209931>] writeback_sb_inodes+0x2b1/0x570
[<ffffffff81209de6>] wb_writeback+0x136/0x760
[<ffffffff8120a520>] wb_do_writeback+0x110/0x440
[<ffffffff8120a8e5>] wb_workfn+0x95/0x440
[<ffffffff81078788>] process_one_work+0x1e8/0x730
[<ffffffff81078dea>] worker_thread+0x11a/0x4d0
[<ffffffff8107e7b9>] kthread+0xe9/0x110
[<ffffffff816b128f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff

240994 (btrfs-transacti) D
[<ffffffffa06470b9>] wait_for_commit+0x59/0x90 [btrfs]
[<ffffffffa0649b94>] btrfs_commit_transaction+0x284/0xd10 [btrfs]
[<ffffffffa0643c46>] transaction_kthread+0x246/0x2a0 [btrfs]
[<ffffffff8107e7b9>] kthread+0xe9/0x110
[<ffffffff816b128f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff

241000 (btrfs) D
[<ffffffffa06898d0>] btrfs_tree_lock+0xf0/0x260 [btrfs]
[<ffffffffa066a7dc>] lock_extent_buffer_for_io+0x1ac/0x1e0 [btrfs]
[<ffffffffa066adc2>] btree_write_cache_pages+0x242/0x430 [btrfs]
[<ffffffffa064010f>] btree_writepages+0x6f/0x80 [btrfs]
[<ffffffff81172b23>] do_writepages+0x23/0x40
[<ffffffff811638a5>] __filemap_fdatawrite_range+0xb5/0x100
[<ffffffff81163c73>] filemap_fdatawrite_range+0x13/0x20
[<ffffffffa0648b24>] btrfs_write_marked_extents+0xf4/0x140 [btrfs]
[<ffffffffa0648bbb>] btrfs_write_and_wait_transaction+0x4b/0x90 [btrfs]
[<ffffffffa064a1c5>] btrfs_commit_transaction+0x8b5/0xd10 [btrfs]
[<ffffffffa06a646e>] relocate_block_group+0x41e/0x5e0 [btrfs]
[<ffffffffa06a67b4>] btrfs_relocate_block_group+0x184/0x2a0 [btrfs]
[<ffffffffa067722a>] btrfs_relocate_chunk+0x7a/0x110 [btrfs]
[<ffffffffa067836c>] btrfs_balance+0x9bc/0x1060 [btrfs]
[<ffffffffa0680ed8>] btrfs_ioctl_balance+0x1c8/0x330 [btrfs]
[<ffffffffa06888f9>] btrfs_ioctl+0x409/0x1150 [btrfs]
[<ffffffff811e9dfa>] do_vfs_ioctl+0x8a/0x570
[<ffffffff811ea372>] SyS_ioctl+0x92/0xa0
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241024 (btrfs) D
[<ffffffffa06aa9de>] __scrub_blocked_if_needed+0x7e/0xc0 [btrfs]
[<ffffffffa06aab22>] scrub_pause_off+0x32/0x70 [btrfs]
[<ffffffffa06b1b73>] scrub_enumerate_chunks+0x4d3/0x5d0 [btrfs]
[<ffffffffa06b1e36>] btrfs_scrub_dev+0x1c6/0x5a0 [btrfs]
[<ffffffffa0686381>] btrfs_ioctl_scrub+0xb1/0x120 [btrfs]
[<ffffffffa0688ef3>] btrfs_ioctl+0xa03/0x1150 [btrfs]
[<ffffffff811e9dfa>] do_vfs_ioctl+0x8a/0x570
[<ffffffff811ea372>] SyS_ioctl+0x92/0xa0
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241002 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241003 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241004 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241005 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241006 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241007 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241008 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241009 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241010 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241012 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241013 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241015 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241016 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241019 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241020 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

241021 (fsstress) D
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

16 hits: 
[<ffffffff81202ecf>] wb_wait_for_completion+0x5f/0x90
[<ffffffff81208199>] sync_inodes_sb+0x99/0x1c0
[<ffffffff8120ebc6>] sync_inodes_one_sb+0x16/0x20
[<ffffffff811d947f>] iterate_supers+0xaf/0xe0
[<ffffffff8120ec05>] sys_sync+0x35/0x90
[<ffffffff816b0e97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff

-----


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Qu Wenruo @ 2015-07-24  0:29 UTC (permalink / raw)
  To: Chris Mason, btrfs

Thanks Chris

We will investigate it with highest priority.

Thanks,
Qu


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Chris Mason @ 2015-07-24  1:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:

[ deadlock with the 070 patches ]

> Thanks Chris
> 
> We will investigate it with highest priority.
> 
> Thanks,
> Qu
> 

Thanks!  I'm doing a few more runs to make sure the lockup is new with
these patches.

-chris


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Qu Wenruo @ 2015-07-24  2:50 UTC (permalink / raw)
  To: Chris Mason, btrfs



Chris Mason wrote on 2015/07/23 21:57 -0400:
> On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
>
> [ deadlock with the 070 patches ]
>
>> Thanks Chris
>>
>> We will investigate it with highest priority.
>>
>> Thanks,
>> Qu
>>
>
> Thanks!  I'm doing a few more runs to make sure the lockup is new with
> these patches.
>
> -chris
>
BTW, is this patchset rebased onto your for-linus-4.2 branch?

Also, how reproducible is the lockup in btrfs/061?

Thanks,
Qu


* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Qu Wenruo @ 2015-07-28  7:10 UTC (permalink / raw)
  To: Chris Mason, btrfs

Chris Mason wrote on 2015/07/23 21:57 -0400:
> On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
>
> [ deadlock with the 070 patches ]
>
>> Thanks Chris
>>
>> We will investigate it with highest priority.
>>
>> Thanks,
>> Qu
>>
>
> Thanks!  I'm doing a few more runs to make sure the lockup is new with
> these patches.
>
> -chris
>
Hi Chris,

I'm very sorry that we are unable to fix the lockup in a short time,
so it may not fit in the v4.2 merge window.

Please ignore this patchset for now.

Thanks,
Qu


* RE: [GIT PULL] Fix for btrfs/070 checksum error
From: Zhao Lei @ 2015-07-29  8:21 UTC (permalink / raw)
  To: 'Chris Mason', 'btrfs'; +Cc: 'Qu Wenruo'

Hi, Chris

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org
> [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 28, 2015 3:11 PM
> To: Chris Mason; btrfs
> Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
> 
> Chris Mason wrote on 2015/07/23 21:57 -0400:
> > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> >
> > [ deadlock with the 070 patches ]
> >
> >> Thanks Chris
> >>
> >> We will investigate it with highest priority.
> >>
> >> Thanks,
> >> Qu
> >>
> >
> > Thanks!  I'm doing a few more runs to make sure the lockup is new with
> > these patches.
> >
> > -chris
> >
> Hi Chris,
> 
> I'm very sorry that we are unable to fix the lockup in a short time, so it may not
> fit in the v4.2 merge window.
> 
> Please ignore this patchset for now.
> 

Sorry for taking quite a long time to investigate; the problem
happens randomly.

We found the reason for the blocked processes:
1: In some cases, this patch causes __btrfs_cow_block()->btrfs_reloc_cow_block()
  to fail during a btrfs_balance operation. (needs more investigation)

2: __btrfs_cow_block()'s error handling code does not unlock/free the
  newly allocated tree block before returning the error.

3: do_relocation(), which is the caller of __btrfs_cow_block(), has error
  handling code, but it can't work in this case either, because the newly
  allocated eb is not returned.

4: Subsequent code in do_relocation() tries to lock the above eb again,
  which causes the deadlock.

In short:
do_relocation()
-> __btrfs_cow_block() fails without unlocking the eb *1
...
-> btrfs_search_slot() tries to lock the same eb again
...
*1: this failure is caused by scrub

Because eb locking does not use a normal lock, we can't get any
information from lockdep in this case.

Things to do:
1: Fix this patch so that it no longer makes __btrfs_cow_block() fail.
2: Fix __btrfs_cow_block() to do enough cleanup in its error handling code.
3: Enhance eb locking to report some information that helps with
  similar errors.
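
The failure pattern in points 2 to 4 can be illustrated with a small
user-space sketch in C. This is a toy model under stated assumptions, not
btrfs code: struct toy_eb, toy_trylock(), toy_cow_block() and toy_relocate()
are hypothetical names, a plain flag stands in for the real extent buffer
lock, and the second lock attempt returns -EBUSY where the kernel would
block forever. The holder field hints at the kind of owner information that
better eb lock reporting could provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer (eb); not the kernel struct. */
struct toy_eb {
	bool locked;
	const char *holder;	/* who took the lock, kept for diagnostics */
};

/* Non-blocking lock: returns 0 on success, -EBUSY if already held. */
static int toy_trylock(struct toy_eb *eb, const char *who)
{
	if (eb->locked)
		return -EBUSY;
	eb->locked = true;
	eb->holder = who;
	return 0;
}

static void toy_unlock(struct toy_eb *eb)
{
	assert(eb->locked);
	eb->locked = false;
	eb->holder = NULL;
}

/*
 * Models the COW step: lock the new block, then hit a simulated failure.
 * On success the locked block is handed back through *out.  On failure
 * *out stays NULL, so only this function can still release the lock.
 */
static int toy_cow_block(struct toy_eb *new_eb, struct toy_eb **out,
			 bool fail, bool cleanup_on_error)
{
	int ret = toy_trylock(new_eb, "cow");

	if (ret)
		return ret;
	if (fail) {
		if (cleanup_on_error)
			toy_unlock(new_eb);	/* release before returning the error */
		return -EIO;
	}
	*out = new_eb;
	return 0;
}

/*
 * Models the caller: the COW step fails, the caller's own error handling
 * has nothing to unlock, and a later step locks the same block again.
 * Returns the result of that later lock attempt.
 */
static int toy_relocate(struct toy_eb *eb, bool cleanup_on_error)
{
	struct toy_eb *cow = NULL;
	int ret = toy_cow_block(eb, &cow, true, cleanup_on_error);

	if (ret && cow)		/* never true: the block was not returned */
		toy_unlock(cow);

	ret = toy_trylock(eb, "search");
	if (ret == 0)
		toy_unlock(eb);
	return ret;
}
```

Calling toy_relocate(&eb, false) on an unlocked eb returns -EBUSY and leaves
eb.holder set to "cow", which is the leaked-lock case. With cleanup_on_error
set to true the same call returns 0.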

Thanks
Zhaolei

> Thanks,
> Qu



* Re: [GIT PULL] Fix for btrfs/070 checksum error
From: Chris Mason @ 2015-07-29 14:52 UTC (permalink / raw)
  To: Zhao Lei; +Cc: 'btrfs', 'Qu Wenruo'

On Wed, Jul 29, 2015 at 04:21:33PM +0800, Zhao Lei wrote:
> Hi, Chris
> 
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org
> > [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> > Sent: Tuesday, July 28, 2015 3:11 PM
> > To: Chris Mason; btrfs
> > Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
> > 
> > Chris Mason wrote on 2015/07/23 21:57 -0400:
> > > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> > >
> > > [ deadlock with the 070 patches ]
> > >
> > >> Thanks Chris
> > >>
> > >> We will investigate it with highest priority.
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >
> > > Thanks!  I'm doing a few more runs to make sure the lockup is new with
> > > these patches.
> > >
> > > -chris
> > >
> > Hi Chris,
> > 
> > I'm very sorry that we are unable to fix the lockup in a short time, so it may not
> > fit in the v4.2 merge window.
> > 
> > Please ignore this patchset for now.
> > 
> 
> Sorry for taking quite a long time to investigate; the problem
> happens randomly.
> 
> We found the reason for the blocked processes:
> 1: In some cases, this patch causes __btrfs_cow_block()->btrfs_reloc_cow_block()
>   to fail during a btrfs_balance operation. (needs more investigation)
> 
> 2: __btrfs_cow_block()'s error handling code does not unlock/free the
>   newly allocated tree block before returning the error.
> 
> 3: do_relocation(), which is the caller of __btrfs_cow_block(), has error
>   handling code, but it can't work in this case either, because the newly
>   allocated eb is not returned.
> 
> 4: Subsequent code in do_relocation() tries to lock the above eb again,
>   which causes the deadlock.

Excellent, thanks for tracking this down.  I agree investigating #1 is
the top priority, since it's possible the patches are just making it
happen more often.

-chris

