* Re: [PATCH] Btrfs: fix tree mod logging
2013-12-13 19:41 [PATCH] Btrfs: fix tree mod logging Filipe David Borba Manana
@ 2013-12-13 20:06 ` Josef Bacik
2013-12-13 20:10 ` Filipe David Manana
2013-12-13 22:57 ` [PATCH v2] " Filipe David Borba Manana
2013-12-20 15:17 ` [PATCH v3] " Filipe David Borba Manana
2 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2013-12-13 20:06 UTC (permalink / raw)
To: Filipe David Borba Manana, linux-btrfs
On 12/13/2013 02:41 PM, Filipe David Borba Manana wrote:
> While running the test btrfs/004 from xfstests in a loop, it failed
> about 1 time out of 20 runs in my desktop. The failure happend in
> the backref walking part of the test, and the test's error message was
> like this:
>
> btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
> --- tests/btrfs/004.out 2013-11-26 18:25:29.263333714 +0000
> +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad 2013-12-10 15:25:10.327518516 +0000
> @@ -1,3 +1,8 @@
> QA output created by 004
> *** test backref walking
> -*** done
> +unexpected output from
> + /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
> +expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
> +
> ...
> (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
> Ran: btrfs/004
> Failures: btrfs/004
> Failed 1 of 1 tests
>
> But immediately after the test finished, the btrfs inspect-internal command
> returned the expected output:
>
> $ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
> inode 405 offset 454656 root 258
> inode 405 offset 454656 root 5
>
> It turned out this was because the btrfs_search_old_slot() calls performed
> during backref walking (backref.c:__resolve_indirect_ref) were not finding
> anything. The reason for this turned out to be that the tree mod logging
> code was not logging some node multi-step operations atomically, therefore
> btrfs_search_old_slot() callers iterated often over an incomplete tree that
> wasn't fully consistent with any tree state from the past. Besides missing
> items, this often (but not always) resulted in -EIO errors during old slot
> searches, reported in dmesg like this:
>
> [ 4299.933936] ------------[ cut here ]------------
> [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
> [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
> [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G W O 3.12.0-fdm-btrfs-next-16+ #70
> [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
> [ 4299.933979] 000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
> [ 4299.933982] 0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
> [ 4299.933984] ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
> [ 4299.933987] Call Trace:
> [ 4299.933991] [<ffffffff8176d284>] dump_stack+0x55/0x76
> [ 4299.933994] [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
> [ 4299.933997] [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
> [ 4299.934003] [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
> [ 4299.934005] [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
> [ 4299.934010] [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
> [ 4299.934019] [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
> [ 4299.934027] [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
> [ 4299.934034] [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
> [ 4299.934042] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
> [ 4299.934048] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
> [ 4299.934056] [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
> [ 4299.934058] [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
> [ 4299.934065] [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
> [ 4299.934071] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
> [ 4299.934078] [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
> [ 4299.934080] [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
> [ 4299.934083] [<ffffffff81075563>] ? up_read+0x23/0x40
> [ 4299.934085] [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
> [ 4299.934088] [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
> [ 4299.934090] [<ffffffff81776e23>] ? error_sti+0x5/0x6
> [ 4299.934093] [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
> [ 4299.934096] [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
> [ 4299.934098] [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
> [ 4299.934100] [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
> [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
> [ 4299.934378] btrfs bad fsid on block 0
>
> These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
> tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
> be performed atomically before the following commit:
>
> c8cc6341653721b54760480b0d0d9b5f09b46741
> (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
>
> That change removed the atomicity of such operations. This patch restores the
> atomicity while still not doing the GFP_ATOMIC allocations of tree_mod_elem
> structures, so it has to do the allocations using GFP_NOFS before acquiring
> the mod log lock.
>
> This issue has been experienced by several users recently, such as for example:
>
> http://www.spinics.net/lists/linux-btrfs/msg28574.html
>
> After running the btrfs/004 test for 679 consecutive iterations with this
> patch applied, I didn't ran into the issue anymore.
Thanks for tracking this down, just some return error problems below.
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
> fs/btrfs/ctree.c | 266 +++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 204 insertions(+), 62 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 59664f6..e99e5f6 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -474,6 +474,8 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
> * the index is the shifted logical of the *new* root node for root replace
> * operations, or the shifted logical of the affected block for all other
> * operations.
> + *
> + * Note: must be called with write lock (tree_mod_log_write_lock).
> */
> static noinline int
> __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
> @@ -486,20 +488,6 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>
> BUG_ON(!tm);
>
> - tree_mod_log_write_lock(fs_info);
> - if (list_empty(&fs_info->tree_mod_seq_list)) {
> - tree_mod_log_write_unlock(fs_info);
> - /*
> - * Ok we no longer care about logging modifications, free up tm
> - * and return 0. Any callers shouldn't be using tm after
> - * calling tree_mod_log_insert, but if they do we can just
> - * change this to return a special error code to let the callers
> - * do their own thing.
> - */
> - kfree(tm);
> - return 0;
> - }
> -
> spin_lock(&fs_info->tree_mod_seq_lock);
> tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
> spin_unlock(&fs_info->tree_mod_seq_lock);
> @@ -527,7 +515,6 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
> rb_link_node(&tm->node, parent, new);
> rb_insert_color(&tm->node, tm_root);
> out:
> - tree_mod_log_write_unlock(fs_info);
> return ret;
> }
>
> @@ -544,19 +531,38 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
> return 1;
> if (eb && btrfs_header_level(eb) == 0)
> return 1;
> +
> + tree_mod_log_write_lock(fs_info);
> + if (list_empty(&(fs_info)->tree_mod_seq_list)) {
> + tree_mod_log_write_unlock(fs_info);
> + return 1;
> + }
> +
> return 0;
> }
>
> -static inline int
> -__tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
> - struct extent_buffer *eb, int slot,
> - enum mod_log_op op, gfp_t flags)
> +/* Similar to tree_mod_dont_log, but doesn't acquire any locks. */
> +static inline int tree_mod_need_log(const struct btrfs_fs_info *fs_info,
> + struct extent_buffer *eb)
> +{
> + smp_mb();
> + if (list_empty(&(fs_info)->tree_mod_seq_list))
> + return 0;
> + if (eb && btrfs_header_level(eb) == 0)
> + return 0;
> +
> + return 1;
> +}
> +
> +static struct tree_mod_elem *
> +alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
> + enum mod_log_op op, gfp_t flags)
> {
> struct tree_mod_elem *tm;
>
> tm = kzalloc(sizeof(*tm), flags);
> if (!tm)
> - return -ENOMEM;
> + return NULL;
>
> tm->index = eb->start >> PAGE_CACHE_SHIFT;
> if (op != MOD_LOG_KEY_ADD) {
> @@ -567,7 +573,7 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
> tm->slot = slot;
> tm->generation = btrfs_node_ptr_generation(eb, slot);
>
> - return __tree_mod_log_insert(fs_info, tm);
> + return tm;
> }
>
> static noinline int
> @@ -575,10 +581,25 @@ tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
> struct extent_buffer *eb, int slot,
> enum mod_log_op op, gfp_t flags)
> {
> - if (tree_mod_dont_log(fs_info, eb))
> + struct tree_mod_elem *tm;
> + int ret;
> +
> + if (!tree_mod_need_log(fs_info, eb))
> return 0;
>
> - return __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
> + tm = alloc_tree_mod_elem(eb, slot, op, flags);
> + if (!tm)
> + return -ENOMEM;
> +
> + if (tree_mod_dont_log(fs_info, eb)) {
> + kfree(tm);
> + return 0;
> + }
> +
> + ret = __tree_mod_log_insert(fs_info, tm);
> + tree_mod_log_write_unlock(fs_info);
> +
> + return ret;
> }
>
> static noinline int
> @@ -586,51 +607,76 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
> struct extent_buffer *eb, int dst_slot, int src_slot,
> int nr_items, gfp_t flags)
> {
> - struct tree_mod_elem *tm;
> - int ret;
> + struct tree_mod_elem *tm = NULL;
> + struct tree_mod_elem **tm_list = NULL;
> + int ret = 0;
> int i;
>
> - if (tree_mod_dont_log(fs_info, eb))
> + if (!tree_mod_need_log(fs_info, eb))
> return 0;
>
> + tm = kzalloc(sizeof(*tm), flags);
> + if (!tm)
> + return -ENOMEM;
> +
> + tm->index = eb->start >> PAGE_CACHE_SHIFT;
> + tm->slot = src_slot;
> + tm->move.dst_slot = dst_slot;
> + tm->move.nr_items = nr_items;
> + tm->op = MOD_LOG_MOVE_KEYS;
> +
> + tm_list = kzalloc(nr_items * sizeof(struct tree_mod_elem *), flags);
> + if (!tm_list) {
> + ret = -ENOMEM;
> + goto free_tms;
> + }
> +
> + for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
> + tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
> + MOD_LOG_KEY_REMOVE_WHILE_MOVING, flags);
> + if (!tm_list[i]) {
> + ret = -ENOMEM;
> + goto free_tms;
> + }
> + }
> +
> + if (tree_mod_dont_log(fs_info, eb))
> + goto free_tms;
> +
> /*
> * When we override something during the move, we log these removals.
> * This can only happen when we move towards the beginning of the
> * buffer, i.e. dst_slot < src_slot.
> */
> for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
> - ret = __tree_mod_log_insert_key(fs_info, eb, i + dst_slot,
> - MOD_LOG_KEY_REMOVE_WHILE_MOVING, GFP_NOFS);
> + ret = __tree_mod_log_insert(fs_info, tm_list[i]);
> BUG_ON(ret < 0);
> }
>
> - tm = kzalloc(sizeof(*tm), flags);
> - if (!tm)
> - return -ENOMEM;
> + ret = __tree_mod_log_insert(fs_info, tm);
> + tree_mod_log_write_unlock(fs_info);
> + kfree(tm_list);
>
> - tm->index = eb->start >> PAGE_CACHE_SHIFT;
> - tm->slot = src_slot;
> - tm->move.dst_slot = dst_slot;
> - tm->move.nr_items = nr_items;
> - tm->op = MOD_LOG_MOVE_KEYS;
> + return ret;
> +free_tms:
> + for (i = 0; i < nr_items; i++)
> + kfree(tm_list[i]);
> + kfree(tm_list);
> + kfree(tm);
>
> - return __tree_mod_log_insert(fs_info, tm);
> + return ret;
> }
>
> static inline void
> -__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
> +__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
> + struct tree_mod_elem **tm_list,
> + int nritems)
> {
> int i;
> - u32 nritems;
> int ret;
>
> - if (btrfs_header_level(eb) == 0)
> - return;
> -
> - nritems = btrfs_header_nritems(eb);
> for (i = nritems - 1; i >= 0; i--) {
> - ret = __tree_mod_log_insert_key(fs_info, eb, i,
> - MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
> + ret = __tree_mod_log_insert(fs_info, tm_list[i]);
> BUG_ON(ret < 0);
> }
> }
> @@ -641,17 +687,38 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
> struct extent_buffer *new_root, gfp_t flags,
> int log_removal)
> {
> - struct tree_mod_elem *tm;
> + struct tree_mod_elem *tm = NULL;
> + struct tree_mod_elem **tm_list = NULL;
> + int nritems = 0;
> + int ret = 0;
> + int i;
>
> - if (tree_mod_dont_log(fs_info, NULL))
> + if (!tree_mod_need_log(fs_info, NULL))
> return 0;
>
> - if (log_removal)
> - __tree_mod_log_free_eb(fs_info, old_root);
> + if (log_removal && btrfs_header_level(old_root) > 0) {
> + nritems = btrfs_header_nritems(old_root);
> + tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
> + flags);
> + if (!tm_list) {
> + ret = -ENOMEM;
> + goto free_tms;
> + }
> + for (i = 0; i < nritems; i++) {
> + tm_list[i] = alloc_tree_mod_elem(old_root, i,
> + MOD_LOG_KEY_REMOVE_WHILE_FREEING, flags);
> + if (!tm_list[i]) {
> + ret = -ENOMEM;
> + goto free_tms;
> + }
> + }
> + }
>
> tm = kzalloc(sizeof(*tm), flags);
> - if (!tm)
> - return -ENOMEM;
> + if (!tm) {
> + ret = -ENOMEM;
> + goto free_tms;
> + }
>
> tm->index = new_root->start >> PAGE_CACHE_SHIFT;
> tm->old_root.logical = old_root->start;
> @@ -659,7 +726,27 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
> tm->generation = btrfs_header_generation(old_root);
> tm->op = MOD_LOG_ROOT_REPLACE;
>
> - return __tree_mod_log_insert(fs_info, tm);
> + if (tree_mod_dont_log(fs_info, NULL))
> + goto free_tms;
> +
> + if (tm_list)
> + __tree_mod_log_free_eb(fs_info, tm_list, nritems);
> +
> + ret = __tree_mod_log_insert(fs_info, tm);
> + tree_mod_log_write_unlock(fs_info);
> + kfree(tm_list);
> +
> + return ret;
> +
> +free_tms:
> + if (tm_list) {
> + for (i = 0; i < nritems; i++)
> + kfree(tm_list[i]);
> + kfree(tm_list);
> + }
> + kfree(tm);
> +
> + return ret;
> }
>
> static struct tree_mod_elem *
> @@ -733,26 +820,51 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
> struct extent_buffer *src, unsigned long dst_offset,
> unsigned long src_offset, int nr_items)
> {
> + struct tree_mod_elem **tm_list = NULL;
> + struct tree_mod_elem **tm_list_add, **tm_list_rem;
> int ret;
> int i;
>
> - if (tree_mod_dont_log(fs_info, NULL))
> + if (!tree_mod_need_log(fs_info, NULL))
> return;
>
> if (btrfs_header_level(dst) == 0 && btrfs_header_level(src) == 0)
> return;
>
> + tm_list = kzalloc(nr_items * 2 * sizeof(struct tree_mod_elem *),
> + GFP_NOFS);
> + BUG_ON(!tm_list);
This isn't ok, we need to return -ENOMEM here.
> +
> + tm_list_add = tm_list;
> + tm_list_rem = tm_list + nr_items;
> for (i = 0; i < nr_items; i++) {
> - ret = __tree_mod_log_insert_key(fs_info, src,
> - i + src_offset,
> - MOD_LOG_KEY_REMOVE, GFP_NOFS);
> + tm_list_rem[i] = alloc_tree_mod_elem(src, i + src_offset,
> + MOD_LOG_KEY_REMOVE, GFP_NOFS);
> + BUG_ON(!tm_list_rem[i]);
And here.
> +
> + tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
> + MOD_LOG_KEY_ADD, GFP_NOFS);
> + BUG_ON(!tm_list_add[i]);
And here.
> + }
> +
> + if (tree_mod_dont_log(fs_info, NULL))
> + goto free_tms;
> +
> + for (i = 0; i < nr_items; i++) {
> + ret = __tree_mod_log_insert(fs_info, tm_list_rem[i]);
> BUG_ON(ret < 0);
> - ret = __tree_mod_log_insert_key(fs_info, dst,
> - i + dst_offset,
> - MOD_LOG_KEY_ADD,
> - GFP_NOFS);
> + ret = __tree_mod_log_insert(fs_info, tm_list_add[i]);
> BUG_ON(ret < 0);
> }
> +
> + tree_mod_log_write_unlock(fs_info);
> + kfree(tm_list);
> + return;
> +
> +free_tms:
> + for (i = 0; i < nr_items * 2; i++)
> + kfree(tm_list[i]);
> + kfree(tm_list);
> }
>
> static inline void
> @@ -780,9 +892,39 @@ tree_mod_log_set_node_key(struct btrfs_fs_info *fs_info,
> static noinline void
> tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
> {
> - if (tree_mod_dont_log(fs_info, eb))
> + struct tree_mod_elem **tm_list = NULL;
> + int nritems = 0;
> + int i;
> +
> + if (btrfs_header_level(eb) == 0)
> return;
> - __tree_mod_log_free_eb(fs_info, eb);
> +
> + if (!tree_mod_need_log(fs_info, NULL))
> + return;
> +
> + nritems = btrfs_header_nritems(eb);
> + tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
> + GFP_NOFS);
> + BUG_ON(!tm_list);
And here.
> +
> + for (i = 0; i < nritems; i++) {
> + tm_list[i] = alloc_tree_mod_elem(eb, i,
> + MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
> + BUG_ON(!tm_list[i]);
And here. Thanks,
Josef
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Btrfs: fix tree mod logging
2013-12-13 20:06 ` Josef Bacik
@ 2013-12-13 20:10 ` Filipe David Manana
2013-12-13 20:49 ` Josef Bacik
0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-12-13 20:10 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
On Fri, Dec 13, 2013 at 8:06 PM, Josef Bacik <jbacik@fb.com> wrote:
>
> On 12/13/2013 02:41 PM, Filipe David Borba Manana wrote:
>>
>> While running the test btrfs/004 from xfstests in a loop, it failed
>> about 1 time out of 20 runs in my desktop. The failure happend in
>> the backref walking part of the test, and the test's error message was
>> like this:
>>
>> btrfs/004 93s ... [failed, exit status 1] - output mismatch (see
>> /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
>> --- tests/btrfs/004.out 2013-11-26 18:25:29.263333714 +0000
>> +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad
>> 2013-12-10 15:25:10.327518516 +0000
>> @@ -1,3 +1,8 @@
>> QA output created by 004
>> *** test backref walking
>> -*** done
>> +unexpected output from
>> + /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
>> logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
>> +expected inum: 405, expected address: 454656, file:
>> /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
>> +
>> ...
>> (Run 'diff -u tests/btrfs/004.out
>> /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the
>> entire diff)
>> Ran: btrfs/004
>> Failures: btrfs/004
>> Failed 1 of 1 tests
>>
>> But immediately after the test finished, the btrfs inspect-internal
>> command
>> returned the expected output:
>>
>> $ btrfs inspect-internal logical-resolve -P 141512704
>> /home/fdmanana/btrfs-tests/scratch_1
>> inode 405 offset 454656 root 258
>> inode 405 offset 454656 root 5
>>
>> It turned out this was because the btrfs_search_old_slot() calls performed
>> during backref walking (backref.c:__resolve_indirect_ref) were not finding
>> anything. The reason for this turned out to be that the tree mod logging
>> code was not logging some node multi-step operations atomically, therefore
>> btrfs_search_old_slot() callers iterated often over an incomplete tree
>> that
>> wasn't fully consistent with any tree state from the past. Besides missing
>> items, this often (but not always) resulted in -EIO errors during old slot
>> searches, reported in dmesg like this:
>>
>> [ 4299.933936] ------------[ cut here ]------------
>> [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343
>> btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
>> [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O)
>> vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc
>> ppdev binfmt_misc joydev snd_hda_codec_h
>> [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G W O
>> 3.12.0-fdm-btrfs-next-16+ #70
>> [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By
>> O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
>> [ 4299.933979] 000000000000053f ffff8806f3fd98f8 ffffffff8176d284
>> 0000000000000007
>> [ 4299.933982] 0000000000000000 ffff8806f3fd9938 ffffffff8104a81c
>> ffff880659c64b70
>> [ 4299.933984] ffff880659c643d0 ffff8806599233d8 ffff880701e2e938
>> 0000160000000000
>> [ 4299.933987] Call Trace:
>> [ 4299.933991] [<ffffffff8176d284>] dump_stack+0x55/0x76
>> [ 4299.933994] [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
>> [ 4299.933997] [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
>> [ 4299.934003] [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0
>> [btrfs]
>> [ 4299.934005] [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
>> [ 4299.934010] [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0
>> [btrfs]
>> [ 4299.934019] [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0
>> [btrfs]
>> [ 4299.934027] [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0
>> [btrfs]
>> [ 4299.934034] [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
>> [ 4299.934042] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>> [btrfs]
>> [ 4299.934048] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>> [btrfs]
>> [ 4299.934056] [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250
>> [btrfs]
>> [ 4299.934058] [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
>> [ 4299.934065] [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0
>> [btrfs]
>> [ 4299.934071] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>> [btrfs]
>> [ 4299.934078] [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
>> [ 4299.934080] [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
>> [ 4299.934083] [<ffffffff81075563>] ? up_read+0x23/0x40
>> [ 4299.934085] [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
>> [ 4299.934088] [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
>> [ 4299.934090] [<ffffffff81776e23>] ? error_sti+0x5/0x6
>> [ 4299.934093] [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
>> [ 4299.934096] [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
>> [ 4299.934098] [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
>> [ 4299.934100] [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
>> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
>> [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
>> [ 4299.934378] btrfs bad fsid on block 0
>>
>> These tree mod log operations that must be performed atomically,
>> tree_mod_log_free_eb,
>> tree_mod_log_eb_copy, tree_mod_log_insert_root and
>> tree_mod_log_insert_move, used to
>> be performed atomically before the following commit:
>>
>> c8cc6341653721b54760480b0d0d9b5f09b46741
>> (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
>>
>> That change removed the atomicity of such operations. This patch restores
>> the
>> atomicity while still not doing the GFP_ATOMIC allocations of
>> tree_mod_elem
>> structures, so it has to do the allocations using GFP_NOFS before
>> acquiring
>> the mod log lock.
>>
>> This issue has been experienced by several users recently, such as for
>> example:
>>
>> http://www.spinics.net/lists/linux-btrfs/msg28574.html
>>
>> After running the btrfs/004 test for 679 consecutive iterations with this
>> patch applied, I didn't ran into the issue anymore.
>
>
> Thanks for tracking this down, just some return error problems below.
Right, I left the BUG_ON's because they were already being used for
all existing tree mod failures.
If you don't mind, I'll do that change as a separate patch.
thanks Josef
>
>
>>
>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>> ---
>> fs/btrfs/ctree.c | 266
>> +++++++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 204 insertions(+), 62 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 59664f6..e99e5f6 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -474,6 +474,8 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info
>> *fs_info,
>> * the index is the shifted logical of the *new* root node for root
>> replace
>> * operations, or the shifted logical of the affected block for all
>> other
>> * operations.
>> + *
>> + * Note: must be called with write lock (tree_mod_log_write_lock).
>> */
>> static noinline int
>> __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct
>> tree_mod_elem *tm)
>> @@ -486,20 +488,6 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info,
>> struct tree_mod_elem *tm)
>> BUG_ON(!tm);
>> - tree_mod_log_write_lock(fs_info);
>> - if (list_empty(&fs_info->tree_mod_seq_list)) {
>> - tree_mod_log_write_unlock(fs_info);
>> - /*
>> - * Ok we no longer care about logging modifications, free
>> up tm
>> - * and return 0. Any callers shouldn't be using tm after
>> - * calling tree_mod_log_insert, but if they do we can just
>> - * change this to return a special error code to let the
>> callers
>> - * do their own thing.
>> - */
>> - kfree(tm);
>> - return 0;
>> - }
>> -
>> spin_lock(&fs_info->tree_mod_seq_lock);
>> tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
>> spin_unlock(&fs_info->tree_mod_seq_lock);
>> @@ -527,7 +515,6 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info,
>> struct tree_mod_elem *tm)
>> rb_link_node(&tm->node, parent, new);
>> rb_insert_color(&tm->node, tm_root);
>> out:
>> - tree_mod_log_write_unlock(fs_info);
>> return ret;
>> }
>> @@ -544,19 +531,38 @@ static inline int tree_mod_dont_log(struct
>> btrfs_fs_info *fs_info,
>> return 1;
>> if (eb && btrfs_header_level(eb) == 0)
>> return 1;
>> +
>> + tree_mod_log_write_lock(fs_info);
>> + if (list_empty(&(fs_info)->tree_mod_seq_list)) {
>> + tree_mod_log_write_unlock(fs_info);
>> + return 1;
>> + }
>> +
>> return 0;
>> }
>> -static inline int
>> -__tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
>> - struct extent_buffer *eb, int slot,
>> - enum mod_log_op op, gfp_t flags)
>> +/* Similar to tree_mod_dont_log, but doesn't acquire any locks. */
>> +static inline int tree_mod_need_log(const struct btrfs_fs_info *fs_info,
>> + struct extent_buffer *eb)
>> +{
>> + smp_mb();
>> + if (list_empty(&(fs_info)->tree_mod_seq_list))
>> + return 0;
>> + if (eb && btrfs_header_level(eb) == 0)
>> + return 0;
>> +
>> + return 1;
>> +}
>> +
>> +static struct tree_mod_elem *
>> +alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
>> + enum mod_log_op op, gfp_t flags)
>> {
>> struct tree_mod_elem *tm;
>> tm = kzalloc(sizeof(*tm), flags);
>> if (!tm)
>> - return -ENOMEM;
>> + return NULL;
>> tm->index = eb->start >> PAGE_CACHE_SHIFT;
>> if (op != MOD_LOG_KEY_ADD) {
>> @@ -567,7 +573,7 @@ __tree_mod_log_insert_key(struct btrfs_fs_info
>> *fs_info,
>> tm->slot = slot;
>> tm->generation = btrfs_node_ptr_generation(eb, slot);
>> - return __tree_mod_log_insert(fs_info, tm);
>> + return tm;
>> }
>> static noinline int
>> @@ -575,10 +581,25 @@ tree_mod_log_insert_key(struct btrfs_fs_info
>> *fs_info,
>> struct extent_buffer *eb, int slot,
>> enum mod_log_op op, gfp_t flags)
>> {
>> - if (tree_mod_dont_log(fs_info, eb))
>> + struct tree_mod_elem *tm;
>> + int ret;
>> +
>> + if (!tree_mod_need_log(fs_info, eb))
>> return 0;
>> - return __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
>> + tm = alloc_tree_mod_elem(eb, slot, op, flags);
>> + if (!tm)
>> + return -ENOMEM;
>> +
>> + if (tree_mod_dont_log(fs_info, eb)) {
>> + kfree(tm);
>> + return 0;
>> + }
>> +
>> + ret = __tree_mod_log_insert(fs_info, tm);
>> + tree_mod_log_write_unlock(fs_info);
>> +
>> + return ret;
>> }
>> static noinline int
>> @@ -586,51 +607,76 @@ tree_mod_log_insert_move(struct btrfs_fs_info
>> *fs_info,
>> struct extent_buffer *eb, int dst_slot, int
>> src_slot,
>> int nr_items, gfp_t flags)
>> {
>> - struct tree_mod_elem *tm;
>> - int ret;
>> + struct tree_mod_elem *tm = NULL;
>> + struct tree_mod_elem **tm_list = NULL;
>> + int ret = 0;
>> int i;
>> - if (tree_mod_dont_log(fs_info, eb))
>> + if (!tree_mod_need_log(fs_info, eb))
>> return 0;
>> + tm = kzalloc(sizeof(*tm), flags);
>> + if (!tm)
>> + return -ENOMEM;
>> +
>> + tm->index = eb->start >> PAGE_CACHE_SHIFT;
>> + tm->slot = src_slot;
>> + tm->move.dst_slot = dst_slot;
>> + tm->move.nr_items = nr_items;
>> + tm->op = MOD_LOG_MOVE_KEYS;
>> +
>> + tm_list = kzalloc(nr_items * sizeof(struct tree_mod_elem *),
>> flags);
>> + if (!tm_list) {
>> + ret = -ENOMEM;
>> + goto free_tms;
>> + }
>> +
>> + for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
>> + tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
>> + MOD_LOG_KEY_REMOVE_WHILE_MOVING, flags);
>> + if (!tm_list[i]) {
>> + ret = -ENOMEM;
>> + goto free_tms;
>> + }
>> + }
>> +
>> + if (tree_mod_dont_log(fs_info, eb))
>> + goto free_tms;
>> +
>> /*
>> * When we override something during the move, we log these
>> removals.
>> * This can only happen when we move towards the beginning of the
>> * buffer, i.e. dst_slot < src_slot.
>> */
>> for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
>> - ret = __tree_mod_log_insert_key(fs_info, eb, i + dst_slot,
>> - MOD_LOG_KEY_REMOVE_WHILE_MOVING,
>> GFP_NOFS);
>> + ret = __tree_mod_log_insert(fs_info, tm_list[i]);
>> BUG_ON(ret < 0);
>> }
>> - tm = kzalloc(sizeof(*tm), flags);
>> - if (!tm)
>> - return -ENOMEM;
>> + ret = __tree_mod_log_insert(fs_info, tm);
>> + tree_mod_log_write_unlock(fs_info);
>> + kfree(tm_list);
>> - tm->index = eb->start >> PAGE_CACHE_SHIFT;
>> - tm->slot = src_slot;
>> - tm->move.dst_slot = dst_slot;
>> - tm->move.nr_items = nr_items;
>> - tm->op = MOD_LOG_MOVE_KEYS;
>> + return ret;
>> +free_tms:
>> + for (i = 0; i < nr_items; i++)
>> + kfree(tm_list[i]);
>> + kfree(tm_list);
>> + kfree(tm);
>> - return __tree_mod_log_insert(fs_info, tm);
>> + return ret;
>> }
>> static inline void
>> -__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct
>> extent_buffer *eb)
>> +__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
>> + struct tree_mod_elem **tm_list,
>> + int nritems)
>> {
>> int i;
>> - u32 nritems;
>> int ret;
>> - if (btrfs_header_level(eb) == 0)
>> - return;
>> -
>> - nritems = btrfs_header_nritems(eb);
>> for (i = nritems - 1; i >= 0; i--) {
>> - ret = __tree_mod_log_insert_key(fs_info, eb, i,
>> - MOD_LOG_KEY_REMOVE_WHILE_FREEING,
>> GFP_NOFS);
>> + ret = __tree_mod_log_insert(fs_info, tm_list[i]);
>> BUG_ON(ret < 0);
>> }
>> }
>> @@ -641,17 +687,38 @@ tree_mod_log_insert_root(struct btrfs_fs_info
>> *fs_info,
>> struct extent_buffer *new_root, gfp_t flags,
>> int log_removal)
>> {
>> - struct tree_mod_elem *tm;
>> + struct tree_mod_elem *tm = NULL;
>> + struct tree_mod_elem **tm_list = NULL;
>> + int nritems = 0;
>> + int ret = 0;
>> + int i;
>> - if (tree_mod_dont_log(fs_info, NULL))
>> + if (!tree_mod_need_log(fs_info, NULL))
>> return 0;
>> - if (log_removal)
>> - __tree_mod_log_free_eb(fs_info, old_root);
>> + if (log_removal && btrfs_header_level(old_root) > 0) {
>> + nritems = btrfs_header_nritems(old_root);
>> + tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem
>> *),
>> + flags);
>> + if (!tm_list) {
>> + ret = -ENOMEM;
>> + goto free_tms;
>> + }
>> + for (i = 0; i < nritems; i++) {
>> + tm_list[i] = alloc_tree_mod_elem(old_root, i,
>> + MOD_LOG_KEY_REMOVE_WHILE_FREEING, flags);
>> + if (!tm_list[i]) {
>> + ret = -ENOMEM;
>> + goto free_tms;
>> + }
>> + }
>> + }
>> tm = kzalloc(sizeof(*tm), flags);
>> - if (!tm)
>> - return -ENOMEM;
>> + if (!tm) {
>> + ret = -ENOMEM;
>> + goto free_tms;
>> + }
>> tm->index = new_root->start >> PAGE_CACHE_SHIFT;
>> tm->old_root.logical = old_root->start;
>> @@ -659,7 +726,27 @@ tree_mod_log_insert_root(struct btrfs_fs_info
>> *fs_info,
>> tm->generation = btrfs_header_generation(old_root);
>> tm->op = MOD_LOG_ROOT_REPLACE;
>> - return __tree_mod_log_insert(fs_info, tm);
>> + if (tree_mod_dont_log(fs_info, NULL))
>> + goto free_tms;
>> +
>> + if (tm_list)
>> + __tree_mod_log_free_eb(fs_info, tm_list, nritems);
>> +
>> + ret = __tree_mod_log_insert(fs_info, tm);
>> + tree_mod_log_write_unlock(fs_info);
>> + kfree(tm_list);
>> +
>> + return ret;
>> +
>> +free_tms:
>> + if (tm_list) {
>> + for (i = 0; i < nritems; i++)
>> + kfree(tm_list[i]);
>> + kfree(tm_list);
>> + }
>> + kfree(tm);
>> +
>> + return ret;
>> }
>> static struct tree_mod_elem *
>> @@ -733,26 +820,51 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info,
>> struct extent_buffer *dst,
>> struct extent_buffer *src, unsigned long dst_offset,
>> unsigned long src_offset, int nr_items)
>> {
>> + struct tree_mod_elem **tm_list = NULL;
>> + struct tree_mod_elem **tm_list_add, **tm_list_rem;
>> int ret;
>> int i;
>> - if (tree_mod_dont_log(fs_info, NULL))
>> + if (!tree_mod_need_log(fs_info, NULL))
>> return;
>> if (btrfs_header_level(dst) == 0 && btrfs_header_level(src) == 0)
>> return;
>> + tm_list = kzalloc(nr_items * 2 * sizeof(struct tree_mod_elem *),
>> + GFP_NOFS);
>> + BUG_ON(!tm_list);
>
>
> This isn't ok, we need to return -ENOMEM here.
>
>> +
>> + tm_list_add = tm_list;
>> + tm_list_rem = tm_list + nr_items;
>> for (i = 0; i < nr_items; i++) {
>> - ret = __tree_mod_log_insert_key(fs_info, src,
>> - i + src_offset,
>> - MOD_LOG_KEY_REMOVE,
>> GFP_NOFS);
>> + tm_list_rem[i] = alloc_tree_mod_elem(src, i + src_offset,
>> + MOD_LOG_KEY_REMOVE, GFP_NOFS);
>> + BUG_ON(!tm_list_rem[i]);
>
>
> And here.
>
>> +
>> + tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
>> + MOD_LOG_KEY_ADD, GFP_NOFS);
>> + BUG_ON(!tm_list_add[i]);
>
>
> And here.
>
>> + }
>> +
>> + if (tree_mod_dont_log(fs_info, NULL))
>> + goto free_tms;
>> +
>> + for (i = 0; i < nr_items; i++) {
>> + ret = __tree_mod_log_insert(fs_info, tm_list_rem[i]);
>> BUG_ON(ret < 0);
>> - ret = __tree_mod_log_insert_key(fs_info, dst,
>> - i + dst_offset,
>> - MOD_LOG_KEY_ADD,
>> - GFP_NOFS);
>> + ret = __tree_mod_log_insert(fs_info, tm_list_add[i]);
>> BUG_ON(ret < 0);
>> }
>> +
>> + tree_mod_log_write_unlock(fs_info);
>> + kfree(tm_list);
>> + return;
>> +
>> +free_tms:
>> + for (i = 0; i < nr_items * 2; i++)
>> + kfree(tm_list[i]);
>> + kfree(tm_list);
>> }
>> static inline void
>> @@ -780,9 +892,39 @@ tree_mod_log_set_node_key(struct btrfs_fs_info
>> *fs_info,
>> static noinline void
>> tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer
>> *eb)
>> {
>> - if (tree_mod_dont_log(fs_info, eb))
>> + struct tree_mod_elem **tm_list = NULL;
>> + int nritems = 0;
>> + int i;
>> +
>> + if (btrfs_header_level(eb) == 0)
>> return;
>> - __tree_mod_log_free_eb(fs_info, eb);
>> +
>> + if (!tree_mod_need_log(fs_info, NULL))
>> + return;
>> +
>> + nritems = btrfs_header_nritems(eb);
>> + tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
>> + GFP_NOFS);
>> + BUG_ON(!tm_list);
>
>
> And here.
>
>> +
>> + for (i = 0; i < nritems; i++) {
>> + tm_list[i] = alloc_tree_mod_elem(eb, i,
>> + MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
>> + BUG_ON(!tm_list[i]);
>
>
> And here. Thanks,
>
> Josef
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Btrfs: fix tree mod logging
2013-12-13 20:10 ` Filipe David Manana
@ 2013-12-13 20:49 ` Josef Bacik
0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2013-12-13 20:49 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs@vger.kernel.org
On 12/13/2013 03:10 PM, Filipe David Manana wrote:
> On Fri, Dec 13, 2013 at 8:06 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 12/13/2013 02:41 PM, Filipe David Borba Manana wrote:
>>> While running the test btrfs/004 from xfstests in a loop, it failed
>>> about 1 time out of 20 runs in my desktop. The failure happend in
>>> the backref walking part of the test, and the test's error message was
>>> like this:
>>>
>>> btrfs/004 93s ... [failed, exit status 1] - output mismatch (see
>>> /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
>>> --- tests/btrfs/004.out 2013-11-26 18:25:29.263333714 +0000
>>> +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad
>>> 2013-12-10 15:25:10.327518516 +0000
>>> @@ -1,3 +1,8 @@
>>> QA output created by 004
>>> *** test backref walking
>>> -*** done
>>> +unexpected output from
>>> + /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
>>> logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
>>> +expected inum: 405, expected address: 454656, file:
>>> /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
>>> +
>>> ...
>>> (Run 'diff -u tests/btrfs/004.out
>>> /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the
>>> entire diff)
>>> Ran: btrfs/004
>>> Failures: btrfs/004
>>> Failed 1 of 1 tests
>>>
>>> But immediately after the test finished, the btrfs inspect-internal
>>> command
>>> returned the expected output:
>>>
>>> $ btrfs inspect-internal logical-resolve -P 141512704
>>> /home/fdmanana/btrfs-tests/scratch_1
>>> inode 405 offset 454656 root 258
>>> inode 405 offset 454656 root 5
>>>
>>> It turned out this was because the btrfs_search_old_slot() calls performed
>>> during backref walking (backref.c:__resolve_indirect_ref) were not finding
>>> anything. The reason for this turned out to be that the tree mod logging
>>> code was not logging some node multi-step operations atomically, therefore
>>> btrfs_search_old_slot() callers iterated often over an incomplete tree
>>> that
>>> wasn't fully consistent with any tree state from the past. Besides missing
>>> items, this often (but not always) resulted in -EIO errors during old slot
>>> searches, reported in dmesg like this:
>>>
>>> [ 4299.933936] ------------[ cut here ]------------
>>> [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343
>>> btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
>>> [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O)
>>> vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc
>>> ppdev binfmt_misc joydev snd_hda_codec_h
>>> [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G W O
>>> 3.12.0-fdm-btrfs-next-16+ #70
>>> [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By
>>> O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
>>> [ 4299.933979] 000000000000053f ffff8806f3fd98f8 ffffffff8176d284
>>> 0000000000000007
>>> [ 4299.933982] 0000000000000000 ffff8806f3fd9938 ffffffff8104a81c
>>> ffff880659c64b70
>>> [ 4299.933984] ffff880659c643d0 ffff8806599233d8 ffff880701e2e938
>>> 0000160000000000
>>> [ 4299.933987] Call Trace:
>>> [ 4299.933991] [<ffffffff8176d284>] dump_stack+0x55/0x76
>>> [ 4299.933994] [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
>>> [ 4299.933997] [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
>>> [ 4299.934003] [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0
>>> [btrfs]
>>> [ 4299.934005] [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
>>> [ 4299.934010] [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0
>>> [btrfs]
>>> [ 4299.934019] [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0
>>> [btrfs]
>>> [ 4299.934027] [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0
>>> [btrfs]
>>> [ 4299.934034] [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
>>> [ 4299.934042] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>>> [btrfs]
>>> [ 4299.934048] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>>> [btrfs]
>>> [ 4299.934056] [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250
>>> [btrfs]
>>> [ 4299.934058] [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
>>> [ 4299.934065] [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0
>>> [btrfs]
>>> [ 4299.934071] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0
>>> [btrfs]
>>> [ 4299.934078] [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
>>> [ 4299.934080] [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
>>> [ 4299.934083] [<ffffffff81075563>] ? up_read+0x23/0x40
>>> [ 4299.934085] [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
>>> [ 4299.934088] [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
>>> [ 4299.934090] [<ffffffff81776e23>] ? error_sti+0x5/0x6
>>> [ 4299.934093] [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
>>> [ 4299.934096] [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
>>> [ 4299.934098] [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
>>> [ 4299.934100] [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
>>> [ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
>>> [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
>>> [ 4299.934378] btrfs bad fsid on block 0
>>>
>>> These tree mod log operations that must be performed atomically,
>>> tree_mod_log_free_eb,
>>> tree_mod_log_eb_copy, tree_mod_log_insert_root and
>>> tree_mod_log_insert_move, used to
>>> be performed atomically before the following commit:
>>>
>>> c8cc6341653721b54760480b0d0d9b5f09b46741
>>> (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
>>>
>>> That change removed the atomicity of such operations. This patch restores
>>> the
>>> atomicity while still not doing the GFP_ATOMIC allocations of
>>> tree_mod_elem
>>> structures, so it has to do the allocations using GFP_NOFS before
>>> acquiring
>>> the mod log lock.
>>>
>>> This issue has been experienced by several users recently, such as for
>>> example:
>>>
>>> http://www.spinics.net/lists/linux-btrfs/msg28574.html
>>>
>>> After running the btrfs/004 test for 679 consecutive iterations with this
>>> patch applied, I didn't ran into the issue anymore.
>>
>> Thanks for tracking this down, just some return error problems below.
> Right, I left the BUG_ON's because they were already being used for
> all existing tree mod failures.
> If you don't mind, I'll do that change as a separate patch.
New patches should follow the appropriate behaviour, you can leave the
BUG_ON()'s that are already there, but I don't want any new ones added.
Thanks,
Josef
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] Btrfs: fix tree mod logging
2013-12-13 19:41 [PATCH] Btrfs: fix tree mod logging Filipe David Borba Manana
2013-12-13 20:06 ` Josef Bacik
@ 2013-12-13 22:57 ` Filipe David Borba Manana
2013-12-19 7:37 ` Ahmet Inan
2013-12-20 15:17 ` [PATCH v3] " Filipe David Borba Manana
2 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-12-13 22:57 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
While running the test btrfs/004 from xfstests in a loop, it failed
about 1 time out of 20 runs in my desktop. The failure happend in
the backref walking part of the test, and the test's error message was
like this:
btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2013-11-26 18:25:29.263333714 +0000
+++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad 2013-12-10 15:25:10.327518516 +0000
@@ -1,3 +1,8 @@
QA output created by 004
*** test backref walking
-*** done
+unexpected output from
+ /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
+expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
+
...
(Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
Ran: btrfs/004
Failures: btrfs/004
Failed 1 of 1 tests
But immediately after the test finished, the btrfs inspect-internal command
returned the expected output:
$ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
inode 405 offset 454656 root 258
inode 405 offset 454656 root 5
It turned out this was because the btrfs_search_old_slot() calls performed
during backref walking (backref.c:__resolve_indirect_ref) were not finding
anything. The reason for this turned out to be that the tree mod logging
code was not logging some node multi-step operations atomically, therefore
btrfs_search_old_slot() callers iterated often over an incomplete tree that
wasn't fully consistent with any tree state from the past. Besides missing
items, this often (but not always) resulted in -EIO errors during old slot
searches, reported in dmesg like this:
[ 4299.933936] ------------[ cut here ]------------
[ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
[ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
[ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G W O 3.12.0-fdm-btrfs-next-16+ #70
[ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
[ 4299.933979] 000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
[ 4299.933982] 0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
[ 4299.933984] ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
[ 4299.933987] Call Trace:
[ 4299.933991] [<ffffffff8176d284>] dump_stack+0x55/0x76
[ 4299.933994] [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
[ 4299.933997] [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
[ 4299.934003] [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
[ 4299.934005] [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
[ 4299.934010] [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
[ 4299.934019] [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
[ 4299.934027] [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
[ 4299.934034] [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
[ 4299.934042] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934048] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934056] [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
[ 4299.934058] [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
[ 4299.934065] [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
[ 4299.934071] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934078] [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
[ 4299.934080] [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
[ 4299.934083] [<ffffffff81075563>] ? up_read+0x23/0x40
[ 4299.934085] [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
[ 4299.934088] [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
[ 4299.934090] [<ffffffff81776e23>] ? error_sti+0x5/0x6
[ 4299.934093] [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
[ 4299.934096] [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
[ 4299.934098] [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
[ 4299.934100] [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
[ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
[ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
[ 4299.934378] btrfs bad fsid on block 0
These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
be performed atomically before the following commit:
c8cc6341653721b54760480b0d0d9b5f09b46741
(Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
That change removed the atomicity of such operations. This patch restores the
atomicity while still not doing the GFP_ATOMIC allocations of tree_mod_elem
structures, so it has to do the allocations using GFP_NOFS before acquiring
the mod log lock.
This issue has been experienced by several users recently, such as for example:
http://www.spinics.net/lists/linux-btrfs/msg28574.html
After running the btrfs/004 test for 679 consecutive iterations with this
patch applied, I didn't ran into the issue anymore.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
V2: Replaced BUG_ON's on kmalloc failures with proper error value returns.
fs/btrfs/ctree.c | 385 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 296 insertions(+), 89 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 59664f6..cfe4fa9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -39,7 +39,7 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
struct extent_buffer *src_buf);
static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
int level, int slot);
-static void tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
+static int tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb);
struct btrfs_path *btrfs_alloc_path(void)
@@ -474,6 +474,8 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
* the index is the shifted logical of the *new* root node for root replace
* operations, or the shifted logical of the affected block for all other
* operations.
+ *
+ * Note: must be called with write lock (tree_mod_log_write_lock).
*/
static noinline int
__tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
@@ -482,24 +484,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
struct rb_node **new;
struct rb_node *parent = NULL;
struct tree_mod_elem *cur;
- int ret = 0;
BUG_ON(!tm);
- tree_mod_log_write_lock(fs_info);
- if (list_empty(&fs_info->tree_mod_seq_list)) {
- tree_mod_log_write_unlock(fs_info);
- /*
- * Ok we no longer care about logging modifications, free up tm
- * and return 0. Any callers shouldn't be using tm after
- * calling tree_mod_log_insert, but if they do we can just
- * change this to return a special error code to let the callers
- * do their own thing.
- */
- kfree(tm);
- return 0;
- }
-
spin_lock(&fs_info->tree_mod_seq_lock);
tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
spin_unlock(&fs_info->tree_mod_seq_lock);
@@ -517,18 +504,13 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
new = &((*new)->rb_left);
else if (cur->seq > tm->seq)
new = &((*new)->rb_right);
- else {
- ret = -EEXIST;
- kfree(tm);
- goto out;
- }
+ else
+ return -EEXIST;
}
rb_link_node(&tm->node, parent, new);
rb_insert_color(&tm->node, tm_root);
-out:
- tree_mod_log_write_unlock(fs_info);
- return ret;
+ return 0;
}
/*
@@ -544,19 +526,38 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
return 1;
if (eb && btrfs_header_level(eb) == 0)
return 1;
+
+ tree_mod_log_write_lock(fs_info);
+ if (list_empty(&(fs_info)->tree_mod_seq_list)) {
+ tree_mod_log_write_unlock(fs_info);
+ return 1;
+ }
+
return 0;
}
-static inline int
-__tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
- struct extent_buffer *eb, int slot,
- enum mod_log_op op, gfp_t flags)
+/* Similar to tree_mod_dont_log, but doesn't acquire any locks. */
+static inline int tree_mod_need_log(const struct btrfs_fs_info *fs_info,
+ struct extent_buffer *eb)
+{
+ smp_mb();
+ if (list_empty(&(fs_info)->tree_mod_seq_list))
+ return 0;
+ if (eb && btrfs_header_level(eb) == 0)
+ return 0;
+
+ return 1;
+}
+
+static struct tree_mod_elem *
+alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
+ enum mod_log_op op, gfp_t flags)
{
struct tree_mod_elem *tm;
tm = kzalloc(sizeof(*tm), flags);
if (!tm)
- return -ENOMEM;
+ return NULL;
tm->index = eb->start >> PAGE_CACHE_SHIFT;
if (op != MOD_LOG_KEY_ADD) {
@@ -566,8 +567,9 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
tm->op = op;
tm->slot = slot;
tm->generation = btrfs_node_ptr_generation(eb, slot);
+ RB_CLEAR_NODE(&tm->node);
- return __tree_mod_log_insert(fs_info, tm);
+ return tm;
}
static noinline int
@@ -575,10 +577,27 @@ tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int slot,
enum mod_log_op op, gfp_t flags)
{
- if (tree_mod_dont_log(fs_info, eb))
+ struct tree_mod_elem *tm;
+ int ret;
+
+ if (!tree_mod_need_log(fs_info, eb))
return 0;
- return __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
+ tm = alloc_tree_mod_elem(eb, slot, op, flags);
+ if (!tm)
+ return -ENOMEM;
+
+ if (tree_mod_dont_log(fs_info, eb)) {
+ kfree(tm);
+ return 0;
+ }
+
+ ret = __tree_mod_log_insert(fs_info, tm);
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ kfree(tm);
+
+ return ret;
}
static noinline int
@@ -586,24 +605,15 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int dst_slot, int src_slot,
int nr_items, gfp_t flags)
{
- struct tree_mod_elem *tm;
- int ret;
+ struct tree_mod_elem *tm = NULL;
+ struct tree_mod_elem **tm_list = NULL;
+ int ret = 0;
int i;
+ int locked = 0;
- if (tree_mod_dont_log(fs_info, eb))
+ if (!tree_mod_need_log(fs_info, eb))
return 0;
- /*
- * When we override something during the move, we log these removals.
- * This can only happen when we move towards the beginning of the
- * buffer, i.e. dst_slot < src_slot.
- */
- for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
- ret = __tree_mod_log_insert_key(fs_info, eb, i + dst_slot,
- MOD_LOG_KEY_REMOVE_WHILE_MOVING, GFP_NOFS);
- BUG_ON(ret < 0);
- }
-
tm = kzalloc(sizeof(*tm), flags);
if (!tm)
return -ENOMEM;
@@ -614,25 +624,76 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
tm->move.nr_items = nr_items;
tm->op = MOD_LOG_MOVE_KEYS;
- return __tree_mod_log_insert(fs_info, tm);
+ tm_list = kzalloc(nr_items * sizeof(struct tree_mod_elem *), flags);
+ if (!tm_list) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+
+ for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
+ tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
+ MOD_LOG_KEY_REMOVE_WHILE_MOVING, flags);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
+ if (tree_mod_dont_log(fs_info, eb))
+ goto free_tms;
+ locked = 1;
+
+ /*
+ * When we override something during the move, we log these removals.
+ * This can only happen when we move towards the beginning of the
+ * buffer, i.e. dst_slot < src_slot.
+ */
+ for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
+ ret = __tree_mod_log_insert(fs_info, tm_list[i]);
+ if (ret)
+ goto free_tms;
+ }
+
+ ret = __tree_mod_log_insert(fs_info, tm);
+ if (ret)
+ goto free_tms;
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+
+ return 0;
+free_tms:
+ for (i = 0; i < nr_items; i++) {
+ if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+ rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+ kfree(tm_list[i]);
+ }
+ if (locked)
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+ kfree(tm);
+
+ return ret;
}
-static inline void
-__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
+static inline int
+__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
+ struct tree_mod_elem **tm_list,
+ int nritems)
{
- int i;
- u32 nritems;
+ int i, j;
int ret;
- if (btrfs_header_level(eb) == 0)
- return;
-
- nritems = btrfs_header_nritems(eb);
for (i = nritems - 1; i >= 0; i--) {
- ret = __tree_mod_log_insert_key(fs_info, eb, i,
- MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
- BUG_ON(ret < 0);
+ ret = __tree_mod_log_insert(fs_info, tm_list[i]);
+ if (ret) {
+ for (j = nritems - 1; j > i; j--)
+ rb_erase(&tm_list[j]->node,
+ &fs_info->tree_mod_log);
+ return ret;
+ }
}
+
+ return 0;
}
static noinline int
@@ -641,17 +702,38 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
struct extent_buffer *new_root, gfp_t flags,
int log_removal)
{
- struct tree_mod_elem *tm;
+ struct tree_mod_elem *tm = NULL;
+ struct tree_mod_elem **tm_list = NULL;
+ int nritems = 0;
+ int ret = 0;
+ int i;
- if (tree_mod_dont_log(fs_info, NULL))
+ if (!tree_mod_need_log(fs_info, NULL))
return 0;
- if (log_removal)
- __tree_mod_log_free_eb(fs_info, old_root);
+ if (log_removal && btrfs_header_level(old_root) > 0) {
+ nritems = btrfs_header_nritems(old_root);
+ tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
+ flags);
+ if (!tm_list) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ for (i = 0; i < nritems; i++) {
+ tm_list[i] = alloc_tree_mod_elem(old_root, i,
+ MOD_LOG_KEY_REMOVE_WHILE_FREEING, flags);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+ }
tm = kzalloc(sizeof(*tm), flags);
- if (!tm)
- return -ENOMEM;
+ if (!tm) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
tm->index = new_root->start >> PAGE_CACHE_SHIFT;
tm->old_root.logical = old_root->start;
@@ -659,7 +741,30 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
tm->generation = btrfs_header_generation(old_root);
tm->op = MOD_LOG_ROOT_REPLACE;
- return __tree_mod_log_insert(fs_info, tm);
+ if (tree_mod_dont_log(fs_info, NULL))
+ goto free_tms;
+
+ if (tm_list)
+ ret = __tree_mod_log_free_eb(fs_info, tm_list, nritems);
+ if (!ret)
+ ret = __tree_mod_log_insert(fs_info, tm);
+
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ goto free_tms;
+ kfree(tm_list);
+
+ return ret;
+
+free_tms:
+ if (tm_list) {
+ for (i = 0; i < nritems; i++)
+ kfree(tm_list[i]);
+ kfree(tm_list);
+ }
+ kfree(tm);
+
+ return ret;
}
static struct tree_mod_elem *
@@ -728,31 +833,75 @@ tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq)
return __tree_mod_log_search(fs_info, start, min_seq, 0);
}
-static noinline void
+static noinline int
tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
struct extent_buffer *src, unsigned long dst_offset,
unsigned long src_offset, int nr_items)
{
- int ret;
+ int ret = 0;
+ struct tree_mod_elem **tm_list = NULL;
+ struct tree_mod_elem **tm_list_add, **tm_list_rem;
int i;
+ int locked = 0;
- if (tree_mod_dont_log(fs_info, NULL))
- return;
+ if (!tree_mod_need_log(fs_info, NULL))
+ return 0;
if (btrfs_header_level(dst) == 0 && btrfs_header_level(src) == 0)
- return;
+ return 0;
+ tm_list = kzalloc(nr_items * 2 * sizeof(struct tree_mod_elem *),
+ GFP_NOFS);
+ if (!tm_list)
+ return -ENOMEM;
+
+ tm_list_add = tm_list;
+ tm_list_rem = tm_list + nr_items;
for (i = 0; i < nr_items; i++) {
- ret = __tree_mod_log_insert_key(fs_info, src,
- i + src_offset,
- MOD_LOG_KEY_REMOVE, GFP_NOFS);
- BUG_ON(ret < 0);
- ret = __tree_mod_log_insert_key(fs_info, dst,
- i + dst_offset,
- MOD_LOG_KEY_ADD,
- GFP_NOFS);
- BUG_ON(ret < 0);
+ tm_list_rem[i] = alloc_tree_mod_elem(src, i + src_offset,
+ MOD_LOG_KEY_REMOVE, GFP_NOFS);
+ if (!tm_list_rem[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+
+ tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
+ MOD_LOG_KEY_ADD, GFP_NOFS);
+ if (!tm_list_add[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
+ if (tree_mod_dont_log(fs_info, NULL))
+ goto free_tms;
+ locked = 1;
+
+ for (i = 0; i < nr_items; i++) {
+ ret = __tree_mod_log_insert(fs_info, tm_list_rem[i]);
+ if (ret)
+ goto free_tms;
+ ret = __tree_mod_log_insert(fs_info, tm_list_add[i]);
+ if (ret)
+ goto free_tms;
+ }
+
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+
+ return 0;
+
+free_tms:
+ for (i = 0; i < nr_items * 2; i++) {
+ if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+ rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+ kfree(tm_list[i]);
}
+ if (locked)
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+
+ return ret;
}
static inline void
@@ -777,12 +926,52 @@ tree_mod_log_set_node_key(struct btrfs_fs_info *fs_info,
BUG_ON(ret < 0);
}
-static noinline void
+static noinline int
tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
{
+ struct tree_mod_elem **tm_list = NULL;
+ int nritems = 0;
+ int i;
+ int ret = 0;
+
+ if (btrfs_header_level(eb) == 0)
+ return 0;
+
+ if (!tree_mod_need_log(fs_info, NULL))
+ return 0;
+
+ nritems = btrfs_header_nritems(eb);
+ tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
+ GFP_NOFS);
+ if (!tm_list)
+ return -ENOMEM;
+
+ for (i = 0; i < nritems; i++) {
+ tm_list[i] = alloc_tree_mod_elem(eb, i,
+ MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
if (tree_mod_dont_log(fs_info, eb))
- return;
- __tree_mod_log_free_eb(fs_info, eb);
+ goto free_tms;
+
+ ret = __tree_mod_log_free_eb(fs_info, tm_list, nritems);
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ goto free_tms;
+ kfree(tm_list);
+
+ return 0;
+
+free_tms:
+ for (i = 0; i < nritems; i++)
+ kfree(tm_list[i]);
+ kfree(tm_list);
+
+ return ret;
}
static noinline void
@@ -1040,8 +1229,13 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
btrfs_set_node_ptr_generation(parent, parent_slot,
trans->transid);
btrfs_mark_buffer_dirty(parent);
- if (last_ref)
- tree_mod_log_free_eb(root->fs_info, buf);
+ if (last_ref) {
+ ret = tree_mod_log_free_eb(root->fs_info, buf);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
+ }
btrfs_free_tree_block(trans, root, buf, parent_start,
last_ref);
}
@@ -3064,8 +3258,12 @@ static int push_node_left(struct btrfs_trans_handle *trans,
} else
push_items = min(src_nritems - 8, push_items);
- tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
- push_items);
+ ret = tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
+ push_items);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(dst, src,
btrfs_node_key_ptr_offset(dst_nritems),
btrfs_node_key_ptr_offset(0),
@@ -3135,8 +3333,12 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
(dst_nritems) *
sizeof(struct btrfs_key_ptr));
- tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
- src_nritems - push_items, push_items);
+ ret = tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
+ src_nritems - push_items, push_items);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(dst, src,
btrfs_node_key_ptr_offset(0),
btrfs_node_key_ptr_offset(src_nritems - push_items),
@@ -3337,7 +3539,12 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
btrfs_header_chunk_tree_uuid(split),
BTRFS_UUID_SIZE);
- tree_mod_log_eb_copy(root->fs_info, split, c, 0, mid, c_nritems - mid);
+ ret = tree_mod_log_eb_copy(root->fs_info, split, c, 0,
+ mid, c_nritems - mid);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(split, c,
btrfs_node_key_ptr_offset(0),
btrfs_node_key_ptr_offset(mid),
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Btrfs: fix tree mod logging
2013-12-13 22:57 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-12-19 7:37 ` Ahmet Inan
2013-12-19 20:17 ` Filipe David Manana
0 siblings, 1 reply; 11+ messages in thread
From: Ahmet Inan @ 2013-12-19 7:37 UTC (permalink / raw)
To: Filipe David Borba Manana; +Cc: linux-btrfs
Thanks a lot Filipe!
Have been testing this patch now for 5 days and it fixed this annoying
Problem since 3.11.0 on 3x NFS Servers here.
This is also a candidate that should be back ported, as it fixes crashes.
Just for Information for others here: Your previous patch,
"Btrfs: return immediately if tree log mod is not necessary"
is also needed to make it apply cleanly.
Ahmet
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Btrfs: fix tree mod logging
2013-12-19 7:37 ` Ahmet Inan
@ 2013-12-19 20:17 ` Filipe David Manana
0 siblings, 0 replies; 11+ messages in thread
From: Filipe David Manana @ 2013-12-19 20:17 UTC (permalink / raw)
To: Ahmet Inan; +Cc: linux-btrfs
On Thu, Dec 19, 2013 at 7:37 AM, Ahmet Inan
<ainan@mathematik.uni-freiburg.de> wrote:
> Thanks a lot Filipe!
>
> Have been testing this patch now for 5 days and it fixed this annoying
> Problem since 3.11.0 on 3x NFS Servers here.
> This is also a candidate that should be back ported, as it fixes crashes.
> Just for Information for others here: Your previous patch,
> "Btrfs: return immediately if tree log mod is not necessary"
> is also needed to make it apply cleanly.
Thank you Ahmet for testing it and reporting back :)
>
> Ahmet
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3] Btrfs: fix tree mod logging
2013-12-13 19:41 [PATCH] Btrfs: fix tree mod logging Filipe David Borba Manana
2013-12-13 20:06 ` Josef Bacik
2013-12-13 22:57 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-12-20 15:17 ` Filipe David Borba Manana
2013-12-20 23:19 ` Holger Hoffstätte
2 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-12-20 15:17 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
While running the test btrfs/004 from xfstests in a loop, it failed
about 1 time out of 20 runs in my desktop. The failure happened in
the backref walking part of the test, and the test's error message was
like this:
btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2013-11-26 18:25:29.263333714 +0000
+++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad 2013-12-10 15:25:10.327518516 +0000
@@ -1,3 +1,8 @@
QA output created by 004
*** test backref walking
-*** done
+unexpected output from
+ /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
+expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
+
...
(Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
Ran: btrfs/004
Failures: btrfs/004
Failed 1 of 1 tests
But immediately after the test finished, the btrfs inspect-internal command
returned the expected output:
$ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
inode 405 offset 454656 root 258
inode 405 offset 454656 root 5
It turned out this was because the btrfs_search_old_slot() calls performed
during backref walking (backref.c:__resolve_indirect_ref) were not finding
anything. The reason for this turned out to be that the tree mod logging
code was not logging some node multi-step operations atomically, therefore
btrfs_search_old_slot() callers iterated often over an incomplete tree that
wasn't fully consistent with any tree state from the past. Besides missing
items, this often (but not always) resulted in -EIO errors during old slot
searches, reported in dmesg like this:
[ 4299.933936] ------------[ cut here ]------------
[ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
[ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
[ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G W O 3.12.0-fdm-btrfs-next-16+ #70
[ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
[ 4299.933979] 000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
[ 4299.933982] 0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
[ 4299.933984] ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
[ 4299.933987] Call Trace:
[ 4299.933991] [<ffffffff8176d284>] dump_stack+0x55/0x76
[ 4299.933994] [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
[ 4299.933997] [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
[ 4299.934003] [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
[ 4299.934005] [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
[ 4299.934010] [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
[ 4299.934019] [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
[ 4299.934027] [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
[ 4299.934034] [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
[ 4299.934042] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934048] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934056] [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
[ 4299.934058] [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
[ 4299.934065] [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
[ 4299.934071] [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
[ 4299.934078] [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
[ 4299.934080] [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
[ 4299.934083] [<ffffffff81075563>] ? up_read+0x23/0x40
[ 4299.934085] [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
[ 4299.934088] [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
[ 4299.934090] [<ffffffff81776e23>] ? error_sti+0x5/0x6
[ 4299.934093] [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
[ 4299.934096] [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
[ 4299.934098] [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
[ 4299.934100] [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
[ 4299.934102] [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
[ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
[ 4299.934378] btrfs bad fsid on block 0
These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
be performed atomically before the following commit:
c8cc6341653721b54760480b0d0d9b5f09b46741
(Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
That change removed the atomicity of such operations. This patch restores the
atomicity while still not doing the GFP_ATOMIC allocations of tree_mod_elem
structures, so it has to do the allocations using GFP_NOFS before acquiring
the mod log lock.
This issue has been experienced by several users recently, such as for example:
http://www.spinics.net/lists/linux-btrfs/msg28574.html
After running the btrfs/004 test for 679 consecutive iterations with this
patch applied, I didn't ran into the issue anymore.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
V2: Replaced BUG_ON's on kmalloc failures with proper error value returns.
V3: Fixed possible NULL pointer dereference in error path of tree_mod_log_insert_move.
Thanks to kbuild test robot from Fengguang.
fs/btrfs/ctree.c | 385 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 296 insertions(+), 89 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 59664f6..7d88d85 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -39,7 +39,7 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
struct extent_buffer *src_buf);
static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
int level, int slot);
-static void tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
+static int tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb);
struct btrfs_path *btrfs_alloc_path(void)
@@ -474,6 +474,8 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
* the index is the shifted logical of the *new* root node for root replace
* operations, or the shifted logical of the affected block for all other
* operations.
+ *
+ * Note: must be called with write lock (tree_mod_log_write_lock).
*/
static noinline int
__tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
@@ -482,24 +484,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
struct rb_node **new;
struct rb_node *parent = NULL;
struct tree_mod_elem *cur;
- int ret = 0;
BUG_ON(!tm);
- tree_mod_log_write_lock(fs_info);
- if (list_empty(&fs_info->tree_mod_seq_list)) {
- tree_mod_log_write_unlock(fs_info);
- /*
- * Ok we no longer care about logging modifications, free up tm
- * and return 0. Any callers shouldn't be using tm after
- * calling tree_mod_log_insert, but if they do we can just
- * change this to return a special error code to let the callers
- * do their own thing.
- */
- kfree(tm);
- return 0;
- }
-
spin_lock(&fs_info->tree_mod_seq_lock);
tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
spin_unlock(&fs_info->tree_mod_seq_lock);
@@ -517,18 +504,13 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
new = &((*new)->rb_left);
else if (cur->seq > tm->seq)
new = &((*new)->rb_right);
- else {
- ret = -EEXIST;
- kfree(tm);
- goto out;
- }
+ else
+ return -EEXIST;
}
rb_link_node(&tm->node, parent, new);
rb_insert_color(&tm->node, tm_root);
-out:
- tree_mod_log_write_unlock(fs_info);
- return ret;
+ return 0;
}
/*
@@ -544,19 +526,38 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
return 1;
if (eb && btrfs_header_level(eb) == 0)
return 1;
+
+ tree_mod_log_write_lock(fs_info);
+ if (list_empty(&(fs_info)->tree_mod_seq_list)) {
+ tree_mod_log_write_unlock(fs_info);
+ return 1;
+ }
+
return 0;
}
-static inline int
-__tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
- struct extent_buffer *eb, int slot,
- enum mod_log_op op, gfp_t flags)
+/* Similar to tree_mod_dont_log, but doesn't acquire any locks. */
+static inline int tree_mod_need_log(const struct btrfs_fs_info *fs_info,
+ struct extent_buffer *eb)
+{
+ smp_mb();
+ if (list_empty(&(fs_info)->tree_mod_seq_list))
+ return 0;
+ if (eb && btrfs_header_level(eb) == 0)
+ return 0;
+
+ return 1;
+}
+
+static struct tree_mod_elem *
+alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
+ enum mod_log_op op, gfp_t flags)
{
struct tree_mod_elem *tm;
tm = kzalloc(sizeof(*tm), flags);
if (!tm)
- return -ENOMEM;
+ return NULL;
tm->index = eb->start >> PAGE_CACHE_SHIFT;
if (op != MOD_LOG_KEY_ADD) {
@@ -566,8 +567,9 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
tm->op = op;
tm->slot = slot;
tm->generation = btrfs_node_ptr_generation(eb, slot);
+ RB_CLEAR_NODE(&tm->node);
- return __tree_mod_log_insert(fs_info, tm);
+ return tm;
}
static noinline int
@@ -575,10 +577,27 @@ tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int slot,
enum mod_log_op op, gfp_t flags)
{
- if (tree_mod_dont_log(fs_info, eb))
+ struct tree_mod_elem *tm;
+ int ret;
+
+ if (!tree_mod_need_log(fs_info, eb))
+ return 0;
+
+ tm = alloc_tree_mod_elem(eb, slot, op, flags);
+ if (!tm)
+ return -ENOMEM;
+
+ if (tree_mod_dont_log(fs_info, eb)) {
+ kfree(tm);
return 0;
+ }
+
+ ret = __tree_mod_log_insert(fs_info, tm);
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ kfree(tm);
- return __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
+ return ret;
}
static noinline int
@@ -586,53 +605,95 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int dst_slot, int src_slot,
int nr_items, gfp_t flags)
{
- struct tree_mod_elem *tm;
- int ret;
+ struct tree_mod_elem *tm = NULL;
+ struct tree_mod_elem **tm_list = NULL;
+ int ret = 0;
int i;
+ int locked = 0;
- if (tree_mod_dont_log(fs_info, eb))
+ if (!tree_mod_need_log(fs_info, eb))
return 0;
+ tm_list = kzalloc(nr_items * sizeof(struct tree_mod_elem *), flags);
+ if (!tm_list)
+ return -ENOMEM;
+
+ tm = kzalloc(sizeof(*tm), flags);
+ if (!tm) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+
+ tm->index = eb->start >> PAGE_CACHE_SHIFT;
+ tm->slot = src_slot;
+ tm->move.dst_slot = dst_slot;
+ tm->move.nr_items = nr_items;
+ tm->op = MOD_LOG_MOVE_KEYS;
+
+ for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
+ tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
+ MOD_LOG_KEY_REMOVE_WHILE_MOVING, flags);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
+ if (tree_mod_dont_log(fs_info, eb))
+ goto free_tms;
+ locked = 1;
+
/*
* When we override something during the move, we log these removals.
* This can only happen when we move towards the beginning of the
* buffer, i.e. dst_slot < src_slot.
*/
for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
- ret = __tree_mod_log_insert_key(fs_info, eb, i + dst_slot,
- MOD_LOG_KEY_REMOVE_WHILE_MOVING, GFP_NOFS);
- BUG_ON(ret < 0);
+ ret = __tree_mod_log_insert(fs_info, tm_list[i]);
+ if (ret)
+ goto free_tms;
}
- tm = kzalloc(sizeof(*tm), flags);
- if (!tm)
- return -ENOMEM;
+ ret = __tree_mod_log_insert(fs_info, tm);
+ if (ret)
+ goto free_tms;
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
- tm->index = eb->start >> PAGE_CACHE_SHIFT;
- tm->slot = src_slot;
- tm->move.dst_slot = dst_slot;
- tm->move.nr_items = nr_items;
- tm->op = MOD_LOG_MOVE_KEYS;
+ return 0;
+free_tms:
+ for (i = 0; i < nr_items; i++) {
+ if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+ rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+ kfree(tm_list[i]);
+ }
+ if (locked)
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+ kfree(tm);
- return __tree_mod_log_insert(fs_info, tm);
+ return ret;
}
-static inline void
-__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
+static inline int
+__tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
+ struct tree_mod_elem **tm_list,
+ int nritems)
{
- int i;
- u32 nritems;
+ int i, j;
int ret;
- if (btrfs_header_level(eb) == 0)
- return;
-
- nritems = btrfs_header_nritems(eb);
for (i = nritems - 1; i >= 0; i--) {
- ret = __tree_mod_log_insert_key(fs_info, eb, i,
- MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
- BUG_ON(ret < 0);
+ ret = __tree_mod_log_insert(fs_info, tm_list[i]);
+ if (ret) {
+ for (j = nritems - 1; j > i; j--)
+ rb_erase(&tm_list[j]->node,
+ &fs_info->tree_mod_log);
+ return ret;
+ }
}
+
+ return 0;
}
static noinline int
@@ -641,17 +702,38 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
struct extent_buffer *new_root, gfp_t flags,
int log_removal)
{
- struct tree_mod_elem *tm;
+ struct tree_mod_elem *tm = NULL;
+ struct tree_mod_elem **tm_list = NULL;
+ int nritems = 0;
+ int ret = 0;
+ int i;
- if (tree_mod_dont_log(fs_info, NULL))
+ if (!tree_mod_need_log(fs_info, NULL))
return 0;
- if (log_removal)
- __tree_mod_log_free_eb(fs_info, old_root);
+ if (log_removal && btrfs_header_level(old_root) > 0) {
+ nritems = btrfs_header_nritems(old_root);
+ tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
+ flags);
+ if (!tm_list) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ for (i = 0; i < nritems; i++) {
+ tm_list[i] = alloc_tree_mod_elem(old_root, i,
+ MOD_LOG_KEY_REMOVE_WHILE_FREEING, flags);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+ }
tm = kzalloc(sizeof(*tm), flags);
- if (!tm)
- return -ENOMEM;
+ if (!tm) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
tm->index = new_root->start >> PAGE_CACHE_SHIFT;
tm->old_root.logical = old_root->start;
@@ -659,7 +741,30 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
tm->generation = btrfs_header_generation(old_root);
tm->op = MOD_LOG_ROOT_REPLACE;
- return __tree_mod_log_insert(fs_info, tm);
+ if (tree_mod_dont_log(fs_info, NULL))
+ goto free_tms;
+
+ if (tm_list)
+ ret = __tree_mod_log_free_eb(fs_info, tm_list, nritems);
+ if (!ret)
+ ret = __tree_mod_log_insert(fs_info, tm);
+
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ goto free_tms;
+ kfree(tm_list);
+
+ return ret;
+
+free_tms:
+ if (tm_list) {
+ for (i = 0; i < nritems; i++)
+ kfree(tm_list[i]);
+ kfree(tm_list);
+ }
+ kfree(tm);
+
+ return ret;
}
static struct tree_mod_elem *
@@ -728,31 +833,75 @@ tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq)
return __tree_mod_log_search(fs_info, start, min_seq, 0);
}
-static noinline void
+static noinline int
tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
struct extent_buffer *src, unsigned long dst_offset,
unsigned long src_offset, int nr_items)
{
- int ret;
+ int ret = 0;
+ struct tree_mod_elem **tm_list = NULL;
+ struct tree_mod_elem **tm_list_add, **tm_list_rem;
int i;
+ int locked = 0;
- if (tree_mod_dont_log(fs_info, NULL))
- return;
+ if (!tree_mod_need_log(fs_info, NULL))
+ return 0;
if (btrfs_header_level(dst) == 0 && btrfs_header_level(src) == 0)
- return;
+ return 0;
+ tm_list = kzalloc(nr_items * 2 * sizeof(struct tree_mod_elem *),
+ GFP_NOFS);
+ if (!tm_list)
+ return -ENOMEM;
+
+ tm_list_add = tm_list;
+ tm_list_rem = tm_list + nr_items;
for (i = 0; i < nr_items; i++) {
- ret = __tree_mod_log_insert_key(fs_info, src,
- i + src_offset,
- MOD_LOG_KEY_REMOVE, GFP_NOFS);
- BUG_ON(ret < 0);
- ret = __tree_mod_log_insert_key(fs_info, dst,
- i + dst_offset,
- MOD_LOG_KEY_ADD,
- GFP_NOFS);
- BUG_ON(ret < 0);
+ tm_list_rem[i] = alloc_tree_mod_elem(src, i + src_offset,
+ MOD_LOG_KEY_REMOVE, GFP_NOFS);
+ if (!tm_list_rem[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+
+ tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
+ MOD_LOG_KEY_ADD, GFP_NOFS);
+ if (!tm_list_add[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
+ if (tree_mod_dont_log(fs_info, NULL))
+ goto free_tms;
+ locked = 1;
+
+ for (i = 0; i < nr_items; i++) {
+ ret = __tree_mod_log_insert(fs_info, tm_list_rem[i]);
+ if (ret)
+ goto free_tms;
+ ret = __tree_mod_log_insert(fs_info, tm_list_add[i]);
+ if (ret)
+ goto free_tms;
+ }
+
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+
+ return 0;
+
+free_tms:
+ for (i = 0; i < nr_items * 2; i++) {
+ if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+ rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+ kfree(tm_list[i]);
}
+ if (locked)
+ tree_mod_log_write_unlock(fs_info);
+ kfree(tm_list);
+
+ return ret;
}
static inline void
@@ -777,12 +926,52 @@ tree_mod_log_set_node_key(struct btrfs_fs_info *fs_info,
BUG_ON(ret < 0);
}
-static noinline void
+static noinline int
tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
{
+ struct tree_mod_elem **tm_list = NULL;
+ int nritems = 0;
+ int i;
+ int ret = 0;
+
+ if (btrfs_header_level(eb) == 0)
+ return 0;
+
+ if (!tree_mod_need_log(fs_info, NULL))
+ return 0;
+
+ nritems = btrfs_header_nritems(eb);
+ tm_list = kzalloc(nritems * sizeof(struct tree_mod_elem *),
+ GFP_NOFS);
+ if (!tm_list)
+ return -ENOMEM;
+
+ for (i = 0; i < nritems; i++) {
+ tm_list[i] = alloc_tree_mod_elem(eb, i,
+ MOD_LOG_KEY_REMOVE_WHILE_FREEING, GFP_NOFS);
+ if (!tm_list[i]) {
+ ret = -ENOMEM;
+ goto free_tms;
+ }
+ }
+
if (tree_mod_dont_log(fs_info, eb))
- return;
- __tree_mod_log_free_eb(fs_info, eb);
+ goto free_tms;
+
+ ret = __tree_mod_log_free_eb(fs_info, tm_list, nritems);
+ tree_mod_log_write_unlock(fs_info);
+ if (ret)
+ goto free_tms;
+ kfree(tm_list);
+
+ return 0;
+
+free_tms:
+ for (i = 0; i < nritems; i++)
+ kfree(tm_list[i]);
+ kfree(tm_list);
+
+ return ret;
}
static noinline void
@@ -1040,8 +1229,13 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
btrfs_set_node_ptr_generation(parent, parent_slot,
trans->transid);
btrfs_mark_buffer_dirty(parent);
- if (last_ref)
- tree_mod_log_free_eb(root->fs_info, buf);
+ if (last_ref) {
+ ret = tree_mod_log_free_eb(root->fs_info, buf);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
+ }
btrfs_free_tree_block(trans, root, buf, parent_start,
last_ref);
}
@@ -3064,8 +3258,12 @@ static int push_node_left(struct btrfs_trans_handle *trans,
} else
push_items = min(src_nritems - 8, push_items);
- tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
- push_items);
+ ret = tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
+ push_items);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(dst, src,
btrfs_node_key_ptr_offset(dst_nritems),
btrfs_node_key_ptr_offset(0),
@@ -3135,8 +3333,12 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
(dst_nritems) *
sizeof(struct btrfs_key_ptr));
- tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
- src_nritems - push_items, push_items);
+ ret = tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
+ src_nritems - push_items, push_items);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(dst, src,
btrfs_node_key_ptr_offset(0),
btrfs_node_key_ptr_offset(src_nritems - push_items),
@@ -3337,7 +3539,12 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
btrfs_header_chunk_tree_uuid(split),
BTRFS_UUID_SIZE);
- tree_mod_log_eb_copy(root->fs_info, split, c, 0, mid, c_nritems - mid);
+ ret = tree_mod_log_eb_copy(root->fs_info, split, c, 0,
+ mid, c_nritems - mid);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ return ret;
+ }
copy_extent_buffer(split, c,
btrfs_node_key_ptr_offset(0),
btrfs_node_key_ptr_offset(mid),
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3] Btrfs: fix tree mod logging
2013-12-20 15:17 ` [PATCH v3] " Filipe David Borba Manana
@ 2013-12-20 23:19 ` Holger Hoffstätte
2013-12-20 23:31 ` Filipe David Manana
0 siblings, 1 reply; 11+ messages in thread
From: Holger Hoffstätte @ 2013-12-20 23:19 UTC (permalink / raw)
To: linux-btrfs
On Fri, 20 Dec 2013 15:17:46 +0000, Filipe David Borba Manana wrote:
[..patch..]
I applied this to 3.12.6 and was rewarded with:
*fs/btrfs/ctree.c: In function 'tree_mod_log_set_node_key':
*fs/btrfs/ctree.c:924:2: error: implicit declaration of function
'__tree_mod_log_insert_key' [-Werror=implicit-function-declaration]
* ret = __tree_mod_log_insert_key(fs_info, eb, slot,
* ^
*cc1: some warnings being treated as errors
*scripts/Makefile.build:308: recipe for target 'fs/btrfs/ctree.o' failed
Said function prototype is nowhere to be found, only the regular non-
scoped function. Did you just miss this call? Changing the call to the
remaining tree_mod_log_insert_key() looks like what was intended and
compiles, but I wanted to check before I burn the house down. :)
-h
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] Btrfs: fix tree mod logging
2013-12-20 23:19 ` Holger Hoffstätte
@ 2013-12-20 23:31 ` Filipe David Manana
2013-12-20 23:52 ` Holger Hoffstätte
0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-12-20 23:31 UTC (permalink / raw)
To: Holger Hoffstätte; +Cc: linux-btrfs@vger.kernel.org
On Fri, Dec 20, 2013 at 11:19 PM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On Fri, 20 Dec 2013 15:17:46 +0000, Filipe David Borba Manana wrote:
>
> [..patch..]
>
> I applied this to 3.12.6 and was rewarded with:
>
> *fs/btrfs/ctree.c: In function 'tree_mod_log_set_node_key':
> *fs/btrfs/ctree.c:924:2: error: implicit declaration of function
> '__tree_mod_log_insert_key' [-Werror=implicit-function-declaration]
> * ret = __tree_mod_log_insert_key(fs_info, eb, slot,
> * ^
> *cc1: some warnings being treated as errors
> *scripts/Makefile.build:308: recipe for target 'fs/btrfs/ctree.o' failed
>
> Said function prototype is nowhere to be found, only the regular non-
> scoped function. Did you just miss this call? Changing the call to the
> remaining tree_mod_log_insert_key() looks like what was intended and
> compiles, but I wanted to check before I burn the house down. :)
Did you apply first the patch titled "Btrfs: return immediately if
tree log mod is not necessary" ?
It depends on that one, and both are targeted against btrfs-next
repository from Josef. Didn't try 3.12.* at all.
Thanks
>
> -h
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] Btrfs: fix tree mod logging
2013-12-20 23:31 ` Filipe David Manana
@ 2013-12-20 23:52 ` Holger Hoffstätte
0 siblings, 0 replies; 11+ messages in thread
From: Holger Hoffstätte @ 2013-12-20 23:52 UTC (permalink / raw)
To: linux-btrfs
On Fri, 20 Dec 2013 23:31:44 +0000, Filipe David Manana wrote:
> On Fri, Dec 20, 2013 at 11:19 PM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>> On Fri, 20 Dec 2013 15:17:46 +0000, Filipe David Borba Manana wrote:
>>
>> I applied this to 3.12.6 and was rewarded with:
>>
>> *fs/btrfs/ctree.c: In function 'tree_mod_log_set_node_key':
>
> Did you apply first the patch titled "Btrfs: return immediately if tree
> log mod is not necessary" ?
No, of course not. =)
Looking at that one makes things clear and applying it first fixed it.
Everything built fine and seems to work, no spontaneous combustions yet.
Thanks!
-h
^ permalink raw reply [flat|nested] 11+ messages in thread