From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: mfasheh@suse.de, linux-btrfs@vger.kernel.org, fdmanana@gmail.com
Subject: Re: btrfs: relocation: Fix leaking qgroups numbers on data extents
Date: Fri, 29 Jul 2016 13:42:35 -0500 [thread overview]
Message-ID: <20160729184235.GA3535@merlin.lan> (raw)
In-Reply-To: <20160614092622.15560-1-quwenruo@cn.fujitsu.com>
On Tue, Jun 14, 2016 at 05:26:22PM +0800, Qu Wenruo wrote:
> When balancing data extents, qgroup will leak all its numbers for
> balanced data extents.
>
> The root cause is the non-standard extent reference update used in
> balance code.
>
> The problem happens in the following steps:
> (Use 4M as original data extent size, and 257 as src root objectid)
>
> 1) Balance create new data extents and increase its refs
>
> Balance will alloc new data extent and create new EXTENT_DATA in data
> reloc tree, while its refernece is increased with 2 different
> referencer: 257 and data reloc tree.
>
> While at that time, file tree is still referring to old extents.
>
> Extent bytenr | Real referencer | backrefs |
> ------------------------------------------------------------
> New | Data reloc | Data reloc + 257 | << Mismatch
> Old | 257 | 257 |
>
> Qgroup number: 4M + metadata
>
> 2) Commit trans before merging reloc tree
> Then we goes to prepare_to_merge(), which will commit transacation.
>
> In the qgroup update codes inside commit_transaction, although backref
> walk codes find the new data extents has 2 backref, but file tree
> backref can't find referencer(file tree is still referring to old
> extents), and data reloc doesn't count as file tree.
>
> Extent bytenr | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New | 0 | 0 | 0 |
> Old | 1 | 1 | 0 |
>
> Qgroup number: 4M + metadata +-0 = 4M + metadata
>
> 3) Swap tree blocks and free old tree blocks
> Then we goes to merge_reloc_roots(), which swaps the tree blocks
> directly, and free the old tree blocks.
> Freeing tree blocks will also free its data extents, this goes through
> normal routine, and qgroup handles it well, decreasing the numbers.
>
> And since new data extent is not updated here (updated in step 1), so
> qgroup won't scan new data extent.
>
> Extent bytenr | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New |-No modification, doesn't go through qgroup--|
> Old | 1 | 0 | -4M |
>
> Qgroup number: 4M + metadata -4M = metadata
>
> This patch will fix it by re-dirtying new extent at step 3), so backref
> walk and qgroup can get correct result.
>
> And thanks to the new qgroup framework, we don't need to check whether
> it is needed to dirty some file extents. Even some unrelated extents are
> re-dirtied, qgroup can handle it quite well.
>
> So we only need to ensure we don't miss some extents.
>
> Reported-by: Mark Fasheh <mfasheh@suse.de>
> Reported-by: Filipe Manana <fdmanana@gmail.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/relocation.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 94 insertions(+)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..f1d696d 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -31,6 +31,7 @@
> #include "async-thread.h"
> #include "free-space-cache.h"
> #include "inode-map.h"
> +#include "qgroup.h"
>
> /*
> * backref_node, mapping_node and tree_block start with this
> @@ -1750,6 +1751,78 @@ int memcmp_node_keys(struct extent_buffer *eb, int slot,
> }
>
> /*
> + * Helper function to fixup screwed qgroups caused by increased extent ref,
> + * which doesn't follow normal extent ref update behavior.
> + * (Correct behavior is, increase extent ref and modify source root in
> + * one trans)
> + * No better solution as long as we're doing swapping trick to do balance.
> + */
> +static int qgroup_redirty_data_extents(struct btrfs_trans_handle *trans,
> + struct btrfs_root *root, u64 bytenr,
> + u64 gen)
> +{
> + struct btrfs_fs_info *fs_info = root->fs_info;
> + struct extent_buffer *leaf;
> + struct btrfs_delayed_ref_root *delayed_refs;
> + int slot;
> + int ret = 0;
> +
> + if (!fs_info->quota_enabled || !is_fstree(root->objectid))
> + return 0;
> + if (WARN_ON(!trans))
> + return -EINVAL;
> +
> + delayed_refs = &trans->transaction->delayed_refs;
> +
> + leaf = read_tree_block(root, bytenr, gen);
> + if (IS_ERR(leaf)) {
> + return PTR_ERR(leaf);
> + } else if (!extent_buffer_uptodate(leaf)) {
> + ret = -EIO;
> + goto out;
> + }
> +
> + /* We only care leaf, which may contains EXTENT_DATA */
> + if (btrfs_header_level(leaf) != 0)
> + goto out;
> +
> + for (slot = 0; slot < btrfs_header_nritems(leaf); slot++) {
> + struct btrfs_key key;
> + struct btrfs_file_extent_item *fi;
> + struct btrfs_qgroup_extent_record *record;
> + struct btrfs_qgroup_extent_record *exist;
> +
> + btrfs_item_key_to_cpu(leaf, &key, slot);
> + if (key.type != BTRFS_EXTENT_DATA_KEY)
> + continue;
> + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
> + if (btrfs_file_extent_type(leaf, fi) ==
> + BTRFS_FILE_EXTENT_INLINE ||
> + btrfs_file_extent_disk_bytenr(leaf, fi) == 0)
> + continue;
> +
> + record = kmalloc(sizeof(*record), GFP_NOFS);
> + if (!record) {
> + ret = -ENOMEM;
> + break;
> + }
> + record->bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> + record->num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> + record->old_roots = NULL;
> +
> + spin_lock(&delayed_refs->lock);
> + exist = btrfs_qgroup_insert_dirty_extent(delayed_refs, record);
> + spin_unlock(&delayed_refs->lock);
> + if (exist)
> + kfree(record);
> + }
> +out:
> + free_extent_buffer(leaf);
> + return ret;
> +
> +}
> +
> +/*
> * try to replace tree blocks in fs tree with the new blocks
> * in reloc tree. tree blocks haven't been modified since the
> * reloc tree was create can be replaced.
> @@ -1919,7 +1992,28 @@ again:
> 0);
> BUG_ON(ret);
>
> + /*
> + * Fix up the screwed up qgroups
> + *
> + * For tree blocks with extent data, new data extent's
> + * backref is already increased with both file tree and data
> + * reloc tree.
> + * While trans is committed before we modify file tree.
> + *
> + * This makes qgroup can't account new data extents well,
> + * as the file tree is still referring to old extents, not
> + * new extents, backref walk will find the new extent only
> + * referred by data reloc tree.
> + * So qgroup is screwed up and didn't increase its ref/excl.
> + *
> + * Fix it up by re-dirtying qgroup record for data extents in
> + * new tree blocks
> + */
> + ret = qgroup_redirty_data_extents(trans, dest, new_bytenr,
> + new_ptr_gen);
> btrfs_unlock_up_safe(path, 0);
> + if (ret < 0)
> + break;
>
> ret = level;
> break;
next prev parent reply other threads:[~2016-07-29 18:42 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-14 9:26 [PATCH] btrfs: relocation: Fix leaking qgroups numbers on data extents Qu Wenruo
2016-07-29 18:42 ` Goldwyn Rodrigues [this message]
2016-07-30 1:06 ` Qu Wenruo
2016-07-30 15:57 ` Goldwyn Rodrigues
2016-07-31 1:47 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160729184235.GA3535@merlin.lan \
--to=rgoldwyn@suse.de \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=mfasheh@suse.de \
--cc=quwenruo@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).