linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: mfasheh@suse.de, linux-btrfs@vger.kernel.org, fdmanana@gmail.com
Subject: Re: btrfs: relocation: Fix leaking qgroups numbers on data extents
Date: Fri, 29 Jul 2016 13:42:35 -0500	[thread overview]
Message-ID: <20160729184235.GA3535@merlin.lan> (raw)
In-Reply-To: <20160614092622.15560-1-quwenruo@cn.fujitsu.com>

On Tue, Jun 14, 2016 at 05:26:22PM +0800, Qu Wenruo wrote:
> When balancing data extents, qgroup will leak all its numbers for
> balanced data extents.
> 
> The root cause is the non-standard extent reference update used in
> balance code.
> 
> The problem happens in the following steps:
> (Use 4M as original data extent size, and 257 as src root objectid)
> 
> 1) Balance create new data extents and increase its refs
> 
> Balance will alloc new data extent and create new EXTENT_DATA in data
> reloc tree, while its refernece is increased with 2 different
> referencer: 257 and data reloc tree.
> 
> While at that time, file tree is still referring to old extents.
> 
> Extent bytenr   |    Real referencer    |     backrefs     |
> ------------------------------------------------------------
> New             | Data reloc            | Data reloc + 257 | << Mismatch
> Old             | 257                   | 257              |
> 
> Qgroup number: 4M + metadata
> 
> 2) Commit trans before merging reloc tree
> Then we goes to prepare_to_merge(), which will commit transacation.
> 
> In the qgroup update codes inside commit_transaction, although backref
> walk codes find the new data extents has 2 backref, but file tree
> backref can't find referencer(file tree is still referring to old
> extents), and data reloc doesn't count as file tree.
> 
> Extent bytenr   | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New             | 0            | 0            | 0             |
> Old             | 1            | 1            | 0             |
> 
> Qgroup number: 4M + metadata +-0 = 4M + metadata
> 
> 3) Swap tree blocks and free old tree blocks
> Then we goes to merge_reloc_roots(), which swaps the tree blocks
> directly, and free the old tree blocks.
> Freeing tree blocks will also free its data extents, this goes through
> normal routine, and qgroup handles it well, decreasing the numbers.
> 
> And since new data extent is not updated here (updated in step 1), so
> qgroup won't scan new data extent.
> 
> Extent bytenr   | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New             |-No modification, doesn't go through qgroup--|
> Old             | 1            | 0            | -4M           |
> 
> Qgroup number: 4M + metadata -4M = metadata
> 
> This patch will fix it by re-dirtying new extent at step 3), so backref
> walk and qgroup can get correct result.
> 
> And thanks to the new qgroup framework, we don't need to check whether
> it is needed to dirty some file extents. Even some unrelated extents are
> re-dirtied, qgroup can handle it quite well.
> 
> So we only need to ensure we don't miss some extents.
> 
> Reported-by: Mark Fasheh <mfasheh@suse.de>
> Reported-by: Filipe Manana <fdmanana@gmail.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>


> ---
>  fs/btrfs/relocation.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..f1d696d 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -31,6 +31,7 @@
>  #include "async-thread.h"
>  #include "free-space-cache.h"
>  #include "inode-map.h"
> +#include "qgroup.h"
>  
>  /*
>   * backref_node, mapping_node and tree_block start with this
> @@ -1750,6 +1751,78 @@ int memcmp_node_keys(struct extent_buffer *eb, int slot,
>  }
>  
>  /*
> + * Helper function to fixup screwed qgroups caused by increased extent ref,
> + * which doesn't follow normal extent ref update behavior.
> + * (Correct behavior is, increase extent ref and modify source root in
> + *  one trans)
> + * No better solution as long as we're doing swapping trick to do balance.
> + */
> +static int qgroup_redirty_data_extents(struct btrfs_trans_handle *trans,
> +				       struct btrfs_root *root, u64 bytenr,
> +				       u64 gen)
> +{
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	struct extent_buffer *leaf;
> +	struct btrfs_delayed_ref_root *delayed_refs;
> +	int slot;
> +	int ret = 0;
> +
> +	if (!fs_info->quota_enabled || !is_fstree(root->objectid))
> +		return 0;
> +	if (WARN_ON(!trans))
> +		return -EINVAL;
> +
> +	delayed_refs = &trans->transaction->delayed_refs;
> +
> +	leaf = read_tree_block(root, bytenr, gen);
> +	if (IS_ERR(leaf)) {
> +		return PTR_ERR(leaf);
> +	} else if (!extent_buffer_uptodate(leaf)) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	/* We only care leaf, which may contains EXTENT_DATA */
> +	if (btrfs_header_level(leaf) != 0)
> +		goto out;
> +
> +	for (slot = 0; slot < btrfs_header_nritems(leaf); slot++) {
> +		struct btrfs_key key;
> +		struct btrfs_file_extent_item *fi;
> +		struct btrfs_qgroup_extent_record *record;
> +		struct btrfs_qgroup_extent_record *exist;
> +
> +		btrfs_item_key_to_cpu(leaf, &key, slot);
> +		if (key.type != BTRFS_EXTENT_DATA_KEY)
> +			continue;
> +		fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
> +		if (btrfs_file_extent_type(leaf, fi) ==
> +		    BTRFS_FILE_EXTENT_INLINE ||
> +		    btrfs_file_extent_disk_bytenr(leaf, fi) == 0)
> +			continue;
> +
> +		record = kmalloc(sizeof(*record), GFP_NOFS);
> +		if (!record) {
> +			ret = -ENOMEM;
> +			break;
> +		}
> +		record->bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> +		record->num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> +		record->old_roots = NULL;
> +
> +		spin_lock(&delayed_refs->lock);
> +		exist = btrfs_qgroup_insert_dirty_extent(delayed_refs, record);
> +		spin_unlock(&delayed_refs->lock);
> +		if (exist)
> +			kfree(record);
> +	}
> +out:
> +	free_extent_buffer(leaf);
> +	return ret;
> +
> +}
> +
> +/*
>   * try to replace tree blocks in fs tree with the new blocks
>   * in reloc tree. tree blocks haven't been modified since the
>   * reloc tree was create can be replaced.
> @@ -1919,7 +1992,28 @@ again:
>  					0);
>  		BUG_ON(ret);
>  
> +		/*
> +		 * Fix up the screwed up qgroups
> +		 *
> +		 * For tree blocks with extent data, new data extent's
> +		 * backref is already increased with both file tree and data
> +		 * reloc tree.
> +		 * While trans is committed before we modify file tree.
> +		 *
> +		 * This makes qgroup can't account new data extents well,
> +		 * as the file tree is still referring to old extents, not
> +		 * new extents, backref walk will find the new extent only
> +		 * referred by data reloc tree.
> +		 * So qgroup is screwed up and didn't increase its ref/excl.
> +		 *
> +		 * Fix it up by re-dirtying qgroup record for data extents in
> +		 * new tree blocks
> +		 */
> +		ret = qgroup_redirty_data_extents(trans, dest, new_bytenr,
> +						  new_ptr_gen);
>  		btrfs_unlock_up_safe(path, 0);
> +		if (ret < 0)
> +			break;
>  
>  		ret = level;
>  		break;

  reply	other threads:[~2016-07-29 18:42 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-14  9:26 [PATCH] btrfs: relocation: Fix leaking qgroups numbers on data extents Qu Wenruo
2016-07-29 18:42 ` Goldwyn Rodrigues [this message]
2016-07-30  1:06   ` Qu Wenruo
2016-07-30 15:57     ` Goldwyn Rodrigues
2016-07-31  1:47       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160729184235.GA3535@merlin.lan \
    --to=rgoldwyn@suse.de \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@suse.de \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).