linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arne Jansen <sensille@gmx.net>
To: Josef Bacik <jbacik@fusionio.com>
Cc: linux-btrfs@vger.kernel.org, daniel@quora.org,
	clmason@fusionio.com, list.btrfs@jan-o-sch.net
Subject: Re: [PATCH] Btrfs: allow delayed refs to be merged V2
Date: Mon, 16 Jul 2012 09:41:21 +0200	[thread overview]
Message-ID: <5003C5A1.20307@gmx.net> (raw)
In-Reply-To: <1342271390-29507-1-git-send-email-jbacik@fusionio.com>

On 14.07.2012 15:09, Josef Bacik wrote:
> Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
> Basically what happens is the balance relocates a tree block which will drop
> the implicit refs for all of its children and adds a full backref.  Once the
> block is relocated we have to add the implicit refs back, so when we cow the
> block again we add the implicit refs for its children back.  The problem
> comes when the original drop ref doesn't get run before we add the implicit
> refs back.  The delayed ref stuff will specifically prefer ADD operations
> over DROP to keep us from freeing up an extent that will have references to
> it, so we try to add the implicit ref before it is actually removed and we
> panic.  This worked fine before because the add would have just canceled the
> drop out and we would have been fine.  But the backref walking work needs to
> be able to freeze the delayed ref stuff in time so we have this ever
> increasing sequence number that gets attached to all new delayed ref updates
> which makes us not merge refs and we run into this issue.
> 
> So to fix this we need to merge delayed refs.  So everytime we run a
> clustered ref we need to try and merge all of its delayed refs.  The backref
> walking stuff locks the delayed ref head before processing, so if we have it
> locked we are safe to merge any refs inside of the sequence number.  If
> there is no sequence number we can merge all refs.  Doing this not only
> fixes our bug but keeps the delayed ref code from adding and removing
> useless refs and batching together multiple refs into one search instead of
> one search per delayed ref, which will really help our commit times.  I ran
> this with Daniels test and 276 and I haven't seen any problems.  Thanks,
> 
> Reported-by: Daniel J Blueman <daniel@quora.org>
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---
> V1->V2: 
> -Merge all extents, don't do this weird sequence check at the front, just do it
> all when we run the delayed refs.
> -Merge like actions so we can get the performance boost of multiple ref mods at
> the same time

This one looks better. It gives back the merging capability without too much
fiddling with sequences. Although I still think something else is fishy when
merges are needed for correct functionality, this patch seems to fix it while
having additional benefits. Thanks for working on it :)

Acked-by: Arne Jansen <sensille@gmx.net>

>  fs/btrfs/delayed-ref.c |  152 +++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/delayed-ref.h |    3 +
>  fs/btrfs/extent-tree.c |    9 +++
>  3 files changed, 137 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 13ae7b0..324ccec 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -38,17 +38,14 @@
>  static int comp_tree_refs(struct btrfs_delayed_tree_ref *ref2,
>  			  struct btrfs_delayed_tree_ref *ref1)
>  {
> -	if (ref1->node.type == BTRFS_TREE_BLOCK_REF_KEY) {
> -		if (ref1->root < ref2->root)
> -			return -1;
> -		if (ref1->root > ref2->root)
> -			return 1;
> -	} else {
> -		if (ref1->parent < ref2->parent)
> -			return -1;
> -		if (ref1->parent > ref2->parent)
> -			return 1;
> -	}
> +	if (ref1->root < ref2->root)
> +		return -1;
> +	if (ref1->root > ref2->root)
> +		return 1;
> +	if (ref1->parent < ref2->parent)
> +		return -1;
> +	if (ref1->parent > ref2->parent)
> +		return 1;
>  	return 0;
>  }
>  
> @@ -85,7 +82,8 @@ static int comp_data_refs(struct btrfs_delayed_data_ref *ref2,
>   * type of the delayed backrefs and content of delayed backrefs.
>   */
>  static int comp_entry(struct btrfs_delayed_ref_node *ref2,
> -		      struct btrfs_delayed_ref_node *ref1)
> +		      struct btrfs_delayed_ref_node *ref1,
> +		      bool compare_seq)
>  {
>  	if (ref1->bytenr < ref2->bytenr)
>  		return -1;
> @@ -102,10 +100,12 @@ static int comp_entry(struct btrfs_delayed_ref_node *ref2,
>  	if (ref1->type > ref2->type)
>  		return 1;
>  	/* merging of sequenced refs is not allowed */
> -	if (ref1->seq < ref2->seq)
> -		return -1;
> -	if (ref1->seq > ref2->seq)
> -		return 1;
> +	if (compare_seq) {
> +		if (ref1->seq < ref2->seq)
> +			return -1;
> +		if (ref1->seq > ref2->seq)
> +			return 1;
> +	}
>  	if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY ||
>  	    ref1->type == BTRFS_SHARED_BLOCK_REF_KEY) {
>  		return comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref2),
> @@ -139,7 +139,7 @@ static struct btrfs_delayed_ref_node *tree_insert(struct rb_root *root,
>  		entry = rb_entry(parent_node, struct btrfs_delayed_ref_node,
>  				 rb_node);
>  
> -		cmp = comp_entry(entry, ins);
> +		cmp = comp_entry(entry, ins, 1);
>  		if (cmp < 0)
>  			p = &(*p)->rb_left;
>  		else if (cmp > 0)
> @@ -233,6 +233,111 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans,
>  	return 0;
>  }
>  
> +static void inline drop_delayed_ref(struct btrfs_trans_handle *trans,
> +				    struct btrfs_delayed_ref_root *delayed_refs,
> +				    struct btrfs_delayed_ref_node *ref)
> +{
> +	rb_erase(&ref->rb_node, &delayed_refs->root);
> +	ref->in_tree = 0;
> +	btrfs_put_delayed_ref(ref);
> +	delayed_refs->num_entries--;
> +	if (trans->delayed_ref_updates)
> +		trans->delayed_ref_updates--;
> +}
> +
> +static int merge_ref(struct btrfs_trans_handle *trans,
> +		     struct btrfs_delayed_ref_root *delayed_refs,
> +		     struct btrfs_delayed_ref_node *ref, u64 seq)
> +{
> +	struct rb_node *node;
> +	int merged = 0;
> +	int mod = 0;
> +	int done = 0;
> +
> +	node = rb_prev(&ref->rb_node);
> +	while (node) {
> +		struct btrfs_delayed_ref_node *next;
> +
> +		next = rb_entry(node, struct btrfs_delayed_ref_node, rb_node);
> +		node = rb_prev(node);
> +		if (next->bytenr != ref->bytenr)
> +			break;
> +		if (seq && next->seq >= seq)
> +			break;
> +		if (comp_entry(ref, next, 0))
> +			continue;
> +
> +		if (ref->action == next->action) {
> +			mod = next->ref_mod;
> +		} else {
> +			if (ref->ref_mod < next->ref_mod) {
> +				struct btrfs_delayed_ref_node *tmp;
> +
> +				tmp = ref;
> +				ref = next;
> +				next = tmp;
> +				done = 1;
> +			}
> +			mod = -next->ref_mod;
> +		}
> +
> +		merged++;
> +		drop_delayed_ref(trans, delayed_refs, next);
> +		ref->ref_mod += mod;
> +		if (ref->ref_mod == 0) {
> +			drop_delayed_ref(trans, delayed_refs, ref);
> +			break;
> +		} else {
> +			/*
> +			 * You can't have multiples of the same ref on a tree
> +			 * block.
> +			 */
> +			WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
> +				ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
> +		}
> +
> +		if (done)
> +			break;
> +		node = rb_prev(&ref->rb_node);
> +	}
> +
> +	return merged;
> +}
> +
> +void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
> +			      struct btrfs_delayed_ref_root *delayed_refs,
> +			      struct btrfs_delayed_ref_head *head)
> +{
> +	struct rb_node *node;
> +	u64 seq = 0;
> +
> +	if (!list_empty(&delayed_refs->seq_head)) {
> +		struct seq_list *elem;
> +
> +		elem = list_first_entry(&delayed_refs->seq_head,
> +					struct seq_list, list);
> +		seq = elem->seq;
> +	}
> +
> +	node = rb_prev(&head->node.rb_node);
> +	while (node) {
> +		struct btrfs_delayed_ref_node *ref;
> +
> +		ref = rb_entry(node, struct btrfs_delayed_ref_node,
> +			       rb_node);
> +		if (ref->bytenr != head->node.bytenr)
> +			break;
> +
> +		/* We can't merge refs that are outside of our seq count */
> +		if (seq && ref->seq >= seq)
> +			break;
> +		if (merge_ref(trans, delayed_refs, ref, seq))
> +			node = rb_prev(&head->node.rb_node);
> +		else
> +			node = rb_prev(node);
> +	}
> +}
> +
>  int btrfs_check_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs,
>  			    u64 seq)
>  {
> @@ -332,18 +437,11 @@ update_existing_ref(struct btrfs_trans_handle *trans,
>  		 * every changing the extent allocation tree.
>  		 */
>  		existing->ref_mod--;
> -		if (existing->ref_mod == 0) {
> -			rb_erase(&existing->rb_node,
> -				 &delayed_refs->root);
> -			existing->in_tree = 0;
> -			btrfs_put_delayed_ref(existing);
> -			delayed_refs->num_entries--;
> -			if (trans->delayed_ref_updates)
> -				trans->delayed_ref_updates--;
> -		} else {
> +		if (existing->ref_mod == 0)
> +			drop_delayed_ref(trans, delayed_refs, existing);
> +		else
>  			WARN_ON(existing->type == BTRFS_TREE_BLOCK_REF_KEY ||
>  				existing->type == BTRFS_SHARED_BLOCK_REF_KEY);
> -		}
>  	} else {
>  		WARN_ON(existing->type == BTRFS_TREE_BLOCK_REF_KEY ||
>  			existing->type == BTRFS_SHARED_BLOCK_REF_KEY);
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 413927f..9a3573c 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -187,6 +187,9 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>  				struct btrfs_trans_handle *trans,
>  				u64 bytenr, u64 num_bytes,
>  				struct btrfs_delayed_extent_op *extent_op);
> +void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
> +			      struct btrfs_delayed_ref_root *delayed_refs,
> +			      struct btrfs_delayed_ref_head *head);
>  
>  struct btrfs_delayed_ref_head *
>  btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6621ed7..6db511b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2249,6 +2249,15 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
>  		}
>  
>  		/*
> +		 * We need to try and merge add/drops of the same ref since we
> +		 * can run into issues with relocate dropping the implicit ref
> +		 * and then it being added back again before the drop can
> +		 * finish.  If we merged anything we need to re-loop so we can
> +		 * get a good ref.
> +		 */
> +		btrfs_merge_delayed_refs(trans, delayed_refs, locked_ref);
> +
> +		/*
>  		 * locked_ref is the head node, so we have to go one
>  		 * node back for any delayed ref updates
>  		 */


  reply	other threads:[~2012-07-16  7:41 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-14 13:09 [PATCH] Btrfs: allow delayed refs to be merged V2 Josef Bacik
2012-07-16  7:41 ` Arne Jansen [this message]
2012-07-16  9:43   ` Arne Jansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5003C5A1.20307@gmx.net \
    --to=sensille@gmx.net \
    --cc=clmason@fusionio.com \
    --cc=daniel@quora.org \
    --cc=jbacik@fusionio.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=list.btrfs@jan-o-sch.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).