linux-btrfs.vger.kernel.org archive mirror
From: Liu Bo <bo.li.liu@oracle.com>
To: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: throttle delayed refs better
Date: Fri, 24 Jan 2014 15:34:48 +0800
Message-ID: <20140124073448.GC31638@localhost.localdomain>
In-Reply-To: <1390500472-15144-1-git-send-email-jbacik@fb.com>

On Thu, Jan 23, 2014 at 01:07:52PM -0500, Josef Bacik wrote:
> On one of our gluster clusters we noticed some pretty big lag spikes.  This
> turned out to be because our transaction commit was taking like 3 minutes to
> complete.  This is because we have like 30 gigs of metadata, so our global
> reserve would end up being the max which is like 512 mb.  So our throttling code
> would allow a ridiculous amount of delayed refs to build up and then they'd all
> get run at transaction commit time, and for a cold mounted file system that
> could take up to 3 minutes to run.  So fix the throttling to be based on both
> the size of the global reserve and how long it takes us to run delayed refs.
> This patch tracks the time it takes to run delayed refs and then only allows 1
> second's worth of outstanding delayed refs at a time.  This way it will auto-tune
> itself from cold cache up to when everything is in memory and it no longer has
> to go to disk.  This makes our transaction commits take much less time to run.
> Thanks,
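
For readers following along, the auto-tuning described above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct throttle_state and the
throttle_*() helpers are hypothetical names, locking is left out, and the
fallback to the global-reserve space check is omitted. The starting
value (1/64 of a second), the 3:1 weighting, and the one-second budget
are taken from the patch quoted below; each sample is the wall-clock
time of one run of delayed refs.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the single btrfs_fs_info field the patch adds. */
struct throttle_state {
	uint64_t avg_runtime_ns;	/* smoothed runtime of one delayed-ref run */
};

/* Start from the same guess the patch uses at mount time: 1/64 s. */
static void throttle_init(struct throttle_state *st)
{
	st->avg_runtime_ns = NSEC_PER_SEC / 64;
}

/*
 * Fold a new runtime sample into the average.  The old average is
 * weighted 3:1 against the sample so one slow run cannot swing it far.
 */
static void throttle_record_run(struct throttle_state *st, uint64_t runtime_ns)
{
	st->avg_runtime_ns = (st->avg_runtime_ns * 3 + runtime_ns) / 4;
}

/*
 * Return nonzero once the queued entries are estimated to need at
 * least one second of work at the current average.
 */
static int throttle_should_run(const struct throttle_state *st,
			       uint64_t num_entries)
{
	return num_entries * st->avg_runtime_ns >= NSEC_PER_SEC;
}
```

With the initial average the check first trips at 64 queued entries.
As measured runtimes fall (warm cache) the average shrinks and more
entries are allowed to queue; as they rise (cold cache) fewer are.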

Which version of btrfs is the patch made for?

I checked the code and it doesn't seem to be based on btrfs-next either;
we don't have a __btrfs_run_delayed_refs() there.

-liubo

> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/ctree.h       |  3 +++
>  fs/btrfs/disk-io.c     |  2 +-
>  fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/transaction.c |  4 ++--
>  4 files changed, 46 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3cebb4a..ca6bcc3 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1360,6 +1360,7 @@ struct btrfs_fs_info {
>  
>  	u64 generation;
>  	u64 last_trans_committed;
> +	u64 avg_delayed_ref_runtime;
>  
>  	/*
>  	 * this is updated to the current trans every time a full commit
> @@ -3172,6 +3173,8 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_root *root,
>  
>  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
>  				       struct btrfs_root *root);
> +int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
> +				       struct btrfs_root *root);
>  void btrfs_put_block_group(struct btrfs_block_group_cache *cache);
>  int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root, unsigned long count);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ed23127..f0e7bbe 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2185,7 +2185,7 @@ int open_ctree(struct super_block *sb,
>  	fs_info->free_chunk_space = 0;
>  	fs_info->tree_mod_log = RB_ROOT;
>  	fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
> -
> +	fs_info->avg_delayed_ref_runtime = div64_u64(NSEC_PER_SEC, 64);
>  	/* readahead state */
>  	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
>  	spin_lock_init(&fs_info->reada_lock);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c77156c..b532259 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2322,8 +2322,10 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  	struct btrfs_delayed_ref_head *locked_ref = NULL;
>  	struct btrfs_delayed_extent_op *extent_op;
>  	struct btrfs_fs_info *fs_info = root->fs_info;
> +	ktime_t start = ktime_get();
>  	int ret;
>  	unsigned long count = 0;
> +	unsigned long actual_count = 0;
>  	int must_insert_reserved = 0;
>  
>  	delayed_refs = &trans->transaction->delayed_refs;
> @@ -2452,6 +2454,7 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  				 &delayed_refs->href_root);
>  			spin_unlock(&delayed_refs->lock);
>  		} else {
> +			actual_count++;
>  			ref->in_tree = 0;
>  			rb_erase(&ref->rb_node, &locked_ref->ref_root);
>  		}
> @@ -2502,6 +2505,26 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  		count++;
>  		cond_resched();
>  	}
> +
> +	/*
> +	 * We don't want to include ref heads since we can have empty ref heads
> +	 * and those will drastically skew our runtime down since we just do
> +	 * accounting, no actual extent tree updates.
> +	 */
> +	if (actual_count > 0) {
> +		u64 runtime = ktime_to_ns(ktime_sub(ktime_get(), start));
> +		u64 avg;
> +
> +		/*
> +		 * We weigh the current average higher than our current runtime
> +		 * to avoid large swings in the average.
> +		 */
> +		spin_lock(&delayed_refs->lock);
> +		avg = fs_info->avg_delayed_ref_runtime * 3 + runtime;
> +		avg = div64_u64(avg, 4);
> +		fs_info->avg_delayed_ref_runtime = avg;
> +		spin_unlock(&delayed_refs->lock);
> +	}
>  	return 0;
>  }
>  
> @@ -2600,7 +2623,7 @@ static inline u64 heads_to_leaves(struct btrfs_root *root, u64 heads)
>  	return div64_u64(num_bytes, BTRFS_LEAF_DATA_SIZE(root));
>  }
>  
> -int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
> +int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
>  				       struct btrfs_root *root)
>  {
>  	struct btrfs_block_rsv *global_rsv;
> @@ -2629,6 +2652,22 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
> +				       struct btrfs_root *root)
> +{
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	u64 num_entries =
> +		atomic_read(&trans->transaction->delayed_refs.num_entries);
> +	u64 avg_runtime;
> +
> +	smp_mb();
> +	avg_runtime = fs_info->avg_delayed_ref_runtime;
> +	if (num_entries * avg_runtime >= NSEC_PER_SEC)
> +		return 1;
> +
> +	return btrfs_check_space_for_delayed_refs(trans, root);
> +}
> +
>  /*
>   * this starts processing the delayed reference count updates and
>   * extent insertions we have queued up so far.  count can be
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index fd14464..5e2bfda 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -645,7 +645,7 @@ static int should_end_transaction(struct btrfs_trans_handle *trans,
>  				  struct btrfs_root *root)
>  {
>  	if (root->fs_info->global_block_rsv.space_info->full &&
> -	    btrfs_should_throttle_delayed_refs(trans, root))
> +	    btrfs_check_space_for_delayed_refs(trans, root))
>  		return 1;
>  
>  	return !!btrfs_block_rsv_check(root, &root->fs_info->global_block_rsv, 5);
> @@ -710,7 +710,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>  
>  	trans->delayed_ref_updates = 0;
>  	if (!trans->sync && btrfs_should_throttle_delayed_refs(trans, root)) {
> -		cur = max_t(unsigned long, cur, 1);
> +		cur = max_t(unsigned long, cur, 32);
>  		trans->delayed_ref_updates = 0;
>  		btrfs_run_delayed_refs(trans, root, cur);
>  	}
> -- 
> 1.8.3.1
> 
