* [PATCH 0/5] Btrfs: truncate space reservation fixes
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

One of the production workloads here at FB ends up creating and eventually
deleting very large files.  We were consistently hitting ENOSPC aborts
while trying to delete the files because there wasn't enough metadata
reserved to cover deleting CRCs or actually updating the block group
items on disk.

This patchset addresses these problems by adding crc items into the
math for delayed ref processing, and changing the truncate items loop
to reserve metadata more often.

It also solves a performance problem where we were constantly committing
the transaction in hopes of making ENOSPC progress.
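
For a sense of scale, some back-of-the-envelope numbers (assuming 4KiB
sectors, 4-byte crc32c csums, and a 16KiB nodesize; not exact on-disk
figures):

	1TiB file / 4KiB sectors    = 268,435,456 csums to delete
	268,435,456 csums * 4 bytes = 1GiB of csum items
	~4,000 csums per 16KiB leaf = ~67,000 csum tree leaves to COW

None of that metadata was reserved up front, which is how these deletes
could drain the global reserve.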



* [PATCH 1/5] Btrfs: account for crcs in delayed ref processing
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

From: Josef Bacik <jbacik@fb.com>

As we delete large extents, we end up doing huge amounts of COW in order
to delete the corresponding crcs.  This adds accounting so that we keep
track of that space, and flushing of delayed refs so that we don't build
up too much delayed crc work.

This helps limit the delayed work that must be done at commit time and
tries to avoid ENOSPC aborts because the crcs eat all the global
reserves.
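
The accounting flow, boiled down (a sketch of what the hunks below do,
not a drop-in patch):

	/* when a data ref head that frees an extent is queued: */
	if (is_data && count_mod < 0)
		delayed_refs->pending_csums += num_bytes;

	/* when checking reservation headroom, convert those bytes into
	 * leaves full of csum items, one nodesize unit per leaf: */
	num_bytes += csum_bytes_to_leaves(root, pending_csums) *
		     root->nodesize;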

Signed-off-by: Chris Mason <clm@fb.com>
---
 fs/btrfs/delayed-ref.c | 22 ++++++++++++++++++++--
 fs/btrfs/delayed-ref.h | 10 ++++++++++
 fs/btrfs/extent-tree.c | 46 +++++++++++++++++++++++++++++++---------------
 fs/btrfs/inode.c       | 25 ++++++++++++++++++-------
 fs/btrfs/transaction.c |  4 ++++
 5 files changed, 83 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 6d16bea..8f8ed7d 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -489,11 +489,13 @@ update_existing_ref(struct btrfs_trans_handle *trans,
  * existing and update must have the same bytenr
  */
 static noinline void
-update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
+update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+			 struct btrfs_delayed_ref_node *existing,
 			 struct btrfs_delayed_ref_node *update)
 {
 	struct btrfs_delayed_ref_head *existing_ref;
 	struct btrfs_delayed_ref_head *ref;
+	int old_ref_mod;
 
 	existing_ref = btrfs_delayed_node_to_head(existing);
 	ref = btrfs_delayed_node_to_head(update);
@@ -541,7 +543,20 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	 * only need the lock for this case cause we could be processing it
 	 * currently, for refs we just added we know we're a-ok.
 	 */
+	old_ref_mod = existing_ref->total_ref_mod;
 	existing->ref_mod += update->ref_mod;
+	existing_ref->total_ref_mod += update->ref_mod;
+
+	/*
+	 * If we are going from a positive ref mod to a negative or vice
+	 * versa we need to make sure to adjust pending_csums accordingly.
+	 */
+	if (existing_ref->is_data) {
+		if (existing_ref->total_ref_mod >= 0 && old_ref_mod < 0)
+			delayed_refs->pending_csums -= existing->num_bytes;
+		if (existing_ref->total_ref_mod < 0 && old_ref_mod >= 0)
+			delayed_refs->pending_csums += existing->num_bytes;
+	}
 	spin_unlock(&existing_ref->lock);
 }
 
@@ -605,6 +620,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	head_ref->is_data = is_data;
 	head_ref->ref_root = RB_ROOT;
 	head_ref->processing = 0;
+	head_ref->total_ref_mod = count_mod;
 
 	spin_lock_init(&head_ref->lock);
 	mutex_init(&head_ref->mutex);
@@ -614,7 +630,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	existing = htree_insert(&delayed_refs->href_root,
 				&head_ref->href_node);
 	if (existing) {
-		update_existing_head_ref(&existing->node, ref);
+		update_existing_head_ref(delayed_refs, &existing->node, ref);
 		/*
 		 * we've updated the existing ref, free the newly
 		 * allocated ref
@@ -622,6 +638,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 		head_ref = existing;
 	} else {
+		if (is_data && count_mod < 0)
+			delayed_refs->pending_csums += num_bytes;
 		delayed_refs->num_heads++;
 		delayed_refs->num_heads_ready++;
 		atomic_inc(&delayed_refs->num_entries);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index a764e23..5eb0892 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -88,6 +88,14 @@ struct btrfs_delayed_ref_head {
 	struct rb_node href_node;
 
 	struct btrfs_delayed_extent_op *extent_op;
+
+	/*
+	 * This is used to track the final ref_mod from all the refs associated
+	 * with this head ref.  It is not adjusted as delayed refs are run;
+	 * it is only meant to track if we need to do the csum accounting or not.
+	 */
+	int total_ref_mod;
+
 	/*
 	 * when a new extent is allocated, it is just reserved in memory
 	 * The actual extent isn't inserted into the extent allocation tree
@@ -138,6 +146,8 @@ struct btrfs_delayed_ref_root {
 	/* total number of head nodes ready for processing */
 	unsigned long num_heads_ready;
 
+	u64 pending_csums;
+
 	/*
 	 * set when the tree is flushing before a transaction commit,
 	 * used by the throttling code to decide if new updates need
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 41e5812..a6f88eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2538,6 +2538,12 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 		 * list before we release it.
 		 */
 		if (btrfs_delayed_ref_is_head(ref)) {
+			if (locked_ref->is_data &&
+			    locked_ref->total_ref_mod < 0) {
+				spin_lock(&delayed_refs->lock);
+				delayed_refs->pending_csums -= ref->num_bytes;
+				spin_unlock(&delayed_refs->lock);
+			}
 			btrfs_delayed_ref_unlock(locked_ref);
 			locked_ref = NULL;
 		}
@@ -2626,11 +2632,31 @@ static inline u64 heads_to_leaves(struct btrfs_root *root, u64 heads)
 	return div_u64(num_bytes, BTRFS_LEAF_DATA_SIZE(root));
 }
 
+/*
+ * Takes the number of bytes to be csummed and figures out how many leaves it
+ * would require to store the csums for that many bytes.
+ */
+static u64 csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes)
+{
+	u64 csum_size;
+	u64 num_csums_per_leaf;
+	u64 num_csums;
+
+	csum_size = BTRFS_LEAF_DATA_SIZE(root) - sizeof(struct btrfs_item);
+	num_csums_per_leaf = div64_u64(csum_size,
+			(u64)btrfs_super_csum_size(root->fs_info->super_copy));
+	num_csums = div64_u64(csum_bytes, root->sectorsize);
+	num_csums += num_csums_per_leaf - 1;
+	num_csums = div64_u64(num_csums, num_csums_per_leaf);
+	return num_csums;
+}
+
 int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *root)
 {
 	struct btrfs_block_rsv *global_rsv;
 	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
+	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
 	u64 num_bytes;
 	int ret = 0;
 
@@ -2639,6 +2665,7 @@ int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 	if (num_heads > 1)
 		num_bytes += (num_heads - 1) * root->nodesize;
 	num_bytes <<= 1;
+	num_bytes += csum_bytes_to_leaves(root, csum_bytes) * root->nodesize;
 	global_rsv = &root->fs_info->global_block_rsv;
 
 	/*
@@ -5065,30 +5092,19 @@ static u64 calc_csum_metadata_size(struct inode *inode, u64 num_bytes,
 				   int reserve)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	u64 csum_size;
-	int num_csums_per_leaf;
-	int num_csums;
-	int old_csums;
+	u64 old_csums, num_csums;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM &&
 	    BTRFS_I(inode)->csum_bytes == 0)
 		return 0;
 
-	old_csums = (int)div_u64(BTRFS_I(inode)->csum_bytes, root->sectorsize);
+	old_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
+
 	if (reserve)
 		BTRFS_I(inode)->csum_bytes += num_bytes;
 	else
 		BTRFS_I(inode)->csum_bytes -= num_bytes;
-	csum_size = BTRFS_LEAF_DATA_SIZE(root) - sizeof(struct btrfs_item);
-	num_csums_per_leaf = (int)div_u64(csum_size,
-					    sizeof(struct btrfs_csum_item) +
-					    sizeof(struct btrfs_disk_key));
-	num_csums = (int)div_u64(BTRFS_I(inode)->csum_bytes, root->sectorsize);
-	num_csums = num_csums + num_csums_per_leaf - 1;
-	num_csums = num_csums / num_csums_per_leaf;
-
-	old_csums = old_csums + num_csums_per_leaf - 1;
-	old_csums = old_csums / num_csums_per_leaf;
+	num_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
 
 	/* No change, no need to reserve more */
 	if (old_csums == num_csums)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e3fe137f..cec23cf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4197,9 +4197,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	int extent_type = -1;
 	int ret;
 	int err = 0;
-	int be_nice = 0;
 	u64 ino = btrfs_ino(inode);
 	u64 bytes_deleted = 0;
+	bool be_nice = 0;
+	bool should_throttle = 0;
 
 	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 
@@ -4405,19 +4406,20 @@ delete:
 						btrfs_header_owner(leaf),
 						ino, extent_offset, 0);
 			BUG_ON(ret);
-			if (be_nice && pending_del_nr &&
-			    (pending_del_nr % 16 == 0) &&
-			    bytes_deleted > 1024 * 1024) {
+			if (btrfs_should_throttle_delayed_refs(trans, root))
 				btrfs_async_run_delayed_refs(root,
 					trans->delayed_ref_updates * 2, 0);
-			}
 		}
 
 		if (found_type == BTRFS_INODE_ITEM_KEY)
 			break;
 
+		should_throttle =
+			btrfs_should_throttle_delayed_refs(trans, root);
+
 		if (path->slots[0] == 0 ||
-		    path->slots[0] != pending_del_slot) {
+		    path->slots[0] != pending_del_slot ||
+		    (be_nice && should_throttle)) {
 			if (pending_del_nr) {
 				ret = btrfs_del_items(trans, root, path,
 						pending_del_slot,
@@ -4430,6 +4432,15 @@ delete:
 				pending_del_nr = 0;
 			}
 			btrfs_release_path(path);
+			if (be_nice && should_throttle) {
+				unsigned long updates = trans->delayed_ref_updates;
+				if (updates) {
+					trans->delayed_ref_updates = 0;
+					ret = btrfs_run_delayed_refs(trans, root, updates * 2);
+					if (ret && !err)
+						err = ret;
+				}
+			}
 			goto search_again;
 		} else {
 			path->slots[0]--;
@@ -4449,7 +4460,7 @@ error:
 
 	btrfs_free_path(path);
 
-	if (be_nice && bytes_deleted > 32 * 1024 * 1024) {
+	if (be_nice && btrfs_should_throttle_delayed_refs(trans, root)) {
 		unsigned long updates = trans->delayed_ref_updates;
 		if (updates) {
 			trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ba831ee..8b9eea8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -64,6 +64,9 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 	if (atomic_dec_and_test(&transaction->use_count)) {
 		BUG_ON(!list_empty(&transaction->list));
 		WARN_ON(!RB_EMPTY_ROOT(&transaction->delayed_refs.href_root));
+		if (transaction->delayed_refs.pending_csums)
+			printk(KERN_ERR "pending csums is %llu\n",
+			       transaction->delayed_refs.pending_csums);
 		while (!list_empty(&transaction->pending_chunks)) {
 			struct extent_map *em;
 
@@ -223,6 +226,7 @@ loop:
 	cur_trans->delayed_refs.href_root = RB_ROOT;
 	atomic_set(&cur_trans->delayed_refs.num_entries, 0);
 	cur_trans->delayed_refs.num_heads_ready = 0;
+	cur_trans->delayed_refs.pending_csums = 0;
 	cur_trans->delayed_refs.num_heads = 0;
 	cur_trans->delayed_refs.flushing = 0;
 	cur_trans->delayed_refs.run_delayed_start = 0;
-- 
1.8.1



* [PATCH 2/5] Btrfs: refill block reserves during truncate
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

When truncate starts, it allocates some space in the block reserves so
that we'll have enough to update metadata along the way.

For very large files, we can easily go through all of that space as we
loop through the extents.  This changes truncate to refill the space
reservation as it progresses through the file.
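
The refill itself is a small helper plus a flag (sketched from the
hunks below): convert the bytes freed so far into the metadata they
will dirty and top trans_block_rsv back up by that much.

	/* inside the truncate loop, once per deleted extent, roughly: */
	if (be_nice) {
		/* refill failed: finish pending deletes, then return
		 * -EAGAIN so the caller can restart in a fresh
		 * transaction */
		if (truncate_space_check(trans, root, extent_num_bytes))
			should_end = 1;
	}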

Signed-off-by: Chris Mason <clm@fb.com>
---
 fs/btrfs/ctree.h       |  3 +++
 fs/btrfs/extent-tree.c |  9 ++++-----
 fs/btrfs/inode.c       | 45 +++++++++++++++++++++++++++++++++++++++------
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 95944b8..6bf16d5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3297,6 +3297,9 @@ static inline gfp_t btrfs_alloc_write_mask(struct address_space *mapping)
 }
 
 /* extent-tree.c */
+
+u64 btrfs_csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes);
+
 static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
 						 unsigned num_items)
 {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a6f88eb..75f4bed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2636,7 +2636,7 @@ static inline u64 heads_to_leaves(struct btrfs_root *root, u64 heads)
 * Takes the number of bytes to be csummed and figures out how many leaves it
  * would require to store the csums for that many bytes.
  */
-static u64 csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes)
+u64 btrfs_csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes)
 {
 	u64 csum_size;
 	u64 num_csums_per_leaf;
@@ -2665,7 +2665,7 @@ int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 	if (num_heads > 1)
 		num_bytes += (num_heads - 1) * root->nodesize;
 	num_bytes <<= 1;
-	num_bytes += csum_bytes_to_leaves(root, csum_bytes) * root->nodesize;
+	num_bytes += btrfs_csum_bytes_to_leaves(root, csum_bytes) * root->nodesize;
 	global_rsv = &root->fs_info->global_block_rsv;
 
 	/*
@@ -5098,13 +5098,12 @@ static u64 calc_csum_metadata_size(struct inode *inode, u64 num_bytes,
 	    BTRFS_I(inode)->csum_bytes == 0)
 		return 0;
 
-	old_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
-
+	old_csums = btrfs_csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
 	if (reserve)
 		BTRFS_I(inode)->csum_bytes += num_bytes;
 	else
 		BTRFS_I(inode)->csum_bytes -= num_bytes;
-	num_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
+	num_csums = btrfs_csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes);
 
 	/* No change, no need to reserve more */
 	if (old_csums == num_csums)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cec23cf..88537c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4163,6 +4163,21 @@ out:
 	return err;
 }
 
+static int truncate_space_check(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root,
+				u64 bytes_deleted)
+{
+	int ret;
+
+	bytes_deleted = btrfs_csum_bytes_to_leaves(root, bytes_deleted);
+	ret = btrfs_block_rsv_add(root, &root->fs_info->trans_block_rsv,
+				  bytes_deleted, BTRFS_RESERVE_NO_FLUSH);
+	if (!ret)
+		trans->bytes_reserved += bytes_deleted;
+	return ret;
+
+}
+
 /*
  * this can truncate away extent items, csum items and directory items.
  * It starts at a high offset and removes keys until it can't find
@@ -4201,6 +4216,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	u64 bytes_deleted = 0;
 	bool be_nice = 0;
 	bool should_throttle = 0;
+	bool should_end = 0;
 
 	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 
@@ -4396,6 +4412,8 @@ delete:
 		} else {
 			break;
 		}
+		should_throttle = 0;
+
 		if (found_extent &&
 		    (test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
 		     root == root->fs_info->tree_root)) {
@@ -4409,17 +4427,24 @@ delete:
 			if (btrfs_should_throttle_delayed_refs(trans, root))
 				btrfs_async_run_delayed_refs(root,
 					trans->delayed_ref_updates * 2, 0);
+			if (be_nice) {
+				if (truncate_space_check(trans, root,
+							 extent_num_bytes)) {
+					should_end = 1;
+				}
+				if (btrfs_should_throttle_delayed_refs(trans,
+								       root)) {
+					should_throttle = 1;
+				}
+			}
 		}
 
 		if (found_type == BTRFS_INODE_ITEM_KEY)
 			break;
 
-		should_throttle =
-			btrfs_should_throttle_delayed_refs(trans, root);
-
 		if (path->slots[0] == 0 ||
 		    path->slots[0] != pending_del_slot ||
-		    (be_nice && should_throttle)) {
+		    should_throttle || should_end) {
 			if (pending_del_nr) {
 				ret = btrfs_del_items(trans, root, path,
 						pending_del_slot,
@@ -4432,7 +4457,7 @@ delete:
 				pending_del_nr = 0;
 			}
 			btrfs_release_path(path);
-			if (be_nice && should_throttle) {
+			if (should_throttle) {
 				unsigned long updates = trans->delayed_ref_updates;
 				if (updates) {
 					trans->delayed_ref_updates = 0;
@@ -4441,6 +4466,14 @@ delete:
 						err = ret;
 				}
 			}
+			/*
+			 * if we failed to refill our space rsv, bail out
+			 * and let the transaction restart
+			 */
+			if (should_end) {
+				err = -EAGAIN;
+				goto error;
+			}
 			goto search_again;
 		} else {
 			path->slots[0]--;
@@ -4460,7 +4493,7 @@ error:
 
 	btrfs_free_path(path);
 
-	if (be_nice && btrfs_should_throttle_delayed_refs(trans, root)) {
+	if (be_nice && bytes_deleted > 32 * 1024 * 1024) {
 		unsigned long updates = trans->delayed_ref_updates;
 		if (updates) {
 			trans->delayed_ref_updates = 0;
-- 
1.8.1



* [PATCH 3/5] Btrfs: reserve space for block groups
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

From: Josef Bacik <jbacik@fb.com>

This changes our delayed refs calculations to include the space needed
to write back dirty block groups.
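
The core of it is a couple of lines in
btrfs_check_space_for_delayed_refs() (see the hunk below): each dirty
block group costs roughly one item update, so

	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(root,
							     num_dirty_bgs);
	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
		ret = 1;	/* not enough headroom, throttle */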

Signed-off-by: Chris Mason <clm@fb.com>
---
 fs/btrfs/extent-tree.c | 12 +++++++++---
 fs/btrfs/transaction.c |  1 +
 fs/btrfs/transaction.h |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 75f4bed..ae8db3ba 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2657,7 +2657,8 @@ int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 	struct btrfs_block_rsv *global_rsv;
 	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
 	u64 csum_bytes = trans->transaction->delayed_refs.pending_csums;
-	u64 num_bytes;
+	u64 num_dirty_bgs = trans->transaction->num_dirty_bgs;
+	u64 num_bytes, num_dirty_bgs_bytes;
 	int ret = 0;
 
 	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
@@ -2666,17 +2667,21 @@ int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 		num_bytes += (num_heads - 1) * root->nodesize;
 	num_bytes <<= 1;
 	num_bytes += btrfs_csum_bytes_to_leaves(root, csum_bytes) * root->nodesize;
+	num_dirty_bgs_bytes = btrfs_calc_trans_metadata_size(root,
+							     num_dirty_bgs);
 	global_rsv = &root->fs_info->global_block_rsv;
 
 	/*
 	 * If we can't allocate any more chunks lets make sure we have _lots_ of
 	 * wiggle room since running delayed refs can create more delayed refs.
 	 */
-	if (global_rsv->space_info->full)
+	if (global_rsv->space_info->full) {
+		num_dirty_bgs_bytes <<= 1;
 		num_bytes <<= 1;
+	}
 
 	spin_lock(&global_rsv->lock);
-	if (global_rsv->reserved <= num_bytes)
+	if (global_rsv->reserved <= num_bytes + num_dirty_bgs_bytes)
 		ret = 1;
 	spin_unlock(&global_rsv->lock);
 	return ret;
@@ -5408,6 +5413,7 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 		if (list_empty(&cache->dirty_list)) {
 			list_add_tail(&cache->dirty_list,
 				      &trans->transaction->dirty_bgs);
+			trans->transaction->num_dirty_bgs++;
 			btrfs_get_block_group(cache);
 		}
 		spin_unlock(&trans->transaction->dirty_bgs_lock);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8b9eea8..234d606 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -251,6 +251,7 @@ loop:
 	INIT_LIST_HEAD(&cur_trans->switch_commits);
 	INIT_LIST_HEAD(&cur_trans->pending_ordered);
 	INIT_LIST_HEAD(&cur_trans->dirty_bgs);
+	cur_trans->num_dirty_bgs = 0;
 	spin_lock_init(&cur_trans->dirty_bgs_lock);
 	list_add_tail(&cur_trans->list, &fs_info->trans_list);
 	extent_io_tree_init(&cur_trans->dirty_pages,
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 96b189b..4cb0ae2 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -64,6 +64,7 @@ struct btrfs_transaction {
 	struct list_head pending_ordered;
 	struct list_head switch_commits;
 	struct list_head dirty_bgs;
+	u64 num_dirty_bgs;
 	spinlock_t dirty_bgs_lock;
 	struct btrfs_delayed_ref_root delayed_refs;
 	int aborted;
-- 
1.8.1



* [PATCH 4/5] Btrfs: don't commit the transaction in the async space flushing
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

From: Josef Bacik <jbacik@fb.com>

We're triggering a huge number of commits from
btrfs_async_reclaim_metadata_space.  These aren't really required,
because everyone calling the async reclaim code is going to end up
triggering a commit on their own.
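
The fix is a change to the worker's loop bound plus dropping the
self-requeue (see the hunk below):

	before:	} while (flush_state <= COMMIT_TRANS);
		/* and requeue the work item if still needed */
	after:	} while (flush_state < COMMIT_TRANS);
		/* stop short of COMMIT_TRANS; the task that actually
		 * needs the space triggers the commit itself */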

Signed-off-by: Chris Mason <clm@fb.com>
---
 fs/btrfs/extent-tree.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae8db3ba..3d4b3d680 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4329,8 +4329,13 @@ out:
 static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
 					struct btrfs_fs_info *fs_info, u64 used)
 {
-	return (used >= div_factor_fine(space_info->total_bytes, 98) &&
-		!btrfs_fs_closing(fs_info) &&
+	u64 thresh = div_factor_fine(space_info->total_bytes, 98);
+
+	/* If we're just plain full then async reclaim just slows us down. */
+	if (space_info->bytes_used >= thresh)
+		return 0;
+
+	return (used >= thresh && !btrfs_fs_closing(fs_info) &&
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
 }
 
@@ -4385,10 +4390,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		if (!btrfs_need_do_async_reclaim(space_info, fs_info,
 						 flush_state))
 			return;
-	} while (flush_state <= COMMIT_TRANS);
-
-	if (btrfs_need_do_async_reclaim(space_info, fs_info, flush_state))
-		queue_work(system_unbound_wq, work);
+	} while (flush_state < COMMIT_TRANS);
 }
 
 void btrfs_init_async_reclaim_work(struct work_struct *work)
-- 
1.8.1



* [PATCH 5/5] Btrfs: don't steal from the global reserve if we don't have the space
From: Chris Mason @ 2015-04-13 19:22 UTC
  To: linux-btrfs

From: Josef Bacik <jbacik@fb.com>

btrfs_evict_inode() needs to be more careful about stealing from the
global_rsv.  We don't want to end up aborting the commit with ENOSPC just
because the evict_inode code was too greedy.
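
The retry ladder this adds, paraphrased from the patch below:

	/*
	 * steal_from_global == 0: the plain reservation worked, carry on
	 * steal_from_global == 1: it failed; steal from the global
	 *	reserve, but only if btrfs_check_space_for_delayed_refs()
	 *	says there is room for the pending delayed ref work
	 * steal_from_global == 2: still failing; commit the transaction
	 *	and try once more with a freshly refilled reserve
	 * steal_from_global  > 2: warn and leave the truncate for the
	 *	next mount
	 */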

Signed-off-by: Chris Mason <clm@fb.com>
---
 fs/btrfs/inode.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 88537c5..141df0c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5010,6 +5010,7 @@ void btrfs_evict_inode(struct inode *inode)
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *rsv, *global_rsv;
+	int steal_from_global = 0;
 	u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
 	int ret;
 
@@ -5077,9 +5078,20 @@ void btrfs_evict_inode(struct inode *inode)
 		 * hard as possible to get this to work.
 		 */
 		if (ret)
-			ret = btrfs_block_rsv_migrate(global_rsv, rsv, min_size);
+			steal_from_global++;
+		else
+			steal_from_global = 0;
+		ret = 0;
 
-		if (ret) {
+		/*
+		 * steal_from_global == 0: we reserved stuff, hooray!
+		 * steal_from_global == 1: we didn't reserve stuff, boo!
+		 * steal_from_global == 2: we've committed, still not a lot of
+		 * room but maybe we'll have room in the global reserve this
+		 * time.
+		 * steal_from_global == 3: abandon all hope!
+		 */
+		if (steal_from_global > 2) {
 			btrfs_warn(root->fs_info,
 				"Could not get space for a delete, will truncate on mount %d",
 				ret);
@@ -5095,6 +5107,36 @@ void btrfs_evict_inode(struct inode *inode)
 			goto no_delete;
 		}
 
+		/*
+		 * We can't just steal from the global reserve; we need to make
+		 * sure there is room to do it.  If not, we need to commit and try
+		 * again.
+		 */
+		if (steal_from_global) {
+			if (!btrfs_check_space_for_delayed_refs(trans, root))
+				ret = btrfs_block_rsv_migrate(global_rsv, rsv,
+							      min_size);
+			else
+				ret = -ENOSPC;
+		}
+
+		/*
+		 * Couldn't steal from the global reserve, we have too much
+		 * pending stuff built up, commit the transaction and try it
+		 * again.
+		 */
+		if (ret) {
+			ret = btrfs_commit_transaction(trans, root);
+			if (ret) {
+				btrfs_orphan_del(NULL, inode);
+				btrfs_free_block_rsv(root, rsv);
+				goto no_delete;
+			}
+			continue;
+		} else {
+			steal_from_global = 0;
+		}
+
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-- 
1.8.1


