linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz
Subject: [PATCH] btrfs-progs: Fix wrong chunk allocation type in btrfs_reserve_extent
Date: Wed, 10 Jan 2018 13:51:13 +0800	[thread overview]
Message-ID: <20180110055113.18976-1-wqu@suse.com> (raw)

btrfs_reserve_extent() will try to allocate new chunk if there is not
enough space (mostly meta space).

However when it tries to allocate new meta chunk, it will always try to
allocate SINGLE meta chunk, and if the fs is using other profile, it
will cause dead allocation which can't be really used.

And when trying to allocate new meta chunk for fs trees, it uses the
@num_bytes which is normally times larger than metadata space.

Fix it by using proper meta profile and calculate based on file extent
other than @num_bytes if we are allocating meta space for file extents.

And all such chunk allocation condition is not explained at all.
Add needed (although maybe a little overkilled) comment for it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Needed prerequisite is pushed to github:
https://github.com/adam900710/btrfs-progs/tree/chunk_alloc_fixes

And unfortunately, the condition to trigger chunk allocation is pretty
hard to trigger.
For normal mkfs case, we will always have enough space, while for normal
usage, kernel will ensure there is enough space (global reservation) for
extent tree update, which is normally larger than the 2M headroom used in 
btrfs-progs.

So no easy test case here.
---
 extent-tree.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 90e792a3fe62..aab09d926043 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1907,7 +1907,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 		return 0;
 
 	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
-	                        space_info->flags, false);
+	                        flags, false);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return ret;
@@ -2652,8 +2652,11 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 	u64 search_start = 0;
 	u64 alloc_profile;
 	u64 profile;
+	u64 meta_profile;
 	struct btrfs_fs_info *info = root->fs_info;
 
+	meta_profile = info->avail_metadata_alloc_bits &
+		       info->metadata_alloc_profile;
 	if (is_data) {
 		alloc_profile = info->avail_data_alloc_bits &
 			        info->data_alloc_profile;
@@ -2663,21 +2666,57 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 			        info->system_alloc_profile;
 		profile = BTRFS_BLOCK_GROUP_SYSTEM | alloc_profile;
 	} else {
-		alloc_profile = info->avail_metadata_alloc_bits &
-			        info->metadata_alloc_profile;
+		alloc_profile = meta_profile;
 		profile = BTRFS_BLOCK_GROUP_METADATA | alloc_profile;
 	}
 
+	/*
+	 * If we're modifying global trees likes extent/root/device tree
+	 * (whose ref_cows is 0), we shouldn't trigger chunk allocation AT ALL!
+	 *
+	 * The problem is, if root is extent tree, we trigger a chunk
+	 * allocation, then it will modify extent tree, causing a path deadlock.
+	 * (Although not implemented in btrfs-progs, it will still cause tree
+	 * modification screwed up)
+	 */
 	if (root->ref_cows) {
+		u64 meta_reserve;
+
 		if (!(profile & BTRFS_BLOCK_GROUP_METADATA)) {
-			ret = do_chunk_alloc(trans, info,
-					     num_bytes,
-					     BTRFS_BLOCK_GROUP_METADATA);
-			BUG_ON(ret);
+			/*
+			 * If we're modifying fs trees and allocating data
+			 * extent, we need to ensure we have enough metadata
+			 * space to insert new file extent item.
+			 *
+			 * Here we need to ensure we have enough space for the
+			 * following operations:
+			 * 1) Insert one file extent
+			 *    A full tree CoW + Leaf split should be enough.
+			 *
+			 * 2) Possible new chunk allocation
+			 *    4 trees are involved, chunk, device, extent and
+			 *    root tree.
+			 *    However chunk tree is counted as system chunk, and
+			 *    never
+			 *    really needs a new system chunk even for super
+			 *    large fs.
+			 *    Only 3 trees are need.
+			 *
+			 * So it's total 4 trees, subvolume, extent, device and
+			 * root tree.
+			 * Each tree needs a full tree CoW + Leaf split.
+			 */
+			meta_reserve = 4 * ( BTRFS_MAX_LEVEL + 1) *
+				       info->nodesize;
+			ret = do_chunk_alloc(trans, info, meta_reserve,
+				meta_profile | BTRFS_BLOCK_GROUP_METADATA);
 		}
+
+		/* Keep 2M as free space headroom */
 		ret = do_chunk_alloc(trans, info,
 				     num_bytes + SZ_2M, profile);
-		BUG_ON(ret);
+		if (ret < 0)
+			return ret;
 	}
 
 	WARN_ON(num_bytes < info->sectorsize);
-- 
2.15.1


                 reply	other threads:[~2018-01-10  5:51 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180110055113.18976-1-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).