linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/7] Chunk allocator unification
@ 2018-02-02  8:19 Qu Wenruo
  2018-02-02  8:19 ` [PATCH 1/7] btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/libbtrfs_prepare

This patchset unifies a large part of the chunk allocator (free device
extent search) between the kernel and btrfs-progs.

Although there are still differences in the following areas, the goal of
extracting common code into libbtrfs is within reach.

1) Member differences
   Btrfs-progs lacks quite a few members that the kernel has, so members
   like fs_devices->total_rw_bytes still make a difference.

2) @convert parameter for mkfs/convert
   It should be possible to extract this into a simpler function.

3) Delayed bg creation vs direct bg creation
   The kernel delays block group creation while btrfs-progs allocates
   the block group directly.
   This makes the kernel's __btrfs_alloc_chunk() more or less chunk
   mapping allocation only, while btrfs-progs also handles chunk, dev
   extent and block group item creation.

   This needs extra function refactoring before we can really extract
   code into libbtrfs.

Besides preparing for the later libbtrfs, this rework still has some
benefits:

1) No minimal stripe/chunk size
   Unlike btrfs-progs with its intermediate minimal stripe/chunk size,
   the kernel doesn't really implement such a minimum.

   So we no longer have that limit on the btrfs-progs side.
   The minimal device size calculation does no harm, though, so it is
   kept as is.

2) Slightly less code for the chunk allocator
   About 60 fewer lines after the last patch.


Qu Wenruo (7):
  btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to
    fs_info
  btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk
  btrfs-progs: Make btrfs_alloc_chunk to handle block group creation
  btrfs-progs: Introduce btrfs_raid_array and related infrastructures
  btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum
    hole size
  btrfs-progs: kernel-lib: Port kernel sort() to btrfs-progs
  btrfs-progs: volumes: Unify free dev extent search behavior between
    kernel and btrfs-progs.

 Makefile          |   3 +-
 convert/main.c    |  10 +-
 ctree.h           |  12 +-
 extent-tree.c     |  12 +-
 kerncompat.h      |   5 +
 kernel-lib/sort.c | 104 +++++++
 kernel-lib/sort.h |  16 ++
 mkfs/main.c       |  27 +-
 volumes.c         | 838 +++++++++++++++++++++++++-----------------------------
 volumes.h         |  42 ++-
 10 files changed, 578 insertions(+), 491 deletions(-)
 create mode 100644 kernel-lib/sort.c
 create mode 100644 kernel-lib/sort.h

-- 
2.16.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02  8:19 ` [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/volumes.c b/volumes.c
index edad367b593c..677d085de96c 100644
--- a/volumes.c
+++ b/volumes.c
@@ -826,7 +826,7 @@ error:
 	return ret;
 }
 
-#define BTRFS_MAX_DEVS(r) ((BTRFS_LEAF_DATA_SIZE(r->fs_info)	\
+#define BTRFS_MAX_DEVS(info) ((BTRFS_LEAF_DATA_SIZE(info)	\
 			- sizeof(struct btrfs_item)		\
 			- sizeof(struct btrfs_chunk))		\
 			/ sizeof(struct btrfs_stripe) + 1)
@@ -882,12 +882,12 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			calc_size = SZ_1G;
 			max_chunk_size = 10 * calc_size;
 			min_stripe_size = SZ_64M;
-			max_stripes = BTRFS_MAX_DEVS(chunk_root);
+			max_stripes = BTRFS_MAX_DEVS(info);
 		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
 			calc_size = SZ_1G;
 			max_chunk_size = 4 * calc_size;
 			min_stripe_size = SZ_32M;
-			max_stripes = BTRFS_MAX_DEVS(chunk_root);
+			max_stripes = BTRFS_MAX_DEVS(info);
 		}
 	}
 	if (type & BTRFS_BLOCK_GROUP_RAID1) {
-- 
2.16.1



* [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
  2018-02-02  8:19 ` [PATCH 1/7] btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02  9:20   ` Su Yue
  2018-02-02  8:19 ` [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

We used to have two chunk allocators, btrfs_alloc_chunk() and
btrfs_alloc_data_chunk(). The former is the more generic one, while the
latter is only used in mkfs and convert to allocate a SINGLE-profile
data chunk.

Although btrfs_alloc_data_chunk() has some special hacks to cooperate
with convert, it's quite simple to integrate it into the generic chunk
allocator.

So merge them into one btrfs_alloc_chunk(), with an extra @convert
parameter and the necessary comments, to reduce code duplication and
leave less to maintain.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 convert/main.c |   6 +-
 extent-tree.c  |   2 +-
 mkfs/main.c    |   8 +--
 volumes.c      | 219 ++++++++++++++++++++++-----------------------------------
 volumes.h      |   5 +-
 5 files changed, 94 insertions(+), 146 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index b3ea31d7af43..b2444bb2ff21 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -910,9 +910,9 @@ static int make_convert_data_block_groups(struct btrfs_trans_handle *trans,
 
 			len = min(max_chunk_size,
 				  cache->start + cache->size - cur);
-			ret = btrfs_alloc_data_chunk(trans, fs_info,
-					&cur_backup, len,
-					BTRFS_BLOCK_GROUP_DATA, 1);
+			ret = btrfs_alloc_chunk(trans, fs_info,
+					&cur_backup, &len,
+					BTRFS_BLOCK_GROUP_DATA, true);
 			if (ret < 0)
 				break;
 			ret = btrfs_make_block_group(trans, fs_info, 0,
diff --git a/extent-tree.c b/extent-tree.c
index e2ae74a7fe66..b085ab0352b3 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1906,7 +1906,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 		return 0;
 
 	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
-	                        space_info->flags);
+	                        space_info->flags, false);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return 0;
diff --git a/mkfs/main.c b/mkfs/main.c
index ea65c6d897b2..358395ca0250 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -82,7 +82,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 		ret = btrfs_alloc_chunk(trans, fs_info,
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_METADATA |
-					BTRFS_BLOCK_GROUP_DATA);
+					BTRFS_BLOCK_GROUP_DATA, false);
 		if (ret == -ENOSPC) {
 			error("no space to allocate data/metadata chunk");
 			goto err;
@@ -99,7 +99,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	} else {
 		ret = btrfs_alloc_chunk(trans, fs_info,
 					&chunk_start, &chunk_size,
-					BTRFS_BLOCK_GROUP_METADATA);
+					BTRFS_BLOCK_GROUP_METADATA, false);
 		if (ret == -ENOSPC) {
 			error("no space to allocate metadata chunk");
 			goto err;
@@ -133,7 +133,7 @@ static int create_data_block_groups(struct btrfs_trans_handle *trans,
 	if (!mixed) {
 		ret = btrfs_alloc_chunk(trans, fs_info,
 					&chunk_start, &chunk_size,
-					BTRFS_BLOCK_GROUP_DATA);
+					BTRFS_BLOCK_GROUP_DATA, false);
 		if (ret == -ENOSPC) {
 			error("no space to allocate data chunk");
 			goto err;
@@ -241,7 +241,7 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
 	int ret;
 
 	ret = btrfs_alloc_chunk(trans, fs_info,
-				&chunk_start, &chunk_size, type);
+				&chunk_start, &chunk_size, type, false);
 	if (ret == -ENOSPC) {
 		error("not enough free space to allocate chunk");
 		exit(1);
diff --git a/volumes.c b/volumes.c
index 677d085de96c..9ee4650351c3 100644
--- a/volumes.c
+++ b/volumes.c
@@ -836,9 +836,23 @@ error:
 				- 2 * sizeof(struct btrfs_chunk))	\
 				/ sizeof(struct btrfs_stripe) + 1)
 
+/*
+ * Alloc a chunk, will insert dev extents, chunk item.
+ * NOTE: This function will not insert block group item nor mark newly
+ * allocated chunk available for later allocation.
+ * Block group item and free space update is handled by btrfs_make_block_group()
+ *
+ * @start:	return value of allocated chunk start bytenr.
+ * @num_bytes:	return value of allocated chunk size
+ * @type:	chunk type (including both profile and type)
+ * @convert:	if the chunk is allocated for convert case.
+ * 		If @convert is true, chunk allocator will skip device extent
+ * 		search, but use *start and *num_bytes as chunk start/num_bytes
+ * 		and device offset, to build a 1:1 chunk mapping for convert.
+ */
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      struct btrfs_fs_info *info, u64 *start,
-		      u64 *num_bytes, u64 type)
+		      u64 *num_bytes, u64 type, bool convert)
 {
 	u64 dev_offset;
 	struct btrfs_root *extent_root = info->extent_root;
@@ -868,10 +882,38 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	u64 offset;
 
-	if (list_empty(dev_list)) {
+	if (list_empty(dev_list))
 		return -ENOSPC;
+
+	if (convert) {
+		/* For convert, profile must be SINGLE */
+		if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+			error("convert only supports SINGLE profile");
+			return -EINVAL;
+		}
+		if (!IS_ALIGNED(*start, info->sectorsize)) {
+			error("chunk start not aligned, start=%llu sectorsize=%u",
+				*start, info->sectorsize);
+			return -EINVAL;
+		}
+		if (!IS_ALIGNED(*num_bytes, info->sectorsize)) {
+			error("chunk size not aligned, size=%llu sectorsize=%u",
+				*num_bytes, info->sectorsize);
+			return -EINVAL;
+		}
+		calc_size = *num_bytes;
+		offset = *start;
+		/*
+		 * For convert, we use 1:1 chunk mapping specified by @start and
+		 * @num_bytes, so there is no need to go through dev_extent
+		 * searching.
+		 */
+		goto alloc_chunk;
 	}
 
+	/*
+	 * Chunk size calculation part.
+	 */
 	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
 		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 			calc_size = SZ_8M;
@@ -942,6 +984,9 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	percent_max = div_factor(btrfs_super_total_bytes(info->super_copy), 1);
 	max_chunk_size = min(percent_max, max_chunk_size);
 
+	/*
+	 * Reserve space from each device.
+	 */
 again:
 	if (chunk_bytes_by_type(type, calc_size, num_stripes, sub_stripes) >
 	    max_chunk_size) {
@@ -972,7 +1017,8 @@ again:
 			return ret;
 		cur = cur->next;
 		if (avail >= min_free) {
-			list_move_tail(&device->dev_list, &private_devs);
+			list_move_tail(&device->dev_list,
+				       &private_devs);
 			index++;
 			if (type & BTRFS_BLOCK_GROUP_DUP)
 				index++;
@@ -999,9 +1045,16 @@ again:
 		}
 		return -ENOSPC;
 	}
-	ret = find_next_chunk(info, &offset);
-	if (ret)
-		return ret;
+
+	/*
+	 * Fill chunk mapping and chunk stripes
+	 */
+alloc_chunk:
+	if (!convert) {
+		ret = find_next_chunk(info, &offset);
+		if (ret)
+			return ret;
+	}
 	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 	key.offset = offset;
@@ -1022,17 +1075,31 @@ again:
 	index = 0;
 	while(index < num_stripes) {
 		struct btrfs_stripe *stripe;
-		BUG_ON(list_empty(&private_devs));
-		cur = private_devs.next;
-		device = list_entry(cur, struct btrfs_device, dev_list);
 
-		/* loop over this device again if we're doing a dup group */
-		if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
-		    (index == num_stripes - 1))
-			list_move_tail(&device->dev_list, dev_list);
+		if (!convert) {
+			if (list_empty(&private_devs))
+				return -ENODEV;
+			cur = private_devs.next;
+			device = list_entry(cur, struct btrfs_device, dev_list);
+			if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
+			    (index == num_stripes - 1)) {
+				/*
+				 * loop over this device again if we're doing a
+				 * dup group
+				 */
+				list_move_tail(&device->dev_list, dev_list);
+			}
+		} else {
+			/* Only SINGLE is accepted in convert case */
+			BUG_ON(num_stripes > 1);
+			device = list_entry(dev_list->next, struct btrfs_device,
+					    dev_list);
+			key.offset = *start;
+			dev_offset = *start;
+		}
 
 		ret = btrfs_alloc_dev_extent(trans, device, key.offset,
-			     calc_size, &dev_offset, 0);
+			     calc_size, &dev_offset, convert);
 		if (ret < 0)
 			goto out_chunk_map;
 
@@ -1049,7 +1116,7 @@ again:
 		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
 		index++;
 	}
-	BUG_ON(!list_empty(&private_devs));
+	BUG_ON(!convert && !list_empty(&private_devs));
 
 	/* key was set above */
 	btrfs_set_stack_chunk_length(chunk, *num_bytes);
@@ -1069,6 +1136,9 @@ again:
 	map->num_stripes = num_stripes;
 	map->sub_stripes = sub_stripes;
 
+	/*
+	 * Insert chunk item and chunk mapping.
+	 */
 	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
 				btrfs_chunk_item_size(num_stripes));
 	BUG_ON(ret);
@@ -1098,125 +1168,6 @@ out_chunk:
 	return ret;
 }
 
-/*
- * Alloc a DATA chunk with SINGLE profile.
- *
- * If 'convert' is set, it will alloc a chunk with 1:1 mapping
- * (btrfs logical bytenr == on-disk bytenr)
- * For that case, caller must make sure the chunk and dev_extent are not
- * occupied.
- */
-int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
-			   struct btrfs_fs_info *info, u64 *start,
-			   u64 num_bytes, u64 type, int convert)
-{
-	u64 dev_offset;
-	struct btrfs_root *extent_root = info->extent_root;
-	struct btrfs_root *chunk_root = info->chunk_root;
-	struct btrfs_stripe *stripes;
-	struct btrfs_device *device = NULL;
-	struct btrfs_chunk *chunk;
-	struct list_head *dev_list = &info->fs_devices->devices;
-	struct list_head *cur;
-	struct map_lookup *map;
-	u64 calc_size = SZ_8M;
-	int num_stripes = 1;
-	int sub_stripes = 0;
-	int ret;
-	int index;
-	int stripe_len = BTRFS_STRIPE_LEN;
-	struct btrfs_key key;
-
-	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
-	key.type = BTRFS_CHUNK_ITEM_KEY;
-	if (convert) {
-		if (*start != round_down(*start, info->sectorsize)) {
-			error("DATA chunk start not sectorsize aligned: %llu",
-					(unsigned long long)*start);
-			return -EINVAL;
-		}
-		key.offset = *start;
-		dev_offset = *start;
-	} else {
-		u64 tmp;
-
-		ret = find_next_chunk(info, &tmp);
-		key.offset = tmp;
-		if (ret)
-			return ret;
-	}
-
-	chunk = kmalloc(btrfs_chunk_item_size(num_stripes), GFP_NOFS);
-	if (!chunk)
-		return -ENOMEM;
-
-	map = kmalloc(btrfs_map_lookup_size(num_stripes), GFP_NOFS);
-	if (!map) {
-		kfree(chunk);
-		return -ENOMEM;
-	}
-
-	stripes = &chunk->stripe;
-	calc_size = num_bytes;
-
-	index = 0;
-	cur = dev_list->next;
-	device = list_entry(cur, struct btrfs_device, dev_list);
-
-	while (index < num_stripes) {
-		struct btrfs_stripe *stripe;
-
-		ret = btrfs_alloc_dev_extent(trans, device, key.offset,
-			     calc_size, &dev_offset, convert);
-		BUG_ON(ret);
-
-		device->bytes_used += calc_size;
-		ret = btrfs_update_device(trans, device);
-		BUG_ON(ret);
-
-		map->stripes[index].dev = device;
-		map->stripes[index].physical = dev_offset;
-		stripe = stripes + index;
-		btrfs_set_stack_stripe_devid(stripe, device->devid);
-		btrfs_set_stack_stripe_offset(stripe, dev_offset);
-		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
-		index++;
-	}
-
-	/* key was set above */
-	btrfs_set_stack_chunk_length(chunk, num_bytes);
-	btrfs_set_stack_chunk_owner(chunk, extent_root->root_key.objectid);
-	btrfs_set_stack_chunk_stripe_len(chunk, stripe_len);
-	btrfs_set_stack_chunk_type(chunk, type);
-	btrfs_set_stack_chunk_num_stripes(chunk, num_stripes);
-	btrfs_set_stack_chunk_io_align(chunk, stripe_len);
-	btrfs_set_stack_chunk_io_width(chunk, stripe_len);
-	btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
-	btrfs_set_stack_chunk_sub_stripes(chunk, sub_stripes);
-	map->sector_size = info->sectorsize;
-	map->stripe_len = stripe_len;
-	map->io_align = stripe_len;
-	map->io_width = stripe_len;
-	map->type = type;
-	map->num_stripes = num_stripes;
-	map->sub_stripes = sub_stripes;
-
-	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
-				btrfs_chunk_item_size(num_stripes));
-	BUG_ON(ret);
-	if (!convert)
-		*start = key.offset;
-
-	map->ce.start = key.offset;
-	map->ce.size = num_bytes;
-
-	ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
-	BUG_ON(ret);
-
-	kfree(chunk);
-	return ret;
-}
-
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 {
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
diff --git a/volumes.h b/volumes.h
index 11572e78c04f..7bbdf615d31a 100644
--- a/volumes.h
+++ b/volumes.h
@@ -208,10 +208,7 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      struct btrfs_fs_info *fs_info, u64 *start,
-		      u64 *num_bytes, u64 type);
-int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
-			   struct btrfs_fs_info *fs_info, u64 *start,
-			   u64 num_bytes, u64 type, int convert);
+		      u64 *num_bytes, u64 type, bool convert);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       int flags);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
-- 
2.16.1



* [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
  2018-02-02  8:19 ` [PATCH 1/7] btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info Qu Wenruo
  2018-02-02  8:19 ` [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02 11:33   ` Nikolay Borisov
  2018-02-02  8:19 ` [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Before this patch, chunk allocation is split into 2 parts:

1) Chunk allocation
   Handled by btrfs_alloc_chunk(), which will insert chunk and device
   extent items.

2) Block group allocation
   Handled by btrfs_make_block_group(), which will insert block group
   item and update space info.

However, we don't really need to split these operations, as every
btrfs_alloc_chunk() call is followed by btrfs_make_block_group().

So it's reasonable to merge the btrfs_make_block_group() call into
btrfs_alloc_chunk(), saving several lines and providing the basis for
the later btrfs_alloc_chunk() rework.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 convert/main.c |  4 ----
 extent-tree.c  | 10 ++--------
 mkfs/main.c    | 19 -------------------
 volumes.c      | 10 ++++++----
 4 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index b2444bb2ff21..240d3aa46db9 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -915,10 +915,6 @@ static int make_convert_data_block_groups(struct btrfs_trans_handle *trans,
 					BTRFS_BLOCK_GROUP_DATA, true);
 			if (ret < 0)
 				break;
-			ret = btrfs_make_block_group(trans, fs_info, 0,
-					BTRFS_BLOCK_GROUP_DATA, cur, len);
-			if (ret < 0)
-				break;
 			cur += len;
 		}
 	}
diff --git a/extent-tree.c b/extent-tree.c
index b085ab0352b3..bccd83d1bae6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1909,15 +1909,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	                        space_info->flags, false);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
-		return 0;
+		return ret;
 	}
-
-	BUG_ON(ret);
-
-	ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
-				     start, num_bytes);
-	BUG_ON(ret);
-	return 0;
+	return ret;
 }
 
 static int update_block_group(struct btrfs_root *root,
diff --git a/mkfs/main.c b/mkfs/main.c
index 358395ca0250..49159ea533b9 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -87,12 +87,6 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 			error("no space to allocate data/metadata chunk");
 			goto err;
 		}
-		if (ret)
-			return ret;
-		ret = btrfs_make_block_group(trans, fs_info, 0,
-					     BTRFS_BLOCK_GROUP_METADATA |
-					     BTRFS_BLOCK_GROUP_DATA,
-					     chunk_start, chunk_size);
 		if (ret)
 			return ret;
 		allocation->mixed += chunk_size;
@@ -106,12 +100,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 		}
 		if (ret)
 			return ret;
-		ret = btrfs_make_block_group(trans, fs_info, 0,
-					     BTRFS_BLOCK_GROUP_METADATA,
-					     chunk_start, chunk_size);
 		allocation->metadata += chunk_size;
-		if (ret)
-			return ret;
 	}
 
 	root->fs_info->system_allocs = 0;
@@ -140,12 +129,7 @@ static int create_data_block_groups(struct btrfs_trans_handle *trans,
 		}
 		if (ret)
 			return ret;
-		ret = btrfs_make_block_group(trans, fs_info, 0,
-					     BTRFS_BLOCK_GROUP_DATA,
-					     chunk_start, chunk_size);
 		allocation->data += chunk_size;
-		if (ret)
-			return ret;
 	}
 
 err:
@@ -249,9 +233,6 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
 	if (ret)
 		return ret;
 
-	ret = btrfs_make_block_group(trans, fs_info, 0,
-				     type, chunk_start, chunk_size);
-
 	type &= BTRFS_BLOCK_GROUP_TYPE_MASK;
 	if (type == BTRFS_BLOCK_GROUP_DATA) {
 		allocation->data += chunk_size;
diff --git a/volumes.c b/volumes.c
index 9ee4650351c3..a9dc8c939dc5 100644
--- a/volumes.c
+++ b/volumes.c
@@ -837,10 +837,9 @@ error:
 				/ sizeof(struct btrfs_stripe) + 1)
 
 /*
- * Alloc a chunk, will insert dev extents, chunk item.
- * NOTE: This function will not insert block group item nor mark newly
- * allocated chunk available for later allocation.
- * Block group item and free space update is handled by btrfs_make_block_group()
+ * Alloc a chunk, will insert dev extents, chunk item, and insert new
+ * block group and update space info (so that extent allocator can use
+ * newly allocated chunk).
  *
  * @start:	return value of allocated chunk start bytenr.
  * @num_bytes:	return value of allocated chunk size
@@ -1159,6 +1158,9 @@ alloc_chunk:
 	}
 
 	kfree(chunk);
+
+	ret = btrfs_make_block_group(trans, info, 0, type, map->ce.start,
+				     map->ce.size);
 	return ret;
 
 out_chunk_map:
-- 
2.16.1



* [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
                   ` (2 preceding siblings ...)
  2018-02-02  8:19 ` [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02 11:37   ` Nikolay Borisov
  2018-02-02  8:19 ` [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

As part of the effort to unify code and behavior between btrfs-progs and
the kernel, copy btrfs_raid_array from the kernel to btrfs-progs.

This way we can later use btrfs_raid_array[] to get the needed RAID info
instead of open-coding if-else branches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 ctree.h   | 12 +++++++++++-
 volumes.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 volumes.h | 30 +++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/ctree.h b/ctree.h
index 17cdac76c58c..c76849d8deb7 100644
--- a/ctree.h
+++ b/ctree.h
@@ -958,7 +958,17 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID5    	(1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6    	(1ULL << 8)
 #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
-#define BTRFS_NR_RAID_TYPES             7
+
+enum btrfs_raid_types {
+	BTRFS_RAID_RAID10,
+	BTRFS_RAID_RAID1,
+	BTRFS_RAID_DUP,
+	BTRFS_RAID_RAID0,
+	BTRFS_RAID_SINGLE,
+	BTRFS_RAID_RAID5,
+	BTRFS_RAID_RAID6,
+	BTRFS_NR_RAID_TYPES
+};
 
 #define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
 					 BTRFS_BLOCK_GROUP_SYSTEM |  \
diff --git a/volumes.c b/volumes.c
index a9dc8c939dc5..b47ff1f392b5 100644
--- a/volumes.c
+++ b/volumes.c
@@ -30,6 +30,72 @@
 #include "utils.h"
 #include "kernel-lib/raid56.h"
 
+const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
+	[BTRFS_RAID_RAID10] = {
+		.sub_stripes	= 2,
+		.dev_stripes	= 1,
+		.devs_max	= 0,	/* 0 == as many as possible */
+		.devs_min	= 4,
+		.tolerated_failures = 1,
+		.devs_increment	= 2,
+		.ncopies	= 2,
+	},
+	[BTRFS_RAID_RAID1] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 2,
+		.devs_min	= 2,
+		.tolerated_failures = 1,
+		.devs_increment	= 2,
+		.ncopies	= 2,
+	},
+	[BTRFS_RAID_DUP] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 2,
+		.devs_max	= 1,
+		.devs_min	= 1,
+		.tolerated_failures = 0,
+		.devs_increment	= 1,
+		.ncopies	= 2,
+	},
+	[BTRFS_RAID_RAID0] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 2,
+		.tolerated_failures = 0,
+		.devs_increment	= 1,
+		.ncopies	= 1,
+	},
+	[BTRFS_RAID_SINGLE] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 1,
+		.devs_min	= 1,
+		.tolerated_failures = 0,
+		.devs_increment	= 1,
+		.ncopies	= 1,
+	},
+	[BTRFS_RAID_RAID5] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 2,
+		.tolerated_failures = 1,
+		.devs_increment	= 1,
+		.ncopies	= 2,
+	},
+	[BTRFS_RAID_RAID6] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 3,
+		.tolerated_failures = 2,
+		.devs_increment	= 1,
+		.ncopies	= 3,
+	},
+};
+
 struct stripe {
 	struct btrfs_device *dev;
 	u64 physical;
diff --git a/volumes.h b/volumes.h
index 7bbdf615d31a..612a0a7586f4 100644
--- a/volumes.h
+++ b/volumes.h
@@ -108,6 +108,36 @@ struct map_lookup {
 	struct btrfs_bio_stripe stripes[];
 };
 
+struct btrfs_raid_attr {
+	int sub_stripes;	/* sub_stripes info for map */
+	int dev_stripes;	/* stripes per dev */
+	int devs_max;		/* max devs to use */
+	int devs_min;		/* min devs needed */
+	int tolerated_failures; /* max tolerated fail devs */
+	int devs_increment;	/* ndevs has to be a multiple of this */
+	int ncopies;		/* how many copies the data has */
+};
+
+extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
+
+static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
+{
+	if (flags & BTRFS_BLOCK_GROUP_RAID10)
+		return BTRFS_RAID_RAID10;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
+		return BTRFS_RAID_RAID1;
+	else if (flags & BTRFS_BLOCK_GROUP_DUP)
+		return BTRFS_RAID_DUP;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
+		return BTRFS_RAID_RAID0;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID5)
+		return BTRFS_RAID_RAID5;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID6)
+		return BTRFS_RAID_RAID6;
+
+	return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */
+}
+
 #define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
 			    (sizeof(struct btrfs_bio_stripe) * (n)))
 #define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
-- 
2.16.1



* [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
                   ` (3 preceding siblings ...)
  2018-02-02  8:19 ` [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02 11:41   ` Nikolay Borisov
  2018-02-02  8:19 ` [PATCH 6/7] btrfs-progs: kernel-lib: Port kernel sort() to btrfs-progs Qu Wenruo
  2018-02-02  8:19 ` [PATCH 7/7] btrfs-progs: volumes: Unify free dev extent search behavior between kernel and btrfs-progs Qu Wenruo
  6 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Just as with the kernel's find_free_dev_extent(), allow it to return the
maximum hole size, which we will use to build the device list for the
later chunk allocator rework.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/volumes.c b/volumes.c
index b47ff1f392b5..f4009ffa7c9e 100644
--- a/volumes.c
+++ b/volumes.c
@@ -516,10 +516,10 @@ out:
 }
 
 static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
-				u64 *start)
+				u64 *start, u64 *len)
 {
 	/* FIXME use last free of some kind */
-	return find_free_dev_extent_start(device, num_bytes, 0, start, NULL);
+	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
 }
 
 static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
@@ -543,7 +543,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
 	 * is responsible to make sure it's free.
 	 */
 	if (!convert) {
-		ret = find_free_dev_extent(device, num_bytes, start);
+		ret = find_free_dev_extent(device, num_bytes, start, NULL);
 		if (ret)
 			goto err;
 	}
-- 
2.16.1



* [PATCH 6/7] btrfs-progs: kernel-lib: Port kernel sort() to btrfs-progs
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
                   ` (4 preceding siblings ...)
  2018-02-02  8:19 ` [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  2018-02-02  8:19 ` [PATCH 7/7] btrfs-progs: volumes: Unify free dev extent search behavior between kernel and btrfs-progs Qu Wenruo
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

This will be used by the later btrfs_alloc_chunk() rework.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Makefile          |   3 +-
 kernel-lib/sort.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel-lib/sort.h |  16 +++++++++
 3 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 kernel-lib/sort.c
 create mode 100644 kernel-lib/sort.h

diff --git a/Makefile b/Makefile
index 327cdfa08eba..3db7bbe04662 100644
--- a/Makefile
+++ b/Makefile
@@ -106,7 +106,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
 	  kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
 	  inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
-	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
+	  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
+	  kernel-lib/sort.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
diff --git a/kernel-lib/sort.c b/kernel-lib/sort.c
new file mode 100644
index 000000000000..70ae3dbe2852
--- /dev/null
+++ b/kernel-lib/sort.c
@@ -0,0 +1,104 @@
+/*
+ * taken from linux kernel lib/sort.c, removed kernel config code and adapted
+ * for btrfsprogs
+ */
+
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A fast, small, non-recursive O(n log n) sort for the Linux kernel
+ *
+ * Jan 23 2005  Matt Mackall <mpm@selenic.com>
+ */
+
+#include "sort.h"
+
+static int alignment_ok(const void *base, int align)
+{
+	return ((unsigned long)base & (align - 1)) == 0;
+}
+
+static void u32_swap(void *a, void *b, int size)
+{
+	u32 t = *(u32 *)a;
+	*(u32 *)a = *(u32 *)b;
+	*(u32 *)b = t;
+}
+
+static void u64_swap(void *a, void *b, int size)
+{
+	u64 t = *(u64 *)a;
+	*(u64 *)a = *(u64 *)b;
+	*(u64 *)b = t;
+}
+
+static void generic_swap(void *a, void *b, int size)
+{
+	char t;
+
+	do {
+		t = *(char *)a;
+		*(char *)a++ = *(char *)b;
+		*(char *)b++ = t;
+	} while (--size > 0);
+}
+
+/**
+ * sort - sort an array of elements
+ * @base: pointer to data to sort
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp_func: pointer to comparison function
+ * @swap_func: pointer to swap function or NULL
+ *
+ * This function does a heapsort on the given array. You may provide a
+ * swap_func function optimized to your element type.
+ *
+ * Sorting time is O(n log n) both on average and worst-case. While
+ * qsort is about 20% faster on average, it suffers from exploitable
+ * O(n*n) worst-case behavior and extra memory requirements that make
+ * it less suitable for kernel use.
+ */
+
+void sort(void *base, size_t num, size_t size,
+	  int (*cmp_func)(const void *, const void *),
+	  void (*swap_func)(void *, void *, int size))
+{
+	/* pre-scale counters for performance */
+	int i = (num/2 - 1) * size, n = num * size, c, r;
+
+	if (!swap_func) {
+		if (size == 4 && alignment_ok(base, 4))
+			swap_func = u32_swap;
+		else if (size == 8 && alignment_ok(base, 8))
+			swap_func = u64_swap;
+		else
+			swap_func = generic_swap;
+	}
+
+	/* heapify */
+	for ( ; i >= 0; i -= size) {
+		for (r = i; r * 2 + size < n; r  = c) {
+			c = r * 2 + size;
+			if (c < n - size &&
+					cmp_func(base + c, base + c + size) < 0)
+				c += size;
+			if (cmp_func(base + r, base + c) >= 0)
+				break;
+			swap_func(base + r, base + c, size);
+		}
+	}
+
+	/* sort */
+	for (i = n - size; i > 0; i -= size) {
+		swap_func(base, base + i, size);
+		for (r = 0; r * 2 + size < i; r = c) {
+			c = r * 2 + size;
+			if (c < i - size &&
+					cmp_func(base + c, base + c + size) < 0)
+				c += size;
+			if (cmp_func(base + r, base + c) >= 0)
+				break;
+			swap_func(base + r, base + c, size);
+		}
+	}
+}
diff --git a/kernel-lib/sort.h b/kernel-lib/sort.h
new file mode 100644
index 000000000000..9355e01248f2
--- /dev/null
+++ b/kernel-lib/sort.h
@@ -0,0 +1,16 @@
+/*
+ * taken from linux kernel include/linux/sort.h
+ * changed include to kerncompat.h
+ */
+
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SORT_H
+#define _LINUX_SORT_H
+
+#include "kerncompat.h"
+
+void sort(void *base, size_t num, size_t size,
+	  int (*cmp)(const void *, const void *),
+	  void (*swap)(void *, void *, int));
+
+#endif
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] btrfs-progs: volumes: Unify free dev extent search behavior between kernel and btrfs-progs.
  2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
                   ` (5 preceding siblings ...)
  2018-02-02  8:19 ` [PATCH 6/7] btrfs-progs: kernel-lib: Port kernel sort() to btrfs-progs Qu Wenruo
@ 2018-02-02  8:19 ` Qu Wenruo
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  8:19 UTC (permalink / raw)
  To: linux-btrfs, dsterba

As preparation for creating libbtrfs, which will share code between the
kernel and btrfs-progs, this patch mainly unifies the search for free
device extents.

The main modifications are:

1) Search for free device extents
   Use the kernel method: sort the devices by their maximum hole size,
   and use that sorted result to determine the stripe size and chunk
   size.

   The previous method, which searched by the available bytes of each
   device, couldn't handle small holes scattered across a device.

2) Minimal chunk/stripe size
   Remove the minimal size requirement.
   Now the real minimal stripe size is 64K (BTRFS_STRIPE_LEN), the same
   as in the kernel.

   The upper limit is still kept as is, and the minimal device size is
   still kept for a while.
   But since we no longer have the odd minimal stripe size limit, the
   existing minimal device size calculation won't cause any problem.

3) How to insert device extents
   Although not identical to the kernel, here we follow the kernel
   behavior of delaying the device extent insertion.

   There is a plan to follow kernel __btrfs_alloc_chunk() and make the
   function handle chunk mapping allocation only, without any tree
   operations.

4) Usage of btrfs_raid_array[]
   This makes a lot of old if-else branches disappear.

There is still a lot of work to do (in both the kernel and btrfs-progs)
before we can start extracting code into libbtrfs, but this should bring
libbtrfs within our reach.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kerncompat.h |   5 +
 volumes.c    | 611 ++++++++++++++++++++++++++---------------------------------
 volumes.h    |   7 +
 3 files changed, 281 insertions(+), 342 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index fa96715fb70c..658d28ed0792 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -285,6 +285,7 @@ static inline int IS_ERR_OR_NULL(const void *ptr)
  */
 #define kmalloc(x, y) malloc(x)
 #define kzalloc(x, y) calloc(1, x)
+#define kcalloc(x, y) calloc(x, y)
 #define kstrdup(x, y) strdup(x)
 #define kfree(x) free(x)
 #define vmalloc(x) malloc(x)
@@ -394,4 +395,8 @@ struct __una_u64 { __le64 x; } __attribute__((__packed__));
 #define noinline
 #endif
 
+static inline u64 div_u64(u64 dividend, u32 divisor)
+{
+	return dividend / ((u64) divisor);
+}
 #endif
diff --git a/volumes.c b/volumes.c
index f4009ffa7c9e..d14f46f8496e 100644
--- a/volumes.c
+++ b/volumes.c
@@ -29,6 +29,7 @@
 #include "volumes.h"
 #include "utils.h"
 #include "kernel-lib/raid56.h"
+#include "kernel-lib/sort.h"
 
 const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 	[BTRFS_RAID_RAID10] = {
@@ -522,55 +523,55 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
 }
 
-static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
-				  struct btrfs_device *device,
-				  u64 chunk_offset, u64 num_bytes, u64 *start,
-				  int convert)
+static int btrfs_insert_dev_extents(struct btrfs_trans_handle *trans,
+				    struct btrfs_fs_info *fs_info,
+				    struct map_lookup *map, u64 stripe_size)
 {
-	int ret;
-	struct btrfs_path *path;
-	struct btrfs_root *root = device->dev_root;
+	int ret = 0;
+	struct btrfs_path path;
+	struct btrfs_root *root = fs_info->dev_root;
 	struct btrfs_dev_extent *extent;
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
+	int i;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
+	btrfs_init_path(&path);
 
-	/*
-	 * For convert case, just skip search free dev_extent, as caller
-	 * is responsible to make sure it's free.
-	 */
-	if (!convert) {
-		ret = find_free_dev_extent(device, num_bytes, start, NULL);
-		if (ret)
-			goto err;
-	}
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_device *device = map->stripes[i].dev;
 
-	key.objectid = device->devid;
-	key.offset = *start;
-	key.type = BTRFS_DEV_EXTENT_KEY;
-	ret = btrfs_insert_empty_item(trans, root, path, &key,
-				      sizeof(*extent));
-	BUG_ON(ret);
+		key.objectid = device->devid;
+		key.offset = map->stripes[i].physical;
+		key.type = BTRFS_DEV_EXTENT_KEY;
 
-	leaf = path->nodes[0];
-	extent = btrfs_item_ptr(leaf, path->slots[0],
-				struct btrfs_dev_extent);
-	btrfs_set_dev_extent_chunk_tree(leaf, extent, BTRFS_CHUNK_TREE_OBJECTID);
-	btrfs_set_dev_extent_chunk_objectid(leaf, extent,
-					    BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-	btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
-
-	write_extent_buffer(leaf, root->fs_info->chunk_tree_uuid,
-		    (unsigned long)btrfs_dev_extent_chunk_tree_uuid(extent),
-		    BTRFS_UUID_SIZE);
-
-	btrfs_set_dev_extent_length(leaf, extent, num_bytes);
-	btrfs_mark_buffer_dirty(leaf);
-err:
-	btrfs_free_path(path);
+		ret = btrfs_insert_empty_item(trans, root, &path, &key,
+					      sizeof(*extent));
+		if (ret < 0)
+			goto out;
+		leaf = path.nodes[0];
+		extent = btrfs_item_ptr(leaf, path.slots[0],
+					struct btrfs_dev_extent);
+		btrfs_set_dev_extent_chunk_tree(leaf, extent,
+					BTRFS_CHUNK_TREE_OBJECTID);
+		btrfs_set_dev_extent_chunk_objectid(leaf, extent,
+					BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+		btrfs_set_dev_extent_chunk_offset(leaf, extent, map->ce.start);
+
+		write_extent_buffer(leaf, fs_info->chunk_tree_uuid,
+			(unsigned long)btrfs_dev_extent_chunk_tree_uuid(extent),
+			BTRFS_UUID_SIZE);
+
+		btrfs_set_dev_extent_length(leaf, extent, stripe_size);
+		btrfs_mark_buffer_dirty(leaf);
+		btrfs_release_path(&path);
+
+		device->bytes_used += stripe_size;
+		ret = btrfs_update_device(trans, device);
+		if (ret < 0)
+			goto out;
+	}
+out:
+	btrfs_release_path(&path);
 	return ret;
 }
 
@@ -782,114 +783,23 @@ int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 	return 0;
 }
 
-static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes,
-			       int sub_stripes)
-{
-	if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
-		return calc_size;
-	else if (type & BTRFS_BLOCK_GROUP_RAID10)
-		return calc_size * (num_stripes / sub_stripes);
-	else if (type & BTRFS_BLOCK_GROUP_RAID5)
-		return calc_size * (num_stripes - 1);
-	else if (type & BTRFS_BLOCK_GROUP_RAID6)
-		return calc_size * (num_stripes - 2);
-	else
-		return calc_size * num_stripes;
-}
-
-
-static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
-{
-	/* TODO, add a way to store the preferred stripe size */
-	return BTRFS_STRIPE_LEN;
-}
-
 /*
- * btrfs_device_avail_bytes - count bytes available for alloc_chunk
- *
- * It is not equal to "device->total_bytes - device->bytes_used".
- * We do not allocate any chunk in 1M at beginning of device, and not
- * allowed to allocate any chunk before alloc_start if it is specified.
- * So search holes from max(1M, alloc_start) to device->total_bytes.
+ * sort the devices in descending order by max_avail, total_avail
  */
-static int btrfs_device_avail_bytes(struct btrfs_trans_handle *trans,
-				    struct btrfs_device *device,
-				    u64 *avail_bytes)
+static int btrfs_cmp_device_info(const void *a, const void *b)
 {
-	struct btrfs_path *path;
-	struct btrfs_root *root = device->dev_root;
-	struct btrfs_key key;
-	struct btrfs_dev_extent *dev_extent = NULL;
-	struct extent_buffer *l;
-	u64 search_start = root->fs_info->alloc_start;
-	u64 search_end = device->total_bytes;
-	u64 extent_end = 0;
-	u64 free_bytes = 0;
-	int ret;
-	int slot = 0;
-
-	search_start = max(BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, search_start);
+	const struct btrfs_device_info *di_a = a;
+	const struct btrfs_device_info *di_b = b;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	key.objectid = device->devid;
-	key.offset = root->fs_info->alloc_start;
-	key.type = BTRFS_DEV_EXTENT_KEY;
-
-	path->reada = 2;
-	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
-	if (ret < 0)
-		goto error;
-	ret = btrfs_previous_item(root, path, 0, key.type);
-	if (ret < 0)
-		goto error;
-
-	while (1) {
-		l = path->nodes[0];
-		slot = path->slots[0];
-		if (slot >= btrfs_header_nritems(l)) {
-			ret = btrfs_next_leaf(root, path);
-			if (ret == 0)
-				continue;
-			if (ret < 0)
-				goto error;
-			break;
-		}
-		btrfs_item_key_to_cpu(l, &key, slot);
-
-		if (key.objectid < device->devid)
-			goto next;
-		if (key.objectid > device->devid)
-			break;
-		if (key.type != BTRFS_DEV_EXTENT_KEY)
-			goto next;
-		if (key.offset > search_end)
-			break;
-		if (key.offset > search_start)
-			free_bytes += key.offset - search_start;
-
-		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
-		extent_end = key.offset + btrfs_dev_extent_length(l,
-								  dev_extent);
-		if (extent_end > search_start)
-			search_start = extent_end;
-		if (search_start > search_end)
-			break;
-next:
-		path->slots[0]++;
-		cond_resched();
-	}
-
-	if (search_start < search_end)
-		free_bytes += search_end - search_start;
-
-	*avail_bytes = free_bytes;
-	ret = 0;
-error:
-	btrfs_free_path(path);
-	return ret;
+	if (di_a->max_avail > di_b->max_avail)
+		return -1;
+	if (di_a->max_avail < di_b->max_avail)
+		return 1;
+	if (di_a->total_avail > di_b->total_avail)
+		return -1;
+	if (di_a->total_avail < di_b->total_avail)
+		return 1;
+	return 0;
 }
 
 #define BTRFS_MAX_DEVS(info) ((BTRFS_LEAF_DATA_SIZE(info)	\
@@ -910,46 +820,60 @@ error:
  * @start:	return value of allocated chunk start bytenr.
  * @num_bytes:	return value of allocated chunk size
  * @type:	chunk type (including both profile and type)
- * @convert:	if the chunk is allocated for convert case.
- * 		If @convert is true, chunk allocator will skip device extent
- * 		search, but use *start and *num_bytes as chunk start/num_bytes
- * 		and devive offset, to build a 1:1 chunk mapping for convert.
+ * @convert:   if the chunk is allocated for convert case.
+ *             If @convert is true, chunk allocator will skip device extent
+ *             search, but use *start and *num_bytes as chunk start/num_bytes
+ *             and device offset, to build a 1:1 chunk mapping for convert.
  */
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      struct btrfs_fs_info *info, u64 *start,
 		      u64 *num_bytes, u64 type, bool convert)
 {
-	u64 dev_offset;
 	struct btrfs_root *extent_root = info->extent_root;
 	struct btrfs_root *chunk_root = info->chunk_root;
-	struct btrfs_stripe *stripes;
 	struct btrfs_device *device = NULL;
 	struct btrfs_chunk *chunk;
-	struct list_head private_devs;
 	struct list_head *dev_list = &info->fs_devices->devices;
-	struct list_head *cur;
+	struct btrfs_stripe *stripe;
 	struct map_lookup *map;
-	int min_stripe_size = SZ_1M;
-	u64 calc_size = SZ_8M;
-	u64 min_free;
-	u64 max_chunk_size = 4 * calc_size;
-	u64 avail = 0;
-	u64 max_avail = 0;
+	struct btrfs_device_info *devices_info = NULL;
 	u64 percent_max;
-	int num_stripes = 1;
-	int max_stripes = 0;
-	int min_stripes = 1;
-	int sub_stripes = 0;
-	int looped = 0;
+	int num_stripes;
 	int ret;
+	int i;
+	int j;
 	int index;
+	int ndevs = 0;
+	int rw_devs = 0;
 	int stripe_len = BTRFS_STRIPE_LEN;
 	struct btrfs_key key;
 	u64 offset;
+	int data_stripes;	/* number of stripes that counts for bg size */
+	int sub_stripes;	/* sub_stripes info for map */
+	int dev_stripes;	/* stripes per dev */
+	int devs_max;		/* max devs to use */
+	int devs_min;		/* min devs needed */
+	int devs_increment;	/* ndevs has to be a multiple of this */
+	int ncopies;		/* how many copies of the data we keep */
+	u64 max_stripe_size;	/* max physical size of each stripe */
+	u64 max_chunk_size;	/* max logical size of the block group */
+	u64 stripe_size;
 
 	if (list_empty(dev_list))
 		return -ENOSPC;
 
+	list_for_each_entry(device, dev_list, dev_list)
+		rw_devs++;
+
+	index = btrfs_bg_flags_to_raid_index(type);
+
+	sub_stripes = btrfs_raid_array[index].sub_stripes;
+	dev_stripes = btrfs_raid_array[index].dev_stripes;
+	devs_max = btrfs_raid_array[index].devs_max;
+	devs_min = btrfs_raid_array[index].devs_min;
+	devs_increment = btrfs_raid_array[index].devs_increment;
+	ncopies = btrfs_raid_array[index].ncopies;
+
 	if (convert) {
 		/* For convert, profile must be SINGLE */
 		if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
@@ -966,8 +890,10 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				*num_bytes, info->sectorsize);
 			return -EINVAL;
 		}
-		calc_size = *num_bytes;
+		num_stripes = 1;
+		data_stripes = 1;
 		offset = *start;
+		stripe_size = *num_bytes;
 		/*
 		 * For convert, we use 1:1 chunk mapping specified by @start and
 		 * @num_bytes, so there is no need to go through dev_extent
@@ -979,70 +905,25 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	/*
 	 * Chunk size calculation part.
 	 */
-	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
-		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
-			calc_size = SZ_8M;
-			max_chunk_size = calc_size * 2;
-			min_stripe_size = SZ_1M;
-			max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
-		} else if (type & BTRFS_BLOCK_GROUP_DATA) {
-			calc_size = SZ_1G;
-			max_chunk_size = 10 * calc_size;
-			min_stripe_size = SZ_64M;
-			max_stripes = BTRFS_MAX_DEVS(info);
-		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
-			calc_size = SZ_1G;
-			max_chunk_size = 4 * calc_size;
-			min_stripe_size = SZ_32M;
-			max_stripes = BTRFS_MAX_DEVS(info);
-		}
-	}
-	if (type & BTRFS_BLOCK_GROUP_RAID1) {
-		num_stripes = min_t(u64, 2,
-				  btrfs_super_num_devices(info->super_copy));
-		if (num_stripes < 2)
-			return -ENOSPC;
-		min_stripes = 2;
-	}
-	if (type & BTRFS_BLOCK_GROUP_DUP) {
-		num_stripes = 2;
-		min_stripes = 2;
-	}
-	if (type & (BTRFS_BLOCK_GROUP_RAID0)) {
-		num_stripes = btrfs_super_num_devices(info->super_copy);
-		if (num_stripes > max_stripes)
-			num_stripes = max_stripes;
-		min_stripes = 2;
-	}
-	if (type & (BTRFS_BLOCK_GROUP_RAID10)) {
-		num_stripes = btrfs_super_num_devices(info->super_copy);
-		if (num_stripes > max_stripes)
-			num_stripes = max_stripes;
-		if (num_stripes < 4)
-			return -ENOSPC;
-		num_stripes &= ~(u32)1;
-		sub_stripes = 2;
-		min_stripes = 4;
-	}
-	if (type & (BTRFS_BLOCK_GROUP_RAID5)) {
-		num_stripes = btrfs_super_num_devices(info->super_copy);
-		if (num_stripes > max_stripes)
-			num_stripes = max_stripes;
-		if (num_stripes < 2)
-			return -ENOSPC;
-		min_stripes = 2;
-		stripe_len = find_raid56_stripe_len(num_stripes - 1,
-				    btrfs_super_stripesize(info->super_copy));
-	}
-	if (type & (BTRFS_BLOCK_GROUP_RAID6)) {
-		num_stripes = btrfs_super_num_devices(info->super_copy);
-		if (num_stripes > max_stripes)
-			num_stripes = max_stripes;
-		if (num_stripes < 3)
-			return -ENOSPC;
-		min_stripes = 3;
-		stripe_len = find_raid56_stripe_len(num_stripes - 2,
-				    btrfs_super_stripesize(info->super_copy));
+	if (type & BTRFS_BLOCK_GROUP_DATA) {
+		max_stripe_size = SZ_1G;
+		max_chunk_size = 10 * max_stripe_size;
+		if (!devs_max)
+			devs_max = BTRFS_MAX_DEVS(info);
+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+		/* TODO: Support total_rw_bytes in fs_devices */
+		max_stripe_size = SZ_256M;
+		max_chunk_size = max_stripe_size;
+		if (!devs_max)
+			devs_max = BTRFS_MAX_DEVS(info);
+	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+		max_stripe_size = SZ_32M;
+		max_chunk_size = 2 * max_stripe_size;
+		if (!devs_max)
+			devs_max = BTRFS_MAX_DEVS_SYS_CHUNK;
+	} else {
+		error("invalid chunk type 0x%llx requested", type);
+		return -EINVAL;
 	}
 
 	/* we don't want a chunk larger than 10% of the FS */
@@ -1052,64 +933,116 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	/*
 	 * Reserve space from each device.
 	 */
-again:
-	if (chunk_bytes_by_type(type, calc_size, num_stripes, sub_stripes) >
-	    max_chunk_size) {
-		calc_size = max_chunk_size;
-		calc_size /= num_stripes;
-		calc_size /= stripe_len;
-		calc_size *= stripe_len;
-	}
-	/* we don't want tiny stripes */
-	calc_size = max_t(u64, calc_size, min_stripe_size);
-
-	calc_size /= stripe_len;
-	calc_size *= stripe_len;
-	INIT_LIST_HEAD(&private_devs);
-	cur = dev_list->next;
-	index = 0;
-
-	if (type & BTRFS_BLOCK_GROUP_DUP)
-		min_free = calc_size * 2;
-	else
-		min_free = calc_size;
 
-	/* build a private list of devices we will allocate from */
-	while(index < num_stripes) {
-		device = list_entry(cur, struct btrfs_device, dev_list);
-		ret = btrfs_device_avail_bytes(trans, device, &avail);
-		if (ret)
-			return ret;
-		cur = cur->next;
-		if (avail >= min_free) {
-			list_move_tail(&device->dev_list,
-				       &private_devs);
-			index++;
-			if (type & BTRFS_BLOCK_GROUP_DUP)
-				index++;
-		} else if (avail > max_avail)
-			max_avail = avail;
-		if (cur == dev_list)
-			break;
-	}
-	if (index < num_stripes) {
-		list_splice(&private_devs, dev_list);
-		if (index >= min_stripes) {
-			num_stripes = index;
-			if (type & (BTRFS_BLOCK_GROUP_RAID10)) {
-				num_stripes /= sub_stripes;
-				num_stripes *= sub_stripes;
-			}
-			looped = 1;
-			goto again;
+	devices_info = kcalloc(rw_devs, sizeof(*devices_info));
+	if (!devices_info)
+		return -ENOMEM;
+	/*
+	 * in the first pass through the devices list, we gather information
+	 * about the available holes on each device.
+	 */
+	ndevs = 0;
+	list_for_each_entry(device, dev_list, dev_list) {
+		u64 max_avail;
+		u64 dev_offset;
+		u64 total_avail;
+
+		if (!device->writeable) {
+			warning("read-only device in alloc_list, devid=%llu",
+				device->devid);
+			continue;
 		}
-		if (!looped && max_avail > 0) {
-			looped = 1;
-			calc_size = max_avail;
-			goto again;
+
+		if (device->total_bytes > device->bytes_used)
+			total_avail = device->total_bytes - device->bytes_used;
+		else
+			total_avail = 0;
+
+		/* If there is no space on this device, skip it. */
+		if (total_avail == 0)
+			continue;
+
+		ret = find_free_dev_extent(device,
+					   max_stripe_size * dev_stripes,
+					   &dev_offset, &max_avail);
+		if (ret && ret != -ENOSPC)
+			goto out_devices_info;
+
+		if (ret == 0)
+			max_avail = max_stripe_size * dev_stripes;
+
+		if (max_avail < BTRFS_STRIPE_LEN * dev_stripes)
+			continue;
+
+		if (ndevs == rw_devs) {
+			warning("%s: found more than %u devices", __func__,
+				rw_devs);
+			break;
 		}
-		return -ENOSPC;
+		devices_info[ndevs].dev_offset = dev_offset;
+		devices_info[ndevs].max_avail = max_avail;
+		devices_info[ndevs].total_avail = total_avail;
+		devices_info[ndevs].dev = device;
+		++ndevs;
+	}
+
+	/*
+	 * now sort the devices by hole size / available space
+	 */
+	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
+	     btrfs_cmp_device_info, NULL);
+
+	/* round down to number of usable stripes */
+	ndevs = round_down(ndevs, devs_increment);
+
+	if (ndevs < devs_min) {
+		ret = -ENOSPC;
+		goto out_devices_info;
+	}
+
+	ndevs = min(ndevs, devs_max);
+
+	/*
+	 * the primary goal is to maximize the number of stripes, so use as many
+	 * devices as possible, even if the stripes are not maximum sized.
+	 */
+	stripe_size = devices_info[ndevs-1].max_avail;
+	num_stripes = ndevs * dev_stripes;
+
+	/*
+	 * this will have to be fixed for RAID1 and RAID10 over
+	 * more drives
+	 */
+	data_stripes = num_stripes / ncopies;
+
+	if (type & BTRFS_BLOCK_GROUP_RAID5)
+		data_stripes = num_stripes - 1;
+
+	if (type & BTRFS_BLOCK_GROUP_RAID6)
+		data_stripes = num_stripes - 2;
+
+	/*
+	 * Use the number of data stripes to figure out how big this chunk
+	 * is really going to be in terms of logical address space,
+	 * and compare that answer with the max chunk size
+	 */
+	if (stripe_size * data_stripes > max_chunk_size) {
+		stripe_size = div_u64(max_chunk_size, data_stripes);
+
+		/* bump the answer up to a 16MB boundary */
+		stripe_size = round_up(stripe_size, SZ_16M);
+
+		/*
+		 * but don't go higher than the limits we found
+		 * while searching for free extents
+		 */
+		stripe_size = min(devices_info[ndevs - 1].max_avail,
+				  stripe_size);
 	}
+	stripe_size = div_u64(stripe_size, dev_stripes);
+
+	/* align to BTRFS_STRIPE_LEN */
+	stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN);
 
 	/*
 	 * Fill chunk mapping and chunk stripes
@@ -1118,11 +1051,12 @@ alloc_chunk:
 	if (!convert) {
 		ret = find_next_chunk(info, &offset);
 		if (ret)
-			return ret;
+			goto out_devices_info;
 	}
 	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 	key.offset = offset;
+	*num_bytes = stripe_size * data_stripes;
 
 	chunk = kmalloc(btrfs_chunk_item_size(num_stripes), GFP_NOFS);
 	if (!chunk)
@@ -1133,55 +1067,52 @@ alloc_chunk:
 		kfree(chunk);
 		return -ENOMEM;
 	}
+	map->num_stripes = num_stripes;
 
-	stripes = &chunk->stripe;
-	*num_bytes = chunk_bytes_by_type(type, calc_size,
-					 num_stripes, sub_stripes);
-	index = 0;
-	while(index < num_stripes) {
-		struct btrfs_stripe *stripe;
-
-		if (!convert) {
-			if (list_empty(&private_devs))
-				return -ENODEV;
-			cur = private_devs.next;
-			device = list_entry(cur, struct btrfs_device, dev_list);
-			if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
-			    (index == num_stripes - 1)) {
-				/*
-				 * loop over this device again if we're doing a
-				 * dup group
-				 */
-				list_move_tail(&device->dev_list, dev_list);
-			}
-		} else {
-			/* Only SINGLE is accepted in convert case */
-			BUG_ON(num_stripes > 1);
-			device = list_entry(dev_list->next, struct btrfs_device,
-					    dev_list);
-			key.offset = *start;
-			dev_offset = *start;
-		}
-
-		ret = btrfs_alloc_dev_extent(trans, device, key.offset,
-			     calc_size, &dev_offset, convert);
-		if (ret < 0)
-			goto out_chunk_map;
-
-		device->bytes_used += calc_size;
-		ret = btrfs_update_device(trans, device);
-		if (ret < 0)
-			goto out_chunk_map;
+	if (convert) {
+		device = list_entry(dev_list->next, struct btrfs_device,
+				    dev_list);
 
-		map->stripes[index].dev = device;
-		map->stripes[index].physical = dev_offset;
-		stripe = stripes + index;
+		map->stripes[0].dev = device;
+		map->stripes[0].physical = *start;
+		stripe = &chunk->stripe;
 		btrfs_set_stack_stripe_devid(stripe, device->devid);
-		btrfs_set_stack_stripe_offset(stripe, dev_offset);
+		btrfs_set_stack_stripe_offset(stripe, *start);
 		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
-		index++;
+	} else {
+		for (i = 0; i < ndevs; i++) {
+			for (j = 0; j < dev_stripes; ++j) {
+				int s = i * dev_stripes + j;
+
+				device = devices_info[i].dev;
+				map->stripes[s].dev = device;
+				map->stripes[s].physical =
+					devices_info[i].dev_offset +
+					j * stripe_size;
+				stripe = &chunk->stripe + s;
+				btrfs_set_stack_stripe_devid(stripe,
+						device->devid);
+				btrfs_set_stack_stripe_offset(stripe,
+						devices_info[i].dev_offset +
+						j * stripe_size);
+				memcpy(stripe->dev_uuid, device->uuid,
+						BTRFS_UUID_SIZE);
+			}
+		}
 	}
-	BUG_ON(!convert && !list_empty(&private_devs));
+	map->stripe_len = BTRFS_STRIPE_LEN;
+	map->io_align = BTRFS_STRIPE_LEN;
+	map->io_width = BTRFS_STRIPE_LEN;
+	map->type = type;
+	map->sub_stripes = sub_stripes;
+	map->sector_size = info->sectorsize;
+	map->ce.start = key.offset;
+	map->ce.size = *num_bytes;
+
+
+	ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
+	if (ret < 0)
+		goto out_chunk_map;
 
 	/* key was set above */
 	btrfs_set_stack_chunk_length(chunk, *num_bytes);
@@ -1193,13 +1124,6 @@ alloc_chunk:
 	btrfs_set_stack_chunk_io_width(chunk, stripe_len);
 	btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
 	btrfs_set_stack_chunk_sub_stripes(chunk, sub_stripes);
-	map->sector_size = info->sectorsize;
-	map->stripe_len = stripe_len;
-	map->io_align = stripe_len;
-	map->io_width = stripe_len;
-	map->type = type;
-	map->num_stripes = num_stripes;
-	map->sub_stripes = sub_stripes;
 
 	/*
 	 * Insert chunk item and chunk mapping.
@@ -1209,13 +1133,6 @@ alloc_chunk:
 	BUG_ON(ret);
 	*start = key.offset;
 
-	map->ce.start = key.offset;
-	map->ce.size = *num_bytes;
-
-	ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
-	if (ret < 0)
-		goto out_chunk_map;
-
 	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 		ret = btrfs_add_system_chunk(info, &key,
 				    chunk, btrfs_chunk_item_size(num_stripes));
@@ -1225,14 +1142,24 @@ alloc_chunk:
 
 	kfree(chunk);
 
+	/*
+	 * Insert device extents
+	 */
+	ret = btrfs_insert_dev_extents(trans, info, map, stripe_size);
+	if (ret < 0)
+		goto out_devices_info;
+
 	ret = btrfs_make_block_group(trans, info, 0, type, map->ce.start,
 				     map->ce.size);
+	kfree(devices_info);
 	return ret;
 
 out_chunk_map:
 	kfree(map);
 out_chunk:
 	kfree(chunk);
+out_devices_info:
+	kfree(devices_info);
 	return ret;
 }
 
diff --git a/volumes.h b/volumes.h
index 612a0a7586f4..3741d45cae80 100644
--- a/volumes.h
+++ b/volumes.h
@@ -108,6 +108,13 @@ struct map_lookup {
 	struct btrfs_bio_stripe stripes[];
 };
 
+struct btrfs_device_info {
+	struct btrfs_device *dev;
+	u64 dev_offset;
+	u64 max_avail;
+	u64 total_avail;
+};
+
 struct btrfs_raid_attr {
 	int sub_stripes;	/* sub_stripes info for map */
 	int dev_stripes;	/* stripes per dev */
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk
  2018-02-02  8:19 ` [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk Qu Wenruo
@ 2018-02-02  9:20   ` Su Yue
  2018-02-02  9:41     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Su Yue @ 2018-02-02  9:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, dsterba, Qu Wenruo



On 02/02/2018 04:19 PM, Qu Wenruo wrote:
> We used to have two chunk allocators, btrfs_alloc_chunk() and
> btrfs_alloc_data_chunk(); the former is the more generic one, while the
> latter is only used in mkfs and convert, to allocate SINGLE data chunks.
> 
> Although btrfs_alloc_data_chunk() has some special hacks to cooperate
> with convert, it's quite simple to integrate it into the generic chunk
> allocator.
> 
> So merge them into one btrfs_alloc_chunk(), with extra @convert
> parameter and necessary comment, to make code less duplicated and less
> thing to maintain.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   convert/main.c |   6 +-
>   extent-tree.c  |   2 +-
>   mkfs/main.c    |   8 +--
>   volumes.c      | 219 ++++++++++++++++++++++-----------------------------------
>   volumes.h      |   5 +-
>   5 files changed, 94 insertions(+), 146 deletions(-)
> 
> diff --git a/convert/main.c b/convert/main.c
> index b3ea31d7af43..b2444bb2ff21 100644
> --- a/convert/main.c
> +++ b/convert/main.c
> @@ -910,9 +910,9 @@ static int make_convert_data_block_groups(struct btrfs_trans_handle *trans,
>   
>   			len = min(max_chunk_size,
>   				  cache->start + cache->size - cur);
> -			ret = btrfs_alloc_data_chunk(trans, fs_info,
> -					&cur_backup, len,
> -					BTRFS_BLOCK_GROUP_DATA, 1);
> +			ret = btrfs_alloc_chunk(trans, fs_info,
> +					&cur_backup, &len,
> +					BTRFS_BLOCK_GROUP_DATA, true);
>   			if (ret < 0)
>   				break;
>   			ret = btrfs_make_block_group(trans, fs_info, 0,
> diff --git a/extent-tree.c b/extent-tree.c
> index e2ae74a7fe66..b085ab0352b3 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1906,7 +1906,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>   		return 0;
>   
>   	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
> -	                        space_info->flags);
> +	                        space_info->flags, false);
>   	if (ret == -ENOSPC) {
>   		space_info->full = 1;
>   		return 0;
> diff --git a/mkfs/main.c b/mkfs/main.c
> index ea65c6d897b2..358395ca0250 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -82,7 +82,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
>   		ret = btrfs_alloc_chunk(trans, fs_info,
>   					&chunk_start, &chunk_size,
>   					BTRFS_BLOCK_GROUP_METADATA |
> -					BTRFS_BLOCK_GROUP_DATA);
> +					BTRFS_BLOCK_GROUP_DATA, false);
>   		if (ret == -ENOSPC) {
>   			error("no space to allocate data/metadata chunk");
>   			goto err;
> @@ -99,7 +99,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
>   	} else {
>   		ret = btrfs_alloc_chunk(trans, fs_info,
>   					&chunk_start, &chunk_size,
> -					BTRFS_BLOCK_GROUP_METADATA);
> +					BTRFS_BLOCK_GROUP_METADATA, false);
>   		if (ret == -ENOSPC) {
>   			error("no space to allocate metadata chunk");
>   			goto err;
> @@ -133,7 +133,7 @@ static int create_data_block_groups(struct btrfs_trans_handle *trans,
>   	if (!mixed) {
>   		ret = btrfs_alloc_chunk(trans, fs_info,
>   					&chunk_start, &chunk_size,
> -					BTRFS_BLOCK_GROUP_DATA);
> +					BTRFS_BLOCK_GROUP_DATA, false);
>   		if (ret == -ENOSPC) {
>   			error("no space to allocate data chunk");
>   			goto err;
> @@ -241,7 +241,7 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
>   	int ret;
>   
>   	ret = btrfs_alloc_chunk(trans, fs_info,
> -				&chunk_start, &chunk_size, type);
> +				&chunk_start, &chunk_size, type, false);
>   	if (ret == -ENOSPC) {
>   		error("not enough free space to allocate chunk");
>   		exit(1);
> diff --git a/volumes.c b/volumes.c
> index 677d085de96c..9ee4650351c3 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -836,9 +836,23 @@ error:
>   				- 2 * sizeof(struct btrfs_chunk))	\
>   				/ sizeof(struct btrfs_stripe) + 1)
>   
> +/*
> + * Allocate a chunk; this will insert dev extents and the chunk item.
> + * NOTE: This function will not insert the block group item nor mark the
> + * newly allocated chunk available for later allocation.
> + * Block group item and free space updates are handled by
> + * btrfs_make_block_group().
> + *
> + * @start:	return value of allocated chunk start bytenr.
> + * @num_bytes:	return value of allocated chunk size
> + * @type:	chunk type (including both profile and type)
> + * @convert:	if the chunk is allocated for convert case.
> + * 		If @convert is true, chunk allocator will skip device extent
> + * 		search, but use *start and *num_bytes as chunk start/num_bytes
> + * 		and device offset, to build a 1:1 chunk mapping for convert.
> + */
>   int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   		      struct btrfs_fs_info *info, u64 *start,
> -		      u64 *num_bytes, u64 type)
> +		      u64 *num_bytes, u64 type, bool convert)
>   {
>   	u64 dev_offset;
>   	struct btrfs_root *extent_root = info->extent_root;
> @@ -868,10 +882,38 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	struct btrfs_key key;
>   	u64 offset;
>   
> -	if (list_empty(dev_list)) {
> +	if (list_empty(dev_list))
>   		return -ENOSPC;
> +
> +	if (convert) {
> +		/* For convert, profile must be SINGLE */
> +		if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
Maybe BTRFS_RAID_SINGLE?

Thanks,
Su

> +			error("convert only supports SINGLE profile");
> +			return -EINVAL;
> +		}
> +		if (!IS_ALIGNED(*start, info->sectorsize)) {
> +			error("chunk start not aligned, start=%llu sectorsize=%u",
> +				*start, info->sectorsize);
> +			return -EINVAL;
> +		}
> +		if (!IS_ALIGNED(*num_bytes, info->sectorsize)) {
> +			error("chunk size not aligned, size=%llu sectorsize=%u",
> +				*num_bytes, info->sectorsize);
> +			return -EINVAL;
> +		}
> +		calc_size = *num_bytes;
> +		offset = *start;
> +		/*
> +		 * For convert, we use 1:1 chunk mapping specified by @start and
> +		 * @num_bytes, so there is no need to go through dev_extent
> +		 * searching.
> +		 */
> +		goto alloc_chunk;
>   	}
>   
> +	/*
> +	 * Chunk size calculation part.
> +	 */
>   	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>   		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>   			calc_size = SZ_8M;
> @@ -942,6 +984,9 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	percent_max = div_factor(btrfs_super_total_bytes(info->super_copy), 1);
>   	max_chunk_size = min(percent_max, max_chunk_size);
>   
> +	/*
> +	 * Reserve space from each device.
> +	 */
>   again:
>   	if (chunk_bytes_by_type(type, calc_size, num_stripes, sub_stripes) >
>   	    max_chunk_size) {
> @@ -972,7 +1017,8 @@ again:
>   			return ret;
>   		cur = cur->next;
>   		if (avail >= min_free) {
> -			list_move_tail(&device->dev_list, &private_devs);
> +			list_move_tail(&device->dev_list,
> +				       &private_devs);
>   			index++;
>   			if (type & BTRFS_BLOCK_GROUP_DUP)
>   				index++;
> @@ -999,9 +1045,16 @@ again:
>   		}
>   		return -ENOSPC;
>   	}
> -	ret = find_next_chunk(info, &offset);
> -	if (ret)
> -		return ret;
> +
> +	/*
> +	 * Fill chunk mapping and chunk stripes
> +	 */
> +alloc_chunk:
> +	if (!convert) {
> +		ret = find_next_chunk(info, &offset);
> +		if (ret)
> +			return ret;
> +	}
>   	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>   	key.type = BTRFS_CHUNK_ITEM_KEY;
>   	key.offset = offset;
> @@ -1022,17 +1075,31 @@ again:
>   	index = 0;
>   	while(index < num_stripes) {
>   		struct btrfs_stripe *stripe;
> -		BUG_ON(list_empty(&private_devs));
> -		cur = private_devs.next;
> -		device = list_entry(cur, struct btrfs_device, dev_list);
>   
> -		/* loop over this device again if we're doing a dup group */
> -		if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
> -		    (index == num_stripes - 1))
> -			list_move_tail(&device->dev_list, dev_list);
> +		if (!convert) {
> +			if (list_empty(&private_devs))
> +				return -ENODEV;
> +			cur = private_devs.next;
> +			device = list_entry(cur, struct btrfs_device, dev_list);
> +			if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
> +			    (index == num_stripes - 1)) {
> +				/*
> +				 * loop over this device again if we're doing a
> +				 * dup group
> +				 */
> +				list_move_tail(&device->dev_list, dev_list);
> +			}
> +		} else {
> +			/* Only SINGLE is accepted in convert case */
> +			BUG_ON(num_stripes > 1);
> +			device = list_entry(dev_list->next, struct btrfs_device,
> +					    dev_list);
> +			key.offset = *start;
> +			dev_offset = *start;
> +		}
>   
>   		ret = btrfs_alloc_dev_extent(trans, device, key.offset,
> -			     calc_size, &dev_offset, 0);
> +			     calc_size, &dev_offset, convert);
>   		if (ret < 0)
>   			goto out_chunk_map;
>   
> @@ -1049,7 +1116,7 @@ again:
>   		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
>   		index++;
>   	}
> -	BUG_ON(!list_empty(&private_devs));
> +	BUG_ON(!convert && !list_empty(&private_devs));
>   
>   	/* key was set above */
>   	btrfs_set_stack_chunk_length(chunk, *num_bytes);
> @@ -1069,6 +1136,9 @@ again:
>   	map->num_stripes = num_stripes;
>   	map->sub_stripes = sub_stripes;
>   
> +	/*
> +	 * Insert chunk item and chunk mapping.
> +	 */
>   	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
>   				btrfs_chunk_item_size(num_stripes));
>   	BUG_ON(ret);
> @@ -1098,125 +1168,6 @@ out_chunk:
>   	return ret;
>   }
>   
> -/*
> - * Alloc a DATA chunk with SINGLE profile.
> - *
> - * If 'convert' is set, it will alloc a chunk with 1:1 mapping
> - * (btrfs logical bytenr == on-disk bytenr)
> - * For that case, caller must make sure the chunk and dev_extent are not
> - * occupied.
> - */
> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
> -			   struct btrfs_fs_info *info, u64 *start,
> -			   u64 num_bytes, u64 type, int convert)
> -{
> -	u64 dev_offset;
> -	struct btrfs_root *extent_root = info->extent_root;
> -	struct btrfs_root *chunk_root = info->chunk_root;
> -	struct btrfs_stripe *stripes;
> -	struct btrfs_device *device = NULL;
> -	struct btrfs_chunk *chunk;
> -	struct list_head *dev_list = &info->fs_devices->devices;
> -	struct list_head *cur;
> -	struct map_lookup *map;
> -	u64 calc_size = SZ_8M;
> -	int num_stripes = 1;
> -	int sub_stripes = 0;
> -	int ret;
> -	int index;
> -	int stripe_len = BTRFS_STRIPE_LEN;
> -	struct btrfs_key key;
> -
> -	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> -	key.type = BTRFS_CHUNK_ITEM_KEY;
> -	if (convert) {
> -		if (*start != round_down(*start, info->sectorsize)) {
> -			error("DATA chunk start not sectorsize aligned: %llu",
> -					(unsigned long long)*start);
> -			return -EINVAL;
> -		}
> -		key.offset = *start;
> -		dev_offset = *start;
> -	} else {
> -		u64 tmp;
> -
> -		ret = find_next_chunk(info, &tmp);
> -		key.offset = tmp;
> -		if (ret)
> -			return ret;
> -	}
> -
> -	chunk = kmalloc(btrfs_chunk_item_size(num_stripes), GFP_NOFS);
> -	if (!chunk)
> -		return -ENOMEM;
> -
> -	map = kmalloc(btrfs_map_lookup_size(num_stripes), GFP_NOFS);
> -	if (!map) {
> -		kfree(chunk);
> -		return -ENOMEM;
> -	}
> -
> -	stripes = &chunk->stripe;
> -	calc_size = num_bytes;
> -
> -	index = 0;
> -	cur = dev_list->next;
> -	device = list_entry(cur, struct btrfs_device, dev_list);
> -
> -	while (index < num_stripes) {
> -		struct btrfs_stripe *stripe;
> -
> -		ret = btrfs_alloc_dev_extent(trans, device, key.offset,
> -			     calc_size, &dev_offset, convert);
> -		BUG_ON(ret);
> -
> -		device->bytes_used += calc_size;
> -		ret = btrfs_update_device(trans, device);
> -		BUG_ON(ret);
> -
> -		map->stripes[index].dev = device;
> -		map->stripes[index].physical = dev_offset;
> -		stripe = stripes + index;
> -		btrfs_set_stack_stripe_devid(stripe, device->devid);
> -		btrfs_set_stack_stripe_offset(stripe, dev_offset);
> -		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
> -		index++;
> -	}
> -
> -	/* key was set above */
> -	btrfs_set_stack_chunk_length(chunk, num_bytes);
> -	btrfs_set_stack_chunk_owner(chunk, extent_root->root_key.objectid);
> -	btrfs_set_stack_chunk_stripe_len(chunk, stripe_len);
> -	btrfs_set_stack_chunk_type(chunk, type);
> -	btrfs_set_stack_chunk_num_stripes(chunk, num_stripes);
> -	btrfs_set_stack_chunk_io_align(chunk, stripe_len);
> -	btrfs_set_stack_chunk_io_width(chunk, stripe_len);
> -	btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
> -	btrfs_set_stack_chunk_sub_stripes(chunk, sub_stripes);
> -	map->sector_size = info->sectorsize;
> -	map->stripe_len = stripe_len;
> -	map->io_align = stripe_len;
> -	map->io_width = stripe_len;
> -	map->type = type;
> -	map->num_stripes = num_stripes;
> -	map->sub_stripes = sub_stripes;
> -
> -	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
> -				btrfs_chunk_item_size(num_stripes));
> -	BUG_ON(ret);
> -	if (!convert)
> -		*start = key.offset;
> -
> -	map->ce.start = key.offset;
> -	map->ce.size = num_bytes;
> -
> -	ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
> -	BUG_ON(ret);
> -
> -	kfree(chunk);
> -	return ret;
> -}
> -
>   int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>   {
>   	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> diff --git a/volumes.h b/volumes.h
> index 11572e78c04f..7bbdf615d31a 100644
> --- a/volumes.h
> +++ b/volumes.h
> @@ -208,10 +208,7 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
>   int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
>   int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   		      struct btrfs_fs_info *fs_info, u64 *start,
> -		      u64 *num_bytes, u64 type);
> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
> -			   struct btrfs_fs_info *fs_info, u64 *start,
> -			   u64 num_bytes, u64 type, int convert);
> +		      u64 *num_bytes, u64 type, bool convert);
>   int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>   		       int flags);
>   int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread
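To make the control flow of the merged allocator concrete, here is a minimal C sketch of its entry logic. Everything in it — `alloc_chunk_sketch`, `SECTORSIZE`, and the mask value — is a hypothetical stand-in rather than the real btrfs-progs symbols: with `convert` set, the caller's `*start`/`*num_bytes` are validated and used verbatim as the 1:1 mapping; otherwise a device-extent search would pick the location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Stand-in values for illustration; not the real btrfs-progs constants. */
#define SECTORSIZE 4096ULL
#define BTRFS_BLOCK_GROUP_DATA (1ULL << 0)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK (0x3fULL << 3)
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Toy model of the merged allocator's entry logic: with @convert set, the
 * caller-supplied *start/*num_bytes are validated and taken verbatim as a
 * 1:1 chunk mapping; otherwise the allocator searches for a free device
 * extent (modeled here by a fixed placeholder offset).
 */
static int alloc_chunk_sketch(u64 *start, u64 *num_bytes, u64 type, bool convert)
{
	if (convert) {
		if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK)
			return -22;	/* -EINVAL: convert requires SINGLE */
		if (!IS_ALIGNED(*start, SECTORSIZE) ||
		    !IS_ALIGNED(*num_bytes, SECTORSIZE))
			return -22;	/* -EINVAL: must be sector aligned */
		return 0;	/* logical == physical == *start */
	}
	/* Normal path: a device-extent search would choose the offset. */
	*start = 1024ULL * 1024 * 1024;
	return 0;
}
```

This mirrors why the `goto alloc_chunk` shortcut in the patch is safe: every field the convert path needs is filled in before the jump, so only the search-related code is skipped.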

* Re: [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk
  2018-02-02  9:20   ` Su Yue
@ 2018-02-02  9:41     ` Qu Wenruo
  2018-02-02  9:54       ` Su Yue
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02  9:41 UTC (permalink / raw)
  To: Su Yue, Qu Wenruo, linux-btrfs, dsterba





On 2018-02-02 17:20, Su Yue wrote:
> 
> 
> On 02/02/2018 04:19 PM, Qu Wenruo wrote:
>> We used to have two chunk allocators, btrfs_alloc_chunk() and
>> btrfs_alloc_data_chunk(), the former is the more generic one, while the
>> latter is only used in mkfs and convert, to allocate SINGLE data chunk.
>>
>> Although btrfs_alloc_data_chunk() has some special hacks to cooperate
>> with convert, it's quite simple to integrate it into the generic chunk
>> allocator.
>>
>> So merge them into one btrfs_alloc_chunk(), with extra @convert
>> parameter and necessary comment, to make the code less duplicated and
>> leave fewer things to maintain.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   convert/main.c |   6 +-
>>   extent-tree.c  |   2 +-
>>   mkfs/main.c    |   8 +--
>>   volumes.c      | 219
>> ++++++++++++++++++++++-----------------------------------
[snip]
>> diff --git a/volumes.c b/volumes.c
>> index 677d085de96c..9ee4650351c3 100644
>> --- a/volumes.c
>> +++ b/volumes.c
>> @@ -836,9 +836,23 @@ error:
>>                   - 2 * sizeof(struct btrfs_chunk))    \
>>                   / sizeof(struct btrfs_stripe) + 1)
>>   +/*
>> + * Alloc a chunk, will insert dev extents, chunk item.
>> + * NOTE: This function will not insert block group item nor mark newly
>> + * allocated chunk available for later allocation.
>> + * Block group item and free space update is handled by
>> btrfs_make_block_group()
>> + *
>> + * @start:    return value of allocated chunk start bytenr.
>> + * @num_bytes:    return value of allocated chunk size
>> + * @type:    chunk type (including both profile and type)
>> + * @convert:    if the chunk is allocated for convert case.
>> + *         If @convert is true, chunk allocator will skip device extent
>> + *         search, but use *start and *num_bytes as chunk
>> start/num_bytes
>> + *         and device offset, to build a 1:1 chunk mapping for convert.
>> + */
>>   int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>                 struct btrfs_fs_info *info, u64 *start,
>> -              u64 *num_bytes, u64 type)
>> +              u64 *num_bytes, u64 type, bool convert)
>>   {
>>       u64 dev_offset;
>>       struct btrfs_root *extent_root = info->extent_root;
>> @@ -868,10 +882,38 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle
>> *trans,
>>       struct btrfs_key key;
>>       u64 offset;
>>   -    if (list_empty(dev_list)) {
>> +    if (list_empty(dev_list))
>>           return -ENOSPC;
>> +
>> +    if (convert) {
>> +        /* For convert, profile must be SINGLE */
>> +        if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> Maybe BTRFS_RAID_SINGLE?

Here we're using the raw type flags, and the raid index (BTRFS_RAID_SINGLE)
hasn't been introduced yet, so the check as written is correct.

Thanks,
Qu

> 
> Thanks,
> Su
> 
>> +            error("convert only supports SINGLE profile");
>> +            return -EINVAL;
>> +        }
>> +        if (!IS_ALIGNED(*start, info->sectorsize)) {
>> +            error("chunk start not aligned, start=%llu sectorsize=%u",
>> +                *start, info->sectorsize);
>> +            return -EINVAL;
>> +        }
>> +        if (!IS_ALIGNED(*num_bytes, info->sectorsize)) {
>> +            error("chunk size not aligned, size=%llu sectorsize=%u",
>> +                *num_bytes, info->sectorsize);
>> +            return -EINVAL;
>> +        }
>> +        calc_size = *num_bytes;
>> +        offset = *start;
>> +        /*
>> +         * For convert, we use 1:1 chunk mapping specified by @start and
>> +         * @num_bytes, so there is no need to go through dev_extent
>> +         * searching.
>> +         */
>> +        goto alloc_chunk;
>>       }
>>   +    /*
>> +     * Chunk size calculation part.
>> +     */
>>       if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>>           if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>>               calc_size = SZ_8M;
>> @@ -942,6 +984,9 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle
>> *trans,
>>       percent_max =
>> div_factor(btrfs_super_total_bytes(info->super_copy), 1);
>>       max_chunk_size = min(percent_max, max_chunk_size);
>>   +    /*
>> +     * Reserve space from each device.
>> +     */
>>   again:
>>       if (chunk_bytes_by_type(type, calc_size, num_stripes,
>> sub_stripes) >
>>           max_chunk_size) {
>> @@ -972,7 +1017,8 @@ again:
>>               return ret;
>>           cur = cur->next;
>>           if (avail >= min_free) {
>> -            list_move_tail(&device->dev_list, &private_devs);
>> +            list_move_tail(&device->dev_list,
>> +                       &private_devs);
>>               index++;
>>               if (type & BTRFS_BLOCK_GROUP_DUP)
>>                   index++;
>> @@ -999,9 +1045,16 @@ again:
>>           }
>>           return -ENOSPC;
>>       }
>> -    ret = find_next_chunk(info, &offset);
>> -    if (ret)
>> -        return ret;
>> +
>> +    /*
>> +     * Fill chunk mapping and chunk stripes
>> +     */
>> +alloc_chunk:
>> +    if (!convert) {
>> +        ret = find_next_chunk(info, &offset);
>> +        if (ret)
>> +            return ret;
>> +    }
>>       key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>>       key.type = BTRFS_CHUNK_ITEM_KEY;
>>       key.offset = offset;
>> @@ -1022,17 +1075,31 @@ again:
>>       index = 0;
>>       while(index < num_stripes) {
>>           struct btrfs_stripe *stripe;
>> -        BUG_ON(list_empty(&private_devs));
>> -        cur = private_devs.next;
>> -        device = list_entry(cur, struct btrfs_device, dev_list);
>>   -        /* loop over this device again if we're doing a dup group */
>> -        if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
>> -            (index == num_stripes - 1))
>> -            list_move_tail(&device->dev_list, dev_list);
>> +        if (!convert) {
>> +            if (list_empty(&private_devs))
>> +                return -ENODEV;
>> +            cur = private_devs.next;
>> +            device = list_entry(cur, struct btrfs_device, dev_list);
>> +            if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
>> +                (index == num_stripes - 1)) {
>> +                /*
>> +                 * loop over this device again if we're doing a
>> +                 * dup group
>> +                 */
>> +                list_move_tail(&device->dev_list, dev_list);
>> +            }
>> +        } else {
>> +            /* Only SINGLE is accepted in convert case */
>> +            BUG_ON(num_stripes > 1);
>> +            device = list_entry(dev_list->next, struct btrfs_device,
>> +                        dev_list);
>> +            key.offset = *start;
>> +            dev_offset = *start;
>> +        }
>>             ret = btrfs_alloc_dev_extent(trans, device, key.offset,
>> -                 calc_size, &dev_offset, 0);
>> +                 calc_size, &dev_offset, convert);
>>           if (ret < 0)
>>               goto out_chunk_map;
>>   @@ -1049,7 +1116,7 @@ again:
>>           memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
>>           index++;
>>       }
>> -    BUG_ON(!list_empty(&private_devs));
>> +    BUG_ON(!convert && !list_empty(&private_devs));
>>         /* key was set above */
>>       btrfs_set_stack_chunk_length(chunk, *num_bytes);
>> @@ -1069,6 +1136,9 @@ again:
>>       map->num_stripes = num_stripes;
>>       map->sub_stripes = sub_stripes;
>>   +    /*
>> +     * Insert chunk item and chunk mapping.
>> +     */
>>       ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
>>                   btrfs_chunk_item_size(num_stripes));
>>       BUG_ON(ret);
>> @@ -1098,125 +1168,6 @@ out_chunk:
>>       return ret;
>>   }
>>   -/*
>> - * Alloc a DATA chunk with SINGLE profile.
>> - *
>> - * If 'convert' is set, it will alloc a chunk with 1:1 mapping
>> - * (btrfs logical bytenr == on-disk bytenr)
>> - * For that case, caller must make sure the chunk and dev_extent are not
>> - * occupied.
>> - */
>> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
>> -               struct btrfs_fs_info *info, u64 *start,
>> -               u64 num_bytes, u64 type, int convert)
>> -{
>> -    u64 dev_offset;
>> -    struct btrfs_root *extent_root = info->extent_root;
>> -    struct btrfs_root *chunk_root = info->chunk_root;
>> -    struct btrfs_stripe *stripes;
>> -    struct btrfs_device *device = NULL;
>> -    struct btrfs_chunk *chunk;
>> -    struct list_head *dev_list = &info->fs_devices->devices;
>> -    struct list_head *cur;
>> -    struct map_lookup *map;
>> -    u64 calc_size = SZ_8M;
>> -    int num_stripes = 1;
>> -    int sub_stripes = 0;
>> -    int ret;
>> -    int index;
>> -    int stripe_len = BTRFS_STRIPE_LEN;
>> -    struct btrfs_key key;
>> -
>> -    key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>> -    key.type = BTRFS_CHUNK_ITEM_KEY;
>> -    if (convert) {
>> -        if (*start != round_down(*start, info->sectorsize)) {
>> -            error("DATA chunk start not sectorsize aligned: %llu",
>> -                    (unsigned long long)*start);
>> -            return -EINVAL;
>> -        }
>> -        key.offset = *start;
>> -        dev_offset = *start;
>> -    } else {
>> -        u64 tmp;
>> -
>> -        ret = find_next_chunk(info, &tmp);
>> -        key.offset = tmp;
>> -        if (ret)
>> -            return ret;
>> -    }
>> -
>> -    chunk = kmalloc(btrfs_chunk_item_size(num_stripes), GFP_NOFS);
>> -    if (!chunk)
>> -        return -ENOMEM;
>> -
>> -    map = kmalloc(btrfs_map_lookup_size(num_stripes), GFP_NOFS);
>> -    if (!map) {
>> -        kfree(chunk);
>> -        return -ENOMEM;
>> -    }
>> -
>> -    stripes = &chunk->stripe;
>> -    calc_size = num_bytes;
>> -
>> -    index = 0;
>> -    cur = dev_list->next;
>> -    device = list_entry(cur, struct btrfs_device, dev_list);
>> -
>> -    while (index < num_stripes) {
>> -        struct btrfs_stripe *stripe;
>> -
>> -        ret = btrfs_alloc_dev_extent(trans, device, key.offset,
>> -                 calc_size, &dev_offset, convert);
>> -        BUG_ON(ret);
>> -
>> -        device->bytes_used += calc_size;
>> -        ret = btrfs_update_device(trans, device);
>> -        BUG_ON(ret);
>> -
>> -        map->stripes[index].dev = device;
>> -        map->stripes[index].physical = dev_offset;
>> -        stripe = stripes + index;
>> -        btrfs_set_stack_stripe_devid(stripe, device->devid);
>> -        btrfs_set_stack_stripe_offset(stripe, dev_offset);
>> -        memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
>> -        index++;
>> -    }
>> -
>> -    /* key was set above */
>> -    btrfs_set_stack_chunk_length(chunk, num_bytes);
>> -    btrfs_set_stack_chunk_owner(chunk, extent_root->root_key.objectid);
>> -    btrfs_set_stack_chunk_stripe_len(chunk, stripe_len);
>> -    btrfs_set_stack_chunk_type(chunk, type);
>> -    btrfs_set_stack_chunk_num_stripes(chunk, num_stripes);
>> -    btrfs_set_stack_chunk_io_align(chunk, stripe_len);
>> -    btrfs_set_stack_chunk_io_width(chunk, stripe_len);
>> -    btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
>> -    btrfs_set_stack_chunk_sub_stripes(chunk, sub_stripes);
>> -    map->sector_size = info->sectorsize;
>> -    map->stripe_len = stripe_len;
>> -    map->io_align = stripe_len;
>> -    map->io_width = stripe_len;
>> -    map->type = type;
>> -    map->num_stripes = num_stripes;
>> -    map->sub_stripes = sub_stripes;
>> -
>> -    ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
>> -                btrfs_chunk_item_size(num_stripes));
>> -    BUG_ON(ret);
>> -    if (!convert)
>> -        *start = key.offset;
>> -
>> -    map->ce.start = key.offset;
>> -    map->ce.size = num_bytes;
>> -
>> -    ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
>> -    BUG_ON(ret);
>> -
>> -    kfree(chunk);
>> -    return ret;
>> -}
>> -
>>   int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64
>> len)
>>   {
>>       struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>> diff --git a/volumes.h b/volumes.h
>> index 11572e78c04f..7bbdf615d31a 100644
>> --- a/volumes.h
>> +++ b/volumes.h
>> @@ -208,10 +208,7 @@ int btrfs_read_sys_array(struct btrfs_fs_info
>> *fs_info);
>>   int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
>>   int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>                 struct btrfs_fs_info *fs_info, u64 *start,
>> -              u64 *num_bytes, u64 type);
>> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
>> -               struct btrfs_fs_info *fs_info, u64 *start,
>> -               u64 num_bytes, u64 type, int convert);
>> +              u64 *num_bytes, u64 type, bool convert);
>>   int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>>                  int flags);
>>   int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
>>
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread
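Qu's answer above — that SINGLE is simply the absence of any profile bit in the type flags, so no raid-index enum is needed for the check — can be sketched as follows. The flag values are reproduced here as they appear in the btrfs on-disk format headers of that era; treat them as illustrative rather than authoritative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Block group flag values as in the btrfs on-disk format (ctree.h-era). */
#define BTRFS_BLOCK_GROUP_DATA   (1ULL << 0)
#define BTRFS_BLOCK_GROUP_RAID0  (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1  (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP    (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
#define BTRFS_BLOCK_GROUP_RAID5  (1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6  (1ULL << 8)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK \
	(BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | \
	 BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID10 | \
	 BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)

/* SINGLE has no dedicated flag: it is the absence of every profile bit. */
static int is_single_profile(u64 type)
{
	return (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0;
}
```

So `type & BTRFS_BLOCK_GROUP_PROFILE_MASK` being nonzero is exactly the "not SINGLE" condition the patch rejects for convert.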

* Re: [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk
  2018-02-02  9:41     ` Qu Wenruo
@ 2018-02-02  9:54       ` Su Yue
  0 siblings, 0 replies; 17+ messages in thread
From: Su Yue @ 2018-02-02  9:54 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs, dsterba



On 02/02/2018 05:41 PM, Qu Wenruo wrote:
> 
> 
On 2018-02-02 17:20, Su Yue wrote:
>>
>>
>> On 02/02/2018 04:19 PM, Qu Wenruo wrote:
>>> We used to have two chunk allocators, btrfs_alloc_chunk() and
>>> btrfs_alloc_data_chunk(), the former is the more generic one, while the
>>> latter is only used in mkfs and convert, to allocate SINGLE data chunk.
>>>
>>> Although btrfs_alloc_data_chunk() has some special hacks to cooperate
>>> with convert, it's quite simple to integrate it into the generic chunk
>>> allocator.
>>>
>>> So merge them into one btrfs_alloc_chunk(), with extra @convert
>>> parameter and necessary comment, to make the code less duplicated and
>>> leave fewer things to maintain.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>    convert/main.c |   6 +-
>>>    extent-tree.c  |   2 +-
>>>    mkfs/main.c    |   8 +--
>>>    volumes.c      | 219
>>> ++++++++++++++++++++++-----------------------------------
> [snip]
>>> diff --git a/volumes.c b/volumes.c
>>> index 677d085de96c..9ee4650351c3 100644
>>> --- a/volumes.c
>>> +++ b/volumes.c
>>> @@ -836,9 +836,23 @@ error:
>>>                    - 2 * sizeof(struct btrfs_chunk))    \
>>>                    / sizeof(struct btrfs_stripe) + 1)
>>>    +/*
>>> + * Alloc a chunk, will insert dev extents, chunk item.
>>> + * NOTE: This function will not insert block group item nor mark newly
>>> + * allocated chunk available for later allocation.
>>> + * Block group item and free space update is handled by
>>> btrfs_make_block_group()
>>> + *
>>> + * @start:    return value of allocated chunk start bytenr.
>>> + * @num_bytes:    return value of allocated chunk size
>>> + * @type:    chunk type (including both profile and type)
>>> + * @convert:    if the chunk is allocated for convert case.
>>> + *         If @convert is true, chunk allocator will skip device extent
>>> + *         search, but use *start and *num_bytes as chunk
>>> start/num_bytes
>>> + *         and device offset, to build a 1:1 chunk mapping for convert.
>>> + */
>>>    int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>>                  struct btrfs_fs_info *info, u64 *start,
>>> -              u64 *num_bytes, u64 type)
>>> +              u64 *num_bytes, u64 type, bool convert)
>>>    {
>>>        u64 dev_offset;
>>>        struct btrfs_root *extent_root = info->extent_root;
>>> @@ -868,10 +882,38 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle
>>> *trans,
>>>        struct btrfs_key key;
>>>        u64 offset;
>>>    -    if (list_empty(dev_list)) {
>>> +    if (list_empty(dev_list))
>>>            return -ENOSPC;
>>> +
>>> +    if (convert) {
>>> +        /* For convert, profile must be SINGLE */
>>> +        if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>> Maybe BTRFS_RAID_SINGLE?
> 
> Here we're using the raw type flags, and the raid index (BTRFS_RAID_SINGLE)
> hasn't been introduced yet, so the check as written is correct.
> 
OK.

Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>

> Thanks,
> Qu
> 
>>
>> Thanks,
>> Su
>>
>>> +            error("convert only supports SINGLE profile");
>>> +            return -EINVAL;
>>> +        }
>>> +        if (!IS_ALIGNED(*start, info->sectorsize)) {
>>> +            error("chunk start not aligned, start=%llu sectorsize=%u",
>>> +                *start, info->sectorsize);
>>> +            return -EINVAL;
>>> +        }
>>> +        if (!IS_ALIGNED(*num_bytes, info->sectorsize)) {
>>> +            error("chunk size not aligned, size=%llu sectorsize=%u",
>>> +                *num_bytes, info->sectorsize);
>>> +            return -EINVAL;
>>> +        }
>>> +        calc_size = *num_bytes;
>>> +        offset = *start;
>>> +        /*
>>> +         * For convert, we use 1:1 chunk mapping specified by @start and
>>> +         * @num_bytes, so there is no need to go through dev_extent
>>> +         * searching.
>>> +         */
>>> +        goto alloc_chunk;
>>>        }
>>>    +    /*
>>> +     * Chunk size calculation part.
>>> +     */
>>>        if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>>>            if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>>>                calc_size = SZ_8M;
>>> @@ -942,6 +984,9 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle
>>> *trans,
>>>        percent_max =
>>> div_factor(btrfs_super_total_bytes(info->super_copy), 1);
>>>        max_chunk_size = min(percent_max, max_chunk_size);
>>>    +    /*
>>> +     * Reserve space from each device.
>>> +     */
>>>    again:
>>>        if (chunk_bytes_by_type(type, calc_size, num_stripes,
>>> sub_stripes) >
>>>            max_chunk_size) {
>>> @@ -972,7 +1017,8 @@ again:
>>>                return ret;
>>>            cur = cur->next;
>>>            if (avail >= min_free) {
>>> -            list_move_tail(&device->dev_list, &private_devs);
>>> +            list_move_tail(&device->dev_list,
>>> +                       &private_devs);
>>>                index++;
>>>                if (type & BTRFS_BLOCK_GROUP_DUP)
>>>                    index++;
>>> @@ -999,9 +1045,16 @@ again:
>>>            }
>>>            return -ENOSPC;
>>>        }
>>> -    ret = find_next_chunk(info, &offset);
>>> -    if (ret)
>>> -        return ret;
>>> +
>>> +    /*
>>> +     * Fill chunk mapping and chunk stripes
>>> +     */
>>> +alloc_chunk:
>>> +    if (!convert) {
>>> +        ret = find_next_chunk(info, &offset);
>>> +        if (ret)
>>> +            return ret;
>>> +    }
>>>        key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>>>        key.type = BTRFS_CHUNK_ITEM_KEY;
>>>        key.offset = offset;
>>> @@ -1022,17 +1075,31 @@ again:
>>>        index = 0;
>>>        while(index < num_stripes) {
>>>            struct btrfs_stripe *stripe;
>>> -        BUG_ON(list_empty(&private_devs));
>>> -        cur = private_devs.next;
>>> -        device = list_entry(cur, struct btrfs_device, dev_list);
>>>    -        /* loop over this device again if we're doing a dup group */
>>> -        if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
>>> -            (index == num_stripes - 1))
>>> -            list_move_tail(&device->dev_list, dev_list);
>>> +        if (!convert) {
>>> +            if (list_empty(&private_devs))
>>> +                return -ENODEV;
>>> +            cur = private_devs.next;
>>> +            device = list_entry(cur, struct btrfs_device, dev_list);
>>> +            if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
>>> +                (index == num_stripes - 1)) {
>>> +                /*
>>> +                 * loop over this device again if we're doing a
>>> +                 * dup group
>>> +                 */
>>> +                list_move_tail(&device->dev_list, dev_list);
>>> +            }
>>> +        } else {
>>> +            /* Only SINGLE is accepted in convert case */
>>> +            BUG_ON(num_stripes > 1);
>>> +            device = list_entry(dev_list->next, struct btrfs_device,
>>> +                        dev_list);
>>> +            key.offset = *start;
>>> +            dev_offset = *start;
>>> +        }
>>>              ret = btrfs_alloc_dev_extent(trans, device, key.offset,
>>> -                 calc_size, &dev_offset, 0);
>>> +                 calc_size, &dev_offset, convert);
>>>            if (ret < 0)
>>>                goto out_chunk_map;
>>>    @@ -1049,7 +1116,7 @@ again:
>>>            memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
>>>            index++;
>>>        }
>>> -    BUG_ON(!list_empty(&private_devs));
>>> +    BUG_ON(!convert && !list_empty(&private_devs));
>>>          /* key was set above */
>>>        btrfs_set_stack_chunk_length(chunk, *num_bytes);
>>> @@ -1069,6 +1136,9 @@ again:
>>>        map->num_stripes = num_stripes;
>>>        map->sub_stripes = sub_stripes;
>>>    +    /*
>>> +     * Insert chunk item and chunk mapping.
>>> +     */
>>>        ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
>>>                    btrfs_chunk_item_size(num_stripes));
>>>        BUG_ON(ret);
>>> @@ -1098,125 +1168,6 @@ out_chunk:
>>>        return ret;
>>>    }
>>>    -/*
>>> - * Alloc a DATA chunk with SINGLE profile.
>>> - *
>>> - * If 'convert' is set, it will alloc a chunk with 1:1 mapping
>>> - * (btrfs logical bytenr == on-disk bytenr)
>>> - * For that case, caller must make sure the chunk and dev_extent are not
>>> - * occupied.
>>> - */
>>> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
>>> -               struct btrfs_fs_info *info, u64 *start,
>>> -               u64 num_bytes, u64 type, int convert)
>>> -{
>>> -    u64 dev_offset;
>>> -    struct btrfs_root *extent_root = info->extent_root;
>>> -    struct btrfs_root *chunk_root = info->chunk_root;
>>> -    struct btrfs_stripe *stripes;
>>> -    struct btrfs_device *device = NULL;
>>> -    struct btrfs_chunk *chunk;
>>> -    struct list_head *dev_list = &info->fs_devices->devices;
>>> -    struct list_head *cur;
>>> -    struct map_lookup *map;
>>> -    u64 calc_size = SZ_8M;
>>> -    int num_stripes = 1;
>>> -    int sub_stripes = 0;
>>> -    int ret;
>>> -    int index;
>>> -    int stripe_len = BTRFS_STRIPE_LEN;
>>> -    struct btrfs_key key;
>>> -
>>> -    key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>>> -    key.type = BTRFS_CHUNK_ITEM_KEY;
>>> -    if (convert) {
>>> -        if (*start != round_down(*start, info->sectorsize)) {
>>> -            error("DATA chunk start not sectorsize aligned: %llu",
>>> -                    (unsigned long long)*start);
>>> -            return -EINVAL;
>>> -        }
>>> -        key.offset = *start;
>>> -        dev_offset = *start;
>>> -    } else {
>>> -        u64 tmp;
>>> -
>>> -        ret = find_next_chunk(info, &tmp);
>>> -        key.offset = tmp;
>>> -        if (ret)
>>> -            return ret;
>>> -    }
>>> -
>>> -    chunk = kmalloc(btrfs_chunk_item_size(num_stripes), GFP_NOFS);
>>> -    if (!chunk)
>>> -        return -ENOMEM;
>>> -
>>> -    map = kmalloc(btrfs_map_lookup_size(num_stripes), GFP_NOFS);
>>> -    if (!map) {
>>> -        kfree(chunk);
>>> -        return -ENOMEM;
>>> -    }
>>> -
>>> -    stripes = &chunk->stripe;
>>> -    calc_size = num_bytes;
>>> -
>>> -    index = 0;
>>> -    cur = dev_list->next;
>>> -    device = list_entry(cur, struct btrfs_device, dev_list);
>>> -
>>> -    while (index < num_stripes) {
>>> -        struct btrfs_stripe *stripe;
>>> -
>>> -        ret = btrfs_alloc_dev_extent(trans, device, key.offset,
>>> -                 calc_size, &dev_offset, convert);
>>> -        BUG_ON(ret);
>>> -
>>> -        device->bytes_used += calc_size;
>>> -        ret = btrfs_update_device(trans, device);
>>> -        BUG_ON(ret);
>>> -
>>> -        map->stripes[index].dev = device;
>>> -        map->stripes[index].physical = dev_offset;
>>> -        stripe = stripes + index;
>>> -        btrfs_set_stack_stripe_devid(stripe, device->devid);
>>> -        btrfs_set_stack_stripe_offset(stripe, dev_offset);
>>> -        memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
>>> -        index++;
>>> -    }
>>> -
>>> -    /* key was set above */
>>> -    btrfs_set_stack_chunk_length(chunk, num_bytes);
>>> -    btrfs_set_stack_chunk_owner(chunk, extent_root->root_key.objectid);
>>> -    btrfs_set_stack_chunk_stripe_len(chunk, stripe_len);
>>> -    btrfs_set_stack_chunk_type(chunk, type);
>>> -    btrfs_set_stack_chunk_num_stripes(chunk, num_stripes);
>>> -    btrfs_set_stack_chunk_io_align(chunk, stripe_len);
>>> -    btrfs_set_stack_chunk_io_width(chunk, stripe_len);
>>> -    btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
>>> -    btrfs_set_stack_chunk_sub_stripes(chunk, sub_stripes);
>>> -    map->sector_size = info->sectorsize;
>>> -    map->stripe_len = stripe_len;
>>> -    map->io_align = stripe_len;
>>> -    map->io_width = stripe_len;
>>> -    map->type = type;
>>> -    map->num_stripes = num_stripes;
>>> -    map->sub_stripes = sub_stripes;
>>> -
>>> -    ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
>>> -                btrfs_chunk_item_size(num_stripes));
>>> -    BUG_ON(ret);
>>> -    if (!convert)
>>> -        *start = key.offset;
>>> -
>>> -    map->ce.start = key.offset;
>>> -    map->ce.size = num_bytes;
>>> -
>>> -    ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
>>> -    BUG_ON(ret);
>>> -
>>> -    kfree(chunk);
>>> -    return ret;
>>> -}
>>> -
>>>    int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>>>    {
>>>        struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>>> diff --git a/volumes.h b/volumes.h
>>> index 11572e78c04f..7bbdf615d31a 100644
>>> --- a/volumes.h
>>> +++ b/volumes.h
>>> @@ -208,10 +208,7 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
>>>    int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
>>>    int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>>                  struct btrfs_fs_info *fs_info, u64 *start,
>>> -              u64 *num_bytes, u64 type);
>>> -int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
>>> -               struct btrfs_fs_info *fs_info, u64 *start,
>>> -               u64 num_bytes, u64 type, int convert);
>>> +              u64 *num_bytes, u64 type, bool convert);
>>>    int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>>>                   int flags);
>>>    int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
>>>
>>
>>
> 



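The convert case quoted above relies on a simple contract: the caller supplies a sectorsize-aligned logical start, and the device offset is set to the same value (btrfs logical bytenr == on-disk bytenr), so no free dev extent search is performed. A minimal stand-alone sketch of that contract, using hypothetical helper names rather than the real btrfs functions:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for the kernel's round_down() used above. */
static u64 round_down_u64(u64 x, u64 align)
{
	return x - (x % align);
}

/*
 * Sketch of the convert-case contract: the chunk start must be
 * sectorsize aligned, and on success the device offset equals the
 * logical start (1:1 mapping), so no free dev extent search is needed.
 */
static int convert_chunk_offsets(u64 start, u64 sectorsize, u64 *dev_offset)
{
	if (start != round_down_u64(start, sectorsize))
		return -22; /* -EINVAL, as in the quoted alignment check */
	*dev_offset = start; /* btrfs logical bytenr == on-disk bytenr */
	return 0;
}
```

As the removed comment notes, the caller is still responsible for making sure the chunk range and dev extent are unoccupied.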
^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation
  2018-02-02  8:19 ` [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation Qu Wenruo
@ 2018-02-02 11:33   ` Nikolay Borisov
  0 siblings, 0 replies; 17+ messages in thread
From: Nikolay Borisov @ 2018-02-02 11:33 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, dsterba



On  2.02.2018 10:19, Qu Wenruo wrote:
> Before this patch, chunk allocation is split into 2 parts:
> 
> 1) Chunk allocation
>    Handled by btrfs_alloc_chunk(), which will insert chunk and device
>    extent items.
> 
> 2) Block group allocation
>    Handled by btrfs_make_block_group(), which will insert block group
>    item and update space info.
> 
> However, there is no real need to split these operations, as every
> btrfs_alloc_chunk() call is followed by btrfs_make_block_group().
> 
> So it's reasonable to merge the btrfs_make_block_group() call into
> btrfs_alloc_chunk(), which saves several lines and provides the basis
> for the later btrfs_alloc_chunk() rework.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  convert/main.c |  4 ----
>  extent-tree.c  | 10 ++--------
>  mkfs/main.c    | 19 -------------------
>  volumes.c      | 10 ++++++----
>  4 files changed, 8 insertions(+), 35 deletions(-)
> 
> diff --git a/convert/main.c b/convert/main.c
> index b2444bb2ff21..240d3aa46db9 100644
> --- a/convert/main.c
> +++ b/convert/main.c
> @@ -915,10 +915,6 @@ static int make_convert_data_block_groups(struct btrfs_trans_handle *trans,
>  					BTRFS_BLOCK_GROUP_DATA, true);
>  			if (ret < 0)
>  				break;
> -			ret = btrfs_make_block_group(trans, fs_info, 0,
> -					BTRFS_BLOCK_GROUP_DATA, cur, len);
> -			if (ret < 0)
> -				break;
>  			cur += len;
>  		}
>  	}
> diff --git a/extent-tree.c b/extent-tree.c
> index b085ab0352b3..bccd83d1bae6 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1909,15 +1909,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>  	                        space_info->flags, false);
>  	if (ret == -ENOSPC) {
>  		space_info->full = 1;
> -		return 0;
> +		return ret;
>  	}
> -
> -	BUG_ON(ret);
> -
> -	ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
> -				     start, num_bytes);
> -	BUG_ON(ret);
> -	return 0;
> +	return ret;
>  }
>  
>  static int update_block_group(struct btrfs_root *root,
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 358395ca0250..49159ea533b9 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -87,12 +87,6 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
>  			error("no space to allocate data/metadata chunk");
>  			goto err;
>  		}
> -		if (ret)
> -			return ret;
> -		ret = btrfs_make_block_group(trans, fs_info, 0,
> -					     BTRFS_BLOCK_GROUP_METADATA |
> -					     BTRFS_BLOCK_GROUP_DATA,
> -					     chunk_start, chunk_size);
>  		if (ret)
>  			return ret;
>  		allocation->mixed += chunk_size;
> @@ -106,12 +100,7 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
>  		}
>  		if (ret)
>  			return ret;
> -		ret = btrfs_make_block_group(trans, fs_info, 0,
> -					     BTRFS_BLOCK_GROUP_METADATA,
> -					     chunk_start, chunk_size);
>  		allocation->metadata += chunk_size;
> -		if (ret)
> -			return ret;
>  	}
>  
>  	root->fs_info->system_allocs = 0;
> @@ -140,12 +129,7 @@ static int create_data_block_groups(struct btrfs_trans_handle *trans,
>  		}
>  		if (ret)
>  			return ret;
> -		ret = btrfs_make_block_group(trans, fs_info, 0,
> -					     BTRFS_BLOCK_GROUP_DATA,
> -					     chunk_start, chunk_size);
>  		allocation->data += chunk_size;
> -		if (ret)
> -			return ret;
>  	}
>  
>  err:
> @@ -249,9 +233,6 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
>  	if (ret)
>  		return ret;
>  
> -	ret = btrfs_make_block_group(trans, fs_info, 0,
> -				     type, chunk_start, chunk_size);
> -
>  	type &= BTRFS_BLOCK_GROUP_TYPE_MASK;
>  	if (type == BTRFS_BLOCK_GROUP_DATA) {
>  		allocation->data += chunk_size;
> diff --git a/volumes.c b/volumes.c
> index 9ee4650351c3..a9dc8c939dc5 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -837,10 +837,9 @@ error:
>  				/ sizeof(struct btrfs_stripe) + 1)
>  
>  /*
> - * Alloc a chunk, will insert dev extents, chunk item.
> - * NOTE: This function will not insert block group item nor mark newly
> - * allocated chunk available for later allocation.
> - * Block group item and free space update is handled by btrfs_make_block_group()
> + * Alloc a chunk, will insert dev extents, chunk item, and insert new
> + * block group and update space info (so that extent allocator can use
> + * newly allocated chunk).
>   *
>   * @start:	return value of allocated chunk start bytenr.
>   * @num_bytes:	return value of allocated chunk size
> @@ -1159,6 +1158,9 @@ alloc_chunk:
>  	}
>  
>  	kfree(chunk);
> +

Add a comment here, visually separating this step of the process,
similar to your previous patch:

/*
 * Create the block group describing the chunk item
 */

or whatever text you deem correct.
> +	ret = btrfs_make_block_group(trans, info, 0, type, map->ce.start,
> +				     map->ce.size);
>  	return ret;
>  
>  out_chunk_map:
> 

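The shape of the refactor reviewed above can be sketched with toy stand-ins (hypothetical names and placement values, not the real signatures): instead of every caller pairing an allocation call with a separate block group creation call, the allocation routine performs both steps before returning.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

static int block_groups_created;

/* Toy stand-in for btrfs_make_block_group(): just record the group. */
static int make_block_group(u64 start, u64 num_bytes)
{
	(void)start;
	(void)num_bytes;
	block_groups_created++;
	return 0;
}

/*
 * Toy stand-in for the merged btrfs_alloc_chunk(): after picking a
 * placement it creates the block group itself, so every successful
 * call leaves the new chunk usable by the extent allocator.
 */
static int alloc_chunk(u64 *start, u64 *num_bytes)
{
	*start = 1024 * 1024;        /* pretend chunk placement */
	*num_bytes = 8 * 1024 * 1024;
	return make_block_group(*start, *num_bytes);
}
```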

* Re: [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures
  2018-02-02  8:19 ` [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures Qu Wenruo
@ 2018-02-02 11:37   ` Nikolay Borisov
  2018-02-02 11:53     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-02-02 11:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, dsterba



On  2.02.2018 10:19, Qu Wenruo wrote:
> As part of the effort to unify code and behavior between btrfs-progs and
> kernel, copy the btrfs_raid_array from kernel to btrfs-progs.
> 
> This way we can later use btrfs_raid_array[] to look up the needed RAID
> info rather than open-coding if-else branches.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  ctree.h   | 12 +++++++++++-
>  volumes.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  volumes.h | 30 +++++++++++++++++++++++++++++
>  3 files changed, 107 insertions(+), 1 deletion(-)
> 
> diff --git a/ctree.h b/ctree.h
> index 17cdac76c58c..c76849d8deb7 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -958,7 +958,17 @@ struct btrfs_csum_item {
>  #define BTRFS_BLOCK_GROUP_RAID5    	(1ULL << 7)
>  #define BTRFS_BLOCK_GROUP_RAID6    	(1ULL << 8)
>  #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
<not related to this patch but I just saw it> 

BTRFS_BLOCK_GROUP_RESERVED differs between userspace and kernel space.
The kernel's btrfs_tree.h header has it defined as:

#define BTRFS_BLOCK_GROUP_RESERVED      (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \       
                                         BTRFS_SPACE_INFO_GLOBAL_RSV)


Otherwise LGTM: 

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> -#define BTRFS_NR_RAID_TYPES             7
> +
> +enum btrfs_raid_types {
> +	BTRFS_RAID_RAID10,
> +	BTRFS_RAID_RAID1,
> +	BTRFS_RAID_DUP,
> +	BTRFS_RAID_RAID0,
> +	BTRFS_RAID_SINGLE,
> +	BTRFS_RAID_RAID5,
> +	BTRFS_RAID_RAID6,
> +	BTRFS_NR_RAID_TYPES
> +};
>  
>  #define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
>  					 BTRFS_BLOCK_GROUP_SYSTEM |  \
> diff --git a/volumes.c b/volumes.c
> index a9dc8c939dc5..b47ff1f392b5 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -30,6 +30,72 @@
>  #include "utils.h"
>  #include "kernel-lib/raid56.h"
>  
> +const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
> +	[BTRFS_RAID_RAID10] = {
> +		.sub_stripes	= 2,
> +		.dev_stripes	= 1,
> +		.devs_max	= 0,	/* 0 == as many as possible */
> +		.devs_min	= 4,
> +		.tolerated_failures = 1,
> +		.devs_increment	= 2,
> +		.ncopies	= 2,
> +	},
> +	[BTRFS_RAID_RAID1] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 1,
> +		.devs_max	= 2,
> +		.devs_min	= 2,
> +		.tolerated_failures = 1,
> +		.devs_increment	= 2,
> +		.ncopies	= 2,
> +	},
> +	[BTRFS_RAID_DUP] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 2,
> +		.devs_max	= 1,
> +		.devs_min	= 1,
> +		.tolerated_failures = 0,
> +		.devs_increment	= 1,
> +		.ncopies	= 2,
> +	},
> +	[BTRFS_RAID_RAID0] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 1,
> +		.devs_max	= 0,
> +		.devs_min	= 2,
> +		.tolerated_failures = 0,
> +		.devs_increment	= 1,
> +		.ncopies	= 1,
> +	},
> +	[BTRFS_RAID_SINGLE] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 1,
> +		.devs_max	= 1,
> +		.devs_min	= 1,
> +		.tolerated_failures = 0,
> +		.devs_increment	= 1,
> +		.ncopies	= 1,
> +	},
> +	[BTRFS_RAID_RAID5] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 1,
> +		.devs_max	= 0,
> +		.devs_min	= 2,
> +		.tolerated_failures = 1,
> +		.devs_increment	= 1,
> +		.ncopies	= 2,
> +	},
> +	[BTRFS_RAID_RAID6] = {
> +		.sub_stripes	= 1,
> +		.dev_stripes	= 1,
> +		.devs_max	= 0,
> +		.devs_min	= 3,
> +		.tolerated_failures = 2,
> +		.devs_increment	= 1,
> +		.ncopies	= 3,
> +	},
> +};
> +
>  struct stripe {
>  	struct btrfs_device *dev;
>  	u64 physical;
> diff --git a/volumes.h b/volumes.h
> index 7bbdf615d31a..612a0a7586f4 100644
> --- a/volumes.h
> +++ b/volumes.h
> @@ -108,6 +108,36 @@ struct map_lookup {
>  	struct btrfs_bio_stripe stripes[];
>  };
>  
> +struct btrfs_raid_attr {
> +	int sub_stripes;	/* sub_stripes info for map */
> +	int dev_stripes;	/* stripes per dev */
> +	int devs_max;		/* max devs to use */
> +	int devs_min;		/* min devs needed */
> +	int tolerated_failures; /* max tolerated fail devs */
> +	int devs_increment;	/* ndevs has to be a multiple of this */
> +	int ncopies;		/* how many copies to data has */
> +};
> +
> +extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
> +
> +static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
> +{
> +	if (flags & BTRFS_BLOCK_GROUP_RAID10)
> +		return BTRFS_RAID_RAID10;
> +	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
> +		return BTRFS_RAID_RAID1;
> +	else if (flags & BTRFS_BLOCK_GROUP_DUP)
> +		return BTRFS_RAID_DUP;
> +	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
> +		return BTRFS_RAID_RAID0;
> +	else if (flags & BTRFS_BLOCK_GROUP_RAID5)
> +		return BTRFS_RAID_RAID5;
> +	else if (flags & BTRFS_BLOCK_GROUP_RAID6)
> +		return BTRFS_RAID_RAID6;
> +
> +	return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */
> +}
> +
>  #define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
>  			    (sizeof(struct btrfs_bio_stripe) * (n)))
>  #define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
> 

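The table-driven lookup introduced by this patch can be illustrated with a trimmed-down sketch (simplified flag bits and a reduced attr struct for illustration only, not the real btrfs headers):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified flag bits for illustration; not the real on-disk values. */
#define BG_RAID1 (1ULL << 4)
#define BG_DUP   (1ULL << 5)

enum raid_types { RAID_RAID1, RAID_DUP, RAID_SINGLE, NR_RAID_TYPES };

struct raid_attr {
	int dev_stripes; /* stripes per device */
	int ncopies;     /* copies of the data */
};

/* One table replaces scattered per-profile if-else checks. */
static const struct raid_attr raid_array[NR_RAID_TYPES] = {
	[RAID_RAID1]  = { .dev_stripes = 1, .ncopies = 2 },
	[RAID_DUP]    = { .dev_stripes = 2, .ncopies = 2 },
	[RAID_SINGLE] = { .dev_stripes = 1, .ncopies = 1 },
};

/* Mirrors the shape of btrfs_bg_flags_to_raid_index() above. */
static enum raid_types flags_to_index(uint64_t flags)
{
	if (flags & BG_RAID1)
		return RAID_RAID1;
	if (flags & BG_DUP)
		return RAID_DUP;
	return RAID_SINGLE;
}
```

Callers then write `raid_array[flags_to_index(flags)].ncopies` instead of branching on each profile.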

* Re: [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size
  2018-02-02  8:19 ` [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size Qu Wenruo
@ 2018-02-02 11:41   ` Nikolay Borisov
  2018-02-02 11:49     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-02-02 11:41 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, dsterba



On  2.02.2018 10:19, Qu Wenruo wrote:
> As with the kernel's find_free_dev_extent(), allow it to return the
> maximum hole size, which we will use to build the device list for the
> later chunk allocator rework.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  volumes.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/volumes.c b/volumes.c
> index b47ff1f392b5..f4009ffa7c9e 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -516,10 +516,10 @@ out:
>  }
>  
>  static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
> -				u64 *start)
> +				u64 *start, u64 *len)
>  {
>  	/* FIXME use last free of some kind */
> -	return find_free_dev_extent_start(device, num_bytes, 0, start, NULL);
> +	return find_free_dev_extent_start(device, num_bytes, 0, start, len);

Why do we need the find_free_dev_extent() wrapper over
find_free_dev_extent_start() at all?
>  }
>  
>  static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
> @@ -543,7 +543,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>  	 * is responsible to make sure it's free.
>  	 */
>  	if (!convert) {
> -		ret = find_free_dev_extent(device, num_bytes, start);
> +		ret = find_free_dev_extent(device, num_bytes, start, NULL);
>  		if (ret)
>  			goto err;
>  	}
>

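The wrapper pattern under discussion, with an optional @len out-parameter that callers may pass as NULL, can be sketched with hypothetical stand-ins (the real functions walk the device tree looking for a hole):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Toy stand-in for find_free_dev_extent_start(): pretend a hole was
 * found just past @search_start; @len is optional and may be NULL. */
static int find_start(u64 num_bytes, u64 search_start, u64 *start, u64 *len)
{
	*start = search_start + 4096;
	if (len)
		*len = num_bytes * 2; /* pretend maximum hole size */
	return 0;
}

/* Thin wrapper keeping the kernel-facing signature while simply
 * forwarding the optional @len out-parameter. */
static int find_free(u64 num_bytes, u64 *start, u64 *len)
{
	return find_start(num_bytes, 0, start, len);
}
```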

* Re: [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size
  2018-02-02 11:41   ` Nikolay Borisov
@ 2018-02-02 11:49     ` Qu Wenruo
  2018-02-02 11:57       ` Nikolay Borisov
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02 11:49 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs, dsterba





On 2018-02-02 19:41, Nikolay Borisov wrote:
> 
> 
> On  2.02.2018 10:19, Qu Wenruo wrote:
>> Just as kernel find_free_dev_extent(), allow it to return maximum hole
>> size for us to build device list for later chunk allocator rework.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  volumes.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/volumes.c b/volumes.c
>> index b47ff1f392b5..f4009ffa7c9e 100644
>> --- a/volumes.c
>> +++ b/volumes.c
>> @@ -516,10 +516,10 @@ out:
>>  }
>>  
>>  static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
>> -				u64 *start)
>> +				u64 *start, u64 *len)
>>  {
>>  	/* FIXME use last free of some kind */
>> -	return find_free_dev_extent_start(device, num_bytes, 0, start, NULL);
>> +	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
> 
> Why do we need the free_dev_extent() wrapper over free_dev_extent_start
> at all?

Because the kernel part still needs find_free_dev_extent() in extent-tree.c.

I'd prefer to keep the function consistent across the kernel and btrfs-progs.

Thanks,
Qu

>>  }
>>  
>>  static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>> @@ -543,7 +543,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>>  	 * is responsible to make sure it's free.
>>  	 */
>>  	if (!convert) {
>> -		ret = find_free_dev_extent(device, num_bytes, start);
>> +		ret = find_free_dev_extent(device, num_bytes, start, NULL);
>>  		if (ret)
>>  			goto err;
>>  	}
>>




* Re: [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures
  2018-02-02 11:37   ` Nikolay Borisov
@ 2018-02-02 11:53     ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-02-02 11:53 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs, dsterba





On 2018-02-02 19:37, Nikolay Borisov wrote:
> 
> 
> On  2.02.2018 10:19, Qu Wenruo wrote:
>> As part of the effort to unify code and behavior between btrfs-progs and
>> kernel, copy the btrfs_raid_array from kernel to btrfs-progs.
>>
>> So later we can use the btrfs_raid_array[] to get needed raid info other
>> than manually do if-else branches.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  ctree.h   | 12 +++++++++++-
>>  volumes.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  volumes.h | 30 +++++++++++++++++++++++++++++
>>  3 files changed, 107 insertions(+), 1 deletion(-)
>>
>> diff --git a/ctree.h b/ctree.h
>> index 17cdac76c58c..c76849d8deb7 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -958,7 +958,17 @@ struct btrfs_csum_item {
>>  #define BTRFS_BLOCK_GROUP_RAID5    	(1ULL << 7)
>>  #define BTRFS_BLOCK_GROUP_RAID6    	(1ULL << 8)
>>  #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
> <not related to this patch but i just saw it> 
> 
> BTRFS_BLOCK_GROUP_RESERVED differs in userspace than in kernel space. 
> The btrfs_Tree header has it defined as: 
> 
> #define BTRFS_BLOCK_GROUP_RESERVED      (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \       
>                                          BTRFS_SPACE_INFO_GLOBAL_RSV)

Another good place to cleanup. :)

Thanks,
Qu

> 
> 
> Otherwise LGTM: 
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> 
>> -#define BTRFS_NR_RAID_TYPES             7
>> +
>> +enum btrfs_raid_types {
>> +	BTRFS_RAID_RAID10,
>> +	BTRFS_RAID_RAID1,
>> +	BTRFS_RAID_DUP,
>> +	BTRFS_RAID_RAID0,
>> +	BTRFS_RAID_SINGLE,
>> +	BTRFS_RAID_RAID5,
>> +	BTRFS_RAID_RAID6,
>> +	BTRFS_NR_RAID_TYPES
>> +};
>>  
>>  #define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
>>  					 BTRFS_BLOCK_GROUP_SYSTEM |  \
>> diff --git a/volumes.c b/volumes.c
>> index a9dc8c939dc5..b47ff1f392b5 100644
>> --- a/volumes.c
>> +++ b/volumes.c
>> @@ -30,6 +30,72 @@
>>  #include "utils.h"
>>  #include "kernel-lib/raid56.h"
>>  
>> +const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>> +	[BTRFS_RAID_RAID10] = {
>> +		.sub_stripes	= 2,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 0,	/* 0 == as many as possible */
>> +		.devs_min	= 4,
>> +		.tolerated_failures = 1,
>> +		.devs_increment	= 2,
>> +		.ncopies	= 2,
>> +	},
>> +	[BTRFS_RAID_RAID1] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 2,
>> +		.devs_min	= 2,
>> +		.tolerated_failures = 1,
>> +		.devs_increment	= 2,
>> +		.ncopies	= 2,
>> +	},
>> +	[BTRFS_RAID_DUP] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 2,
>> +		.devs_max	= 1,
>> +		.devs_min	= 1,
>> +		.tolerated_failures = 0,
>> +		.devs_increment	= 1,
>> +		.ncopies	= 2,
>> +	},
>> +	[BTRFS_RAID_RAID0] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 0,
>> +		.devs_min	= 2,
>> +		.tolerated_failures = 0,
>> +		.devs_increment	= 1,
>> +		.ncopies	= 1,
>> +	},
>> +	[BTRFS_RAID_SINGLE] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 1,
>> +		.devs_min	= 1,
>> +		.tolerated_failures = 0,
>> +		.devs_increment	= 1,
>> +		.ncopies	= 1,
>> +	},
>> +	[BTRFS_RAID_RAID5] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 0,
>> +		.devs_min	= 2,
>> +		.tolerated_failures = 1,
>> +		.devs_increment	= 1,
>> +		.ncopies	= 2,
>> +	},
>> +	[BTRFS_RAID_RAID6] = {
>> +		.sub_stripes	= 1,
>> +		.dev_stripes	= 1,
>> +		.devs_max	= 0,
>> +		.devs_min	= 3,
>> +		.tolerated_failures = 2,
>> +		.devs_increment	= 1,
>> +		.ncopies	= 3,
>> +	},
>> +};
>> +
>>  struct stripe {
>>  	struct btrfs_device *dev;
>>  	u64 physical;
>> diff --git a/volumes.h b/volumes.h
>> index 7bbdf615d31a..612a0a7586f4 100644
>> --- a/volumes.h
>> +++ b/volumes.h
>> @@ -108,6 +108,36 @@ struct map_lookup {
>>  	struct btrfs_bio_stripe stripes[];
>>  };
>>  
>> +struct btrfs_raid_attr {
>> +	int sub_stripes;	/* sub_stripes info for map */
>> +	int dev_stripes;	/* stripes per dev */
>> +	int devs_max;		/* max devs to use */
>> +	int devs_min;		/* min devs needed */
>> +	int tolerated_failures; /* max tolerated fail devs */
>> +	int devs_increment;	/* ndevs has to be a multiple of this */
>> +	int ncopies;		/* how many copies to data has */
>> +};
>> +
>> +extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
>> +
>> +static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
>> +{
>> +	if (flags & BTRFS_BLOCK_GROUP_RAID10)
>> +		return BTRFS_RAID_RAID10;
>> +	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
>> +		return BTRFS_RAID_RAID1;
>> +	else if (flags & BTRFS_BLOCK_GROUP_DUP)
>> +		return BTRFS_RAID_DUP;
>> +	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
>> +		return BTRFS_RAID_RAID0;
>> +	else if (flags & BTRFS_BLOCK_GROUP_RAID5)
>> +		return BTRFS_RAID_RAID5;
>> +	else if (flags & BTRFS_BLOCK_GROUP_RAID6)
>> +		return BTRFS_RAID_RAID6;
>> +
>> +	return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */
>> +}
>> +
>>  #define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
>>  			    (sizeof(struct btrfs_bio_stripe) * (n)))
>>  #define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
>>




* Re: [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size
  2018-02-02 11:49     ` Qu Wenruo
@ 2018-02-02 11:57       ` Nikolay Borisov
  0 siblings, 0 replies; 17+ messages in thread
From: Nikolay Borisov @ 2018-02-02 11:57 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs, dsterba



On  2.02.2018 13:49, Qu Wenruo wrote:
> 
> 
On 2018-02-02 19:41, Nikolay Borisov wrote:
>>
>>
>> On  2.02.2018 10:19, Qu Wenruo wrote:
>>> Just as kernel find_free_dev_extent(), allow it to return maximum hole
>>> size for us to build device list for later chunk allocator rework.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  volumes.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/volumes.c b/volumes.c
>>> index b47ff1f392b5..f4009ffa7c9e 100644
>>> --- a/volumes.c
>>> +++ b/volumes.c
>>> @@ -516,10 +516,10 @@ out:
>>>  }
>>>  
>>>  static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
>>> -				u64 *start)
>>> +				u64 *start, u64 *len)
>>>  {
>>>  	/* FIXME use last free of some kind */
>>> -	return find_free_dev_extent_start(device, num_bytes, 0, start, NULL);
>>> +	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
>>
>> Why do we need the free_dev_extent() wrapper over free_dev_extent_start
>> at all?
> 
> Because kernel part still needs find_free_dev_extent(), in extent-tree.c.
> 
> I'd prefer to keep the function across kernel and btrfs-progs.

You are right, I missed it while I was grepping the first time.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> 
> Thanks,
> Qu
> 
>>>  }
>>>  
>>>  static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>>> @@ -543,7 +543,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>>>  	 * is responsible to make sure it's free.
>>>  	 */
>>>  	if (!convert) {
>>> -		ret = find_free_dev_extent(device, num_bytes, start);
>>> +		ret = find_free_dev_extent(device, num_bytes, start, NULL);
>>>  		if (ret)
>>>  			goto err;
>>>  	}
>>>
> 


End of thread (newest message: 2018-02-02 11:57 UTC)

Thread overview: 17+ messages
2018-02-02  8:19 [PATCH 0/7] Chunk allocator unification Qu Wenruo
2018-02-02  8:19 ` [PATCH 1/7] btrfs-progs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info Qu Wenruo
2018-02-02  8:19 ` [PATCH 2/7] btrfs-progs: Merge btrfs_alloc_data_chunk into btrfs_alloc_chunk Qu Wenruo
2018-02-02  9:20   ` Su Yue
2018-02-02  9:41     ` Qu Wenruo
2018-02-02  9:54       ` Su Yue
2018-02-02  8:19 ` [PATCH 3/7] btrfs-progs: Make btrfs_alloc_chunk to handle block group creation Qu Wenruo
2018-02-02 11:33   ` Nikolay Borisov
2018-02-02  8:19 ` [PATCH 4/7] btrfs-progs: Introduce btrfs_raid_array and related infrastructures Qu Wenruo
2018-02-02 11:37   ` Nikolay Borisov
2018-02-02 11:53     ` Qu Wenruo
2018-02-02  8:19 ` [PATCH 5/7] btrfs-progs: volumes: Allow find_free_dev_extent() to return maximum hole size Qu Wenruo
2018-02-02 11:41   ` Nikolay Borisov
2018-02-02 11:49     ` Qu Wenruo
2018-02-02 11:57       ` Nikolay Borisov
2018-02-02  8:19 ` [PATCH 6/7] btrfs-progs: kernel-lib: Port kernel sort() to btrfs-progs Qu Wenruo
2018-02-02  8:19 ` [PATCH 7/7] btrfs-progs: volumes: Unify free dev extent search behavior between kernel and btrfs-progs Qu Wenruo
