linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/3] btrfs-progs: prevent mkfs from aborting with small volume
@ 2013-09-05  6:51 Hidetoshi Seto
  2013-09-05  6:53 ` [PATCH v2 1/3] btrfs-progs: error if device for mkfs is too small Hidetoshi Seto
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Hidetoshi Seto @ 2013-09-05  6:51 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: chris.mason

Here are 3 patches to avoid undesired aborts of mkfs.btrfs.
These are based on top of Chris's btrfs-progs.git:

  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

Thanks,
H.Seto


Hidetoshi Seto (3):
      btrfs-progs: error if device for mkfs is too small
      btrfs-progs: error if device have no space to make primary chunks
      btrfs-progs: calculate available blocks on device properly

 ctree.h   |    8 +++++
 mkfs.c    |   23 +++++++++++++
 volumes.c |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 129 insertions(+), 6 deletions(-)




* [PATCH v2 1/3] btrfs-progs: error if device for mkfs is too small
From: Hidetoshi Seto @ 2013-09-05  6:53 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: chris.mason, Eric Sandeen

Eric pointed out that mkfs aborts if the specified volume is too small:

  # truncate --size=2m testfile
  # ./mkfs.btrfs testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

As a first step toward fixing the problems in this area, make mkfs
report an error if the size of the target volume is less than the size
of the first system block group, BTRFS_MKFS_SYSTEM_GROUP_SIZE (= 4MB).
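
The check described above can be sketched in isolation as follows. This
is only an illustration, not the code from the patch: the names
SKETCH_SYSTEM_GROUP_SIZE and sketch_device_large_enough() are
hypothetical, and the 4MB value is taken from the description above.

```c
#include <stdint.h>

/* Assumption: mirrors BTRFS_MKFS_SYSTEM_GROUP_SIZE (= 4MB) as stated above. */
#define SKETCH_SYSTEM_GROUP_SIZE ((uint64_t)4 * 1024 * 1024)

/*
 * Hypothetical helper: returns 1 if a device of dev_bytes bytes can hold
 * the first system block group, 0 otherwise. A caller would print an
 * error and exit instead of running into an assertion later.
 */
static int sketch_device_large_enough(uint64_t dev_bytes)
{
	return dev_bytes >= SKETCH_SYSTEM_GROUP_SIZE;
}
```

With this sketch, the 2MB testfile from the reproducer is rejected up
front, while a volume of exactly 4MB passes this first check.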

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 mkfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index b412b7e..a98fe54 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1422,6 +1422,12 @@ int main(int ac, char **av)
 		}
 	}
 
+	/* To create the first block group and chunk 0 in make_btrfs */
+	if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		fprintf(stderr, "device is too small to make filesystem\n");
+		exit(1);
+	}
+
 	blocks[0] = BTRFS_SUPER_INFO_OFFSET;
 	for (i = 1; i < 7; i++) {
 		blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
-- 
1.7.1




* [PATCH v2 2/3] btrfs-progs: error if device have no space to make primary chunks
From: Hidetoshi Seto @ 2013-09-05  6:55 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: chris.mason

The previous patch works fine if the size of the volume specified to
mkfs is less than 4MB. However, btrfs usually requires more than 4MB to
work, and the minimum required size depends on the raid setting etc.

This patch lets mkfs print an error message if it cannot allocate one
of the chunks that must be created first.

 [before]
  # truncate --size=4500K testfile
  # ./mkfs.btrfs -f testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  mkfs.btrfs: mkfs.c:84: make_root_dir: Assertion `!(ret)' failed.
  Aborted (core dumped)

 [After]
  # truncate --size=4500K testfile
  # ./mkfs.btrfs -f testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  no space to alloc data/metadata chunk
  failed to setup the root directory

TBD: calculate the minimum size for the given settings and put it in
the error message so that the user knows how large a volume is required.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 mkfs.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index a98fe54..bac122f 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -81,6 +81,11 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_METADATA |
 					BTRFS_BLOCK_GROUP_DATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr,
+				"no space to alloc data/metadata chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_METADATA |
@@ -93,6 +98,10 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 		ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_METADATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr, "no space to alloc metadata chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_METADATA,
@@ -110,6 +119,10 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 		ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_DATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr, "no space to alloc data chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_DATA,
@@ -181,6 +194,10 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 				&chunk_start, &chunk_size, type);
+	if (ret == -ENOSPC) {
+		fprintf(stderr, "not enough free space\n");
+		exit(1);
+	}
 	BUG_ON(ret);
 	ret = btrfs_make_block_group(trans, root->fs_info->extent_root, 0,
 				     type, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-- 
1.7.1




* [PATCH v2 3/3] btrfs-progs: calculate available blocks on device properly
From: Hidetoshi Seto @ 2013-09-05  6:57 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: chris.mason

I found that mkfs.btrfs aborts when the set of volumes given to it
includes a small volume:

  # parted /dev/sdf p
  Model: LSI MegaRAID SAS RMB (scsi)
  Disk /dev/sdf: 72.8GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start   End     Size    Type     File system  Flags
   1      32.3kB  72.4GB  72.4GB  primary
   2      72.4GB  72.8GB  461MB   primary

  # ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
  :
  SMALL VOLUME: forcing mixed metadata/data groups
  adding device /dev/sdf2 id 2
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

This failure of btrfs_alloc_chunk is caused by the following steps:
 1) Since the small device has only a little space, mkfs tries to
    allocate a chunk from as much free space as is available.
    So mkfs calls btrfs_alloc_chunk with
        size = device->total_bytes - device->bytes_used.
 2) (According to the comment in the source code, to avoid overwriting
    the superblock,) btrfs_alloc_chunk starts taking chunks at an offset
    of 1MB. This means that the layout of a disk looks like:
     [[1MB at beginning for sb][allocated chunks]* ... free space ... ]
    so the free space actually available for allocation is:
        avail = device->total_bytes - device->bytes_used - 1MB.
 3) Therefore the free space is 1MB less than requested, and the
    allocation fails.
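
The arithmetic in the steps above can be written out as a small sketch.
The function names and the SKETCH_RESERVED_1M macro are hypothetical and
exist only for illustration; the sketch also assumes that
bytes_used <= total_bytes.

```c
#include <stdint.h>

/* Assumption: the 1MB kept free at the beginning of each device. */
#define SKETCH_RESERVED_1M ((uint64_t)1024 * 1024)

/* Size the caller asks for in step 1: everything not yet used. */
static uint64_t sketch_requested(uint64_t total_bytes, uint64_t bytes_used)
{
	return total_bytes - bytes_used;
}

/* Size that can actually be allocated once the first 1MB is excluded. */
static uint64_t sketch_allocatable(uint64_t total_bytes, uint64_t bytes_used)
{
	uint64_t unused = total_bytes - bytes_used;

	/* Guard against underflow on devices smaller than the reservation. */
	return unused > SKETCH_RESERVED_1M ? unused - SKETCH_RESERVED_1M : 0;
}
```

For an empty 461MB device, sketch_requested() yields 461MB while
sketch_allocatable() yields 460MB, which is the 1MB shortfall described
in step 3.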

From further investigation I also found that this issue is easily
reproduced by using the -A, --alloc-start option:

  # truncate --size=1G testfile
  # ./mkfs.btrfs -A900M -f testfile
   :
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

In this case only about 100MB is left for allocation, but
btrfs_alloc_chunk tries to allocate more than that.

The root cause of both problems above is the same simple bug:
btrfs_alloc_chunk does not calculate the available bytes properly, even
though it checks how many devices have enough room for the chunk to be
allocated.

So this patch introduces a new function, btrfs_device_avail_bytes(),
which returns the bytes available for allocation on the specified device.
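
The idea behind the new function can be sketched without the btree
machinery: walk the already allocated device extents in offset order
and sum the holes between max(1MB, alloc_start) and the end of the
device. The following is a simplified model, not the patch itself;
struct sketch_extent and sketch_avail_bytes() are hypothetical names,
and the extents are assumed to be sorted by start offset and
non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the 1MB kept free at the beginning of each device. */
#define SKETCH_RESERVED_1M ((uint64_t)1024 * 1024)

/* Hypothetical stand-in for an allocated device extent. */
struct sketch_extent {
	uint64_t start;	/* byte offset on the device */
	uint64_t len;	/* length in bytes */
};

/* Sum the free holes in [max(1MB, alloc_start), total_bytes). */
static uint64_t sketch_avail_bytes(const struct sketch_extent *ext, size_t n,
				   uint64_t alloc_start, uint64_t total_bytes)
{
	uint64_t search_start = alloc_start > SKETCH_RESERVED_1M ?
				alloc_start : SKETCH_RESERVED_1M;
	uint64_t search_end = total_bytes;
	uint64_t free_bytes = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t extent_end = ext[i].start + ext[i].len;

		if (ext[i].start > search_end)
			break;
		/* Hole between the current position and this extent. */
		if (ext[i].start > search_start)
			free_bytes += ext[i].start - search_start;
		/* Continue searching after the end of this extent. */
		if (extent_end > search_start)
			search_start = extent_end;
		if (search_start > search_end)
			break;
	}

	/* Trailing hole after the last extent, if any. */
	if (search_start < search_end)
		free_bytes += search_end - search_start;

	return free_bytes;
}
```

In this model, an empty 1GB device with alloc_start set to 900MB yields
124MB, and an empty 461MB device with no alloc_start yields 460MB, so a
caller comparing the result against the requested size sees the
shortfall before attempting the allocation.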

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 ctree.h   |    8 +++++
 volumes.c |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 106 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index 0b0d701..90be7ab 100644
--- a/ctree.h
+++ b/ctree.h
@@ -811,6 +811,14 @@ struct btrfs_csum_item {
 	u8 csum;
 } __attribute__ ((__packed__));
 
+/*
+ * We don't want to overwrite 1M at the beginning of device, even though
+ * there is our 1st superblock at 64k. Some possible reasons:
+ *  - the first 64k blank is useful for some boot loader/manager
+ *  - the first 1M could be scratched by buggy partitioner or somesuch
+ */
+#define BTRFS_BLOCK_RESERVED_1M_FOR_SUPER	((u64)1024 * 1024)
+
 /* tag for the radix tree of block groups in ram */
 #define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
diff --git a/volumes.c b/volumes.c
index 0ff2283..e8d7f25 100644
--- a/volumes.c
+++ b/volumes.c
@@ -268,7 +268,7 @@ static int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_dev_extent *dev_extent = NULL;
 	u64 hole_size = 0;
 	u64 last_byte = 0;
-	u64 search_start = 0;
+	u64 search_start = root->fs_info->alloc_start;
 	u64 search_end = device->total_bytes;
 	int ret;
 	int slot = 0;
@@ -283,10 +283,12 @@ static int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	/* we don't want to overwrite the superblock on the drive,
 	 * so we make sure to start at an offset of at least 1MB
 	 */
-	search_start = max((u64)1024 * 1024, search_start);
+	search_start = max(BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, search_start);
 
-	if (root->fs_info->alloc_start + num_bytes <= device->total_bytes)
-		search_start = max(root->fs_info->alloc_start, search_start);
+	if (search_start >= search_end) {
+		ret = -ENOSPC;
+		goto error;
+	}
 
 	key.objectid = device->devid;
 	key.offset = search_start;
@@ -660,6 +662,94 @@ static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
 	return 64 * 1024;
 }
 
+/*
+ * btrfs_device_avail_bytes - count bytes available for alloc_chunk
+ *
+ * It is not equal to "device->total_bytes - device->bytes_used".
+ * We do not allocate any chunk in 1M at beginning of device, and not
+ * allowed to allocate any chunk before alloc_start if it is specified.
+ * So search holes from max(1M, alloc_start) to device->total_bytes.
+ */
+static int btrfs_device_avail_bytes(struct btrfs_trans_handle *trans,
+				    struct btrfs_device *device,
+				    u64 *avail_bytes)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *root = device->dev_root;
+	struct btrfs_key key;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct extent_buffer *l;
+	u64 search_start = root->fs_info->alloc_start;
+	u64 search_end = device->total_bytes;
+	u64 extent_end = 0;
+	u64 free_bytes = 0;
+	int ret;
+	int slot = 0;
+
+	search_start = max(BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, search_start);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = device->devid;
+	key.offset = root->fs_info->alloc_start;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	path->reada = 2;
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto error;
+	ret = btrfs_previous_item(root, path, 0, key.type);
+	if (ret < 0)
+		goto error;
+
+	while (1) {
+		l = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(l)) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret == 0)
+				continue;
+			if (ret < 0)
+				goto error;
+			break;
+		}
+		btrfs_item_key_to_cpu(l, &key, slot);
+
+		if (key.objectid < device->devid)
+			goto next;
+		if (key.objectid > device->devid)
+			break;
+		if (btrfs_key_type(&key) != BTRFS_DEV_EXTENT_KEY)
+			goto next;
+		if (key.offset > search_end)
+			break;
+		if (key.offset > search_start)
+			free_bytes += key.offset - search_start;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+		extent_end = key.offset + btrfs_dev_extent_length(l,
+								  dev_extent);
+		if (extent_end > search_start)
+			search_start = extent_end;
+		if (search_start > search_end)
+			break;
+next:
+		path->slots[0]++;
+		cond_resched();
+	}
+
+	if (search_start < search_end)
+		free_bytes += search_end - search_start;
+
+	*avail_bytes = free_bytes;
+	ret = 0;
+error:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      struct btrfs_root *extent_root, u64 *start,
 		      u64 *num_bytes, u64 type)
@@ -678,7 +768,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	u64 calc_size = 8 * 1024 * 1024;
 	u64 min_free;
 	u64 max_chunk_size = 4 * calc_size;
-	u64 avail;
+	u64 avail = 0;
 	u64 max_avail = 0;
 	u64 percent_max;
 	int num_stripes = 1;
@@ -782,7 +872,9 @@ again:
 	/* build a private list of devices we will allocate from */
 	while(index < num_stripes) {
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		avail = device->total_bytes - device->bytes_used;
+		ret = btrfs_device_avail_bytes(trans, device, &avail);
+		if (ret)
+			return ret;
 		cur = cur->next;
 		if (avail >= min_free) {
 			list_move_tail(&device->dev_list, &private_devs);
-- 
1.7.1



