[PATCH][BTRFS] raid5/6: chunk allocation

From: Goffredo Baroncelli <kreijack@inwind.it>
To: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: [PATCH][BTRFS] raid5/6: chunk allocation
Date: Sun, 17 Feb 2013 10:41:00 +0100
Message-ID: <5120A5AC.8020107@inwind.it>

Hi Chris,

I am playing with the raid5/6 code, to adapt my "disk-usage"
patches to it.
During this development I found that the chunk allocation is strange.
Looking at the code, I found the following in volumes.c:

static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
[...]
        /*
         * this will have to be fixed for RAID1 and RAID10 over
         * more drives
         */
        data_stripes = num_stripes / ncopies;

        if (stripe_size * ndevs > max_chunk_size * ncopies) {
                stripe_size = max_chunk_size * ncopies;
                do_div(stripe_size, ndevs);
        }

This code decides how big a chunk is, following two main rules:
1) the chunk stripe shall not exceed max_stripe_size
2) the chunk capacity (the space usable by the user) shall
not exceed max_chunk_size.

The code above works well for RAID0/RAID1/DUP/SINGLE/RAID10,
but doesn't play well with RAID5/6. If the chunk type is
BTRFS_BLOCK_GROUP_METADATA, then max_stripe_size is 1GB
and max_chunk_size is 1GB too. If the number of devices (ndevs) is 7
and the raid profile is RAID6, then ncopies is 3 and the stripe_size is
1GB*3/7 = 438MB. With 7 - 2 = 5 data stripes this leads to a chunk
size of 438MB*5 = 2.14GB, which is not the expected value: it should
not exceed the 1GB max_chunk_size.
I think that we should change the test above, in the raid6 case, to:

	data_stripes = ndevs - 2;
	if (stripe_size * data_stripes > max_chunk_size) {
		stripe_size = max_chunk_size;
		do_div(stripe_size, data_stripes);
	}
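
The arithmetic above can be checked with a small user-space sketch.
This is not kernel code: the helper names clamp_by_copies() and
clamp_by_data_stripes() are made up for illustration, plain 64-bit
division stands in for do_div(), and the incoming stripe_size is
assumed to be already capped at max_stripe_size (1GB).

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1024ULL * 1024 * 1024)

/* Current test: limit stripe_size * ndevs to max_chunk_size * ncopies. */
static inline uint64_t clamp_by_copies(uint64_t stripe_size,
				       uint64_t max_chunk_size,
				       unsigned int ndevs,
				       unsigned int ncopies)
{
	if (stripe_size * ndevs > max_chunk_size * ncopies)
		stripe_size = max_chunk_size * ncopies / ndevs;
	return stripe_size;
}

/* Proposed test: limit stripe_size * data_stripes to max_chunk_size. */
static inline uint64_t clamp_by_data_stripes(uint64_t stripe_size,
					     uint64_t max_chunk_size,
					     unsigned int data_stripes)
{
	if (stripe_size * data_stripes > max_chunk_size)
		stripe_size = max_chunk_size / data_stripes;
	return stripe_size;
}

/* Example from this mail: metadata chunk, 7 devices, RAID6 (ncopies = 3). */
static inline void check_raid6_example(void)
{
	unsigned int data_stripes = 7 - 2;
	uint64_t cur = clamp_by_copies(GIB, GIB, 7, 3);
	uint64_t fix = clamp_by_data_stripes(GIB, GIB, data_stripes);

	/* current test: usable space is above 2GB (about 2.14GB) */
	assert(cur * data_stripes > 2 * GIB);
	/* proposed test: usable space stays within max_chunk_size */
	assert(fix * data_stripes <= GIB);
}
```

Calling check_raid6_example() from a test program passes both
assertions: the current test yields a stripe of about 438MB per device,
the proposed one about 204MB per device.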

The patch below should solve this issue, and cleans up the logic a bit
by separating the raid5 and raid6 code from the code for the other
raid profiles.

Anyway, I would like to point out another possible issue:
fragmentation. To avoid fragmentation, should we round up
the stripe size to a more sane value like 256MB?

I know that this could lead to an "insane" chunk size when
the number of disks is higher; but the current logic (the
stripe_size is equal to chunk_size / number_of_devices)
could lead to fragmentation problems when different raid profiles
are used together.
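
To make the trade-off concrete, here is a minimal user-space sketch of
such a rounding. It is only an illustration: round_up_stripe() and
STRIPE_GRANULE are hypothetical names, only the 256MB figure comes from
the text above, and the sketch ignores the space actually available on
each device.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024)
/* Hypothetical granule; 256MB is the value suggested in this mail. */
#define STRIPE_GRANULE (256 * MIB)

/* Round stripe_size up to the next multiple of STRIPE_GRANULE. */
static inline uint64_t round_up_stripe(uint64_t stripe_size)
{
	return (stripe_size + STRIPE_GRANULE - 1) / STRIPE_GRANULE *
	       STRIPE_GRANULE;
}

/* Effect on the usable chunk size when max_chunk_size is 1GB. */
static inline void check_rounding_example(void)
{
	uint64_t max_chunk_size = 1024 * MIB;

	/* 5 data stripes: 1GB/5 rounds up to 256MB, usable space 1.25GB */
	assert(round_up_stripe(max_chunk_size / 5) * 5 == 1280 * MIB);
	/* 20 data stripes: still 256MB per stripe, usable space 5GB */
	assert(round_up_stripe(max_chunk_size / 20) * 20 == 5120 * MIB);
}
```

With 5 data stripes the chunk grows only slightly past 1GB, while with
20 data stripes it reaches 5GB, which is the "insane" case mentioned
above.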

BR
G.Baroncelli

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c372264..88d17b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3724,25 +3724,32 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	stripe_size = devices_info[ndevs-1].max_avail;
 	num_stripes = ndevs * dev_stripes;

-	/*
-	 * this will have to be fixed for RAID1 and RAID10 over
-	 * more drives
-	 */
-	data_stripes = num_stripes / ncopies;
-
-	if (stripe_size * ndevs > max_chunk_size * ncopies) {
-		stripe_size = max_chunk_size * ncopies;
-		do_div(stripe_size, ndevs);
-	}
 	if (type & BTRFS_BLOCK_GROUP_RAID5) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
 				 btrfs_super_stripesize(info->super_copy));
-		data_stripes = num_stripes - 1;
-	}
-	if (type & BTRFS_BLOCK_GROUP_RAID6) {
+		data_stripes = ndevs - 1;
+		if (stripe_size * data_stripes > max_chunk_size) {
+			stripe_size = max_chunk_size;
+			do_div(stripe_size, data_stripes);
+		}
+	} else if (type & BTRFS_BLOCK_GROUP_RAID6) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 2,
 				 btrfs_super_stripesize(info->super_copy));
-		data_stripes = num_stripes - 2;
+		data_stripes = ndevs - 2;
+		if (stripe_size * data_stripes > max_chunk_size) {
+			stripe_size = max_chunk_size;
+			do_div(stripe_size, data_stripes);
+		}
+	} else { /* RAID1, RAID0, RAID10, SINGLE, DUP */
+		/*
+		 * this will have to be fixed for RAID1 and RAID10 over
+		 * more drives
+		 */
+		data_stripes = num_stripes / ncopies;
+		if (stripe_size * ndevs > max_chunk_size * ncopies) {
+			stripe_size = max_chunk_size * ncopies;
+			do_div(stripe_size, ndevs);
+		}
 	}
 	do_div(stripe_size, dev_stripes);

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
