From: Eric Sandeen <sandeen@redhat.com>
To: Dan Ehrenberg <dehrenberg@google.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [PATCH] ext4: Change the handling of RAID stripe width
Date: Wed, 06 Jul 2011 16:18:40 -0500
Message-ID: <4E14D130.6080605@redhat.com>
In-Reply-To: <1309985245-14835-1-git-send-email-dehrenberg@google.com>

On 7/6/11 3:47 PM, Dan Ehrenberg wrote:
> Previously, the stripe width was blindly used for determining the size
> of allocations. Now, the stripe width is used as a hint for the initial
> mb_group_prealloc; if it is greater than 1, then we make sure that
> mb_group_prealloc is some multiple of it, and otherwise it is ignored.
> mb_group_prealloc is always usable to adjust the preallocation strategy,
> not just when the stripe-width is 0 as before.
>
> Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
> ---
> fs/ext4/mballoc.c | 40 +++++++++++++++++++++++++++++-----------
> 1 files changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 6ed859d..710c27f 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -127,13 +127,14 @@
> * based on file size. This can be found in ext4_mb_normalize_request. If
> * we are doing a group prealloc we try to normalize the request to
> * sbi->s_mb_group_prealloc. Default value of s_mb_group_prealloc is
> - * 512 blocks. This can be tuned via
> - * /sys/fs/ext4/<partition/mb_group_prealloc. The value is represented in
> - * terms of number of blocks. If we have mounted the file system with -O
> + * 512 blocks. If we have mounted the file system with -O
> * stripe=<value> option the group prealloc request is normalized to the
> - * stripe value (sbi->s_stripe)
> + * the smallest multiple of the stripe value (sbi->s_stripe) which is
> + * greater than the default mb_group_prealloc. This can be tuned via
> + * /sys/fs/ext4/<partition>/mb_group_prealloc. The value is represented in
> + * terms of number of blocks.
> *
> - * The regular allocator(using the buddy cache) supports few tunables.
> + * The regular allocator (using the buddy cache) supports a few tunables.
> *
> * /sys/fs/ext4/<partition>/mb_min_to_scan
> * /sys/fs/ext4/<partition>/mb_max_to_scan
> @@ -2471,7 +2472,26 @@ int ext4_mb_init(struct super_block *sb, int needs_recovery)
>  	sbi->s_mb_stats = MB_DEFAULT_STATS;
>  	sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
>  	sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
> +	/*
> +	 * If the stripe width is 1, this makes no sense and
> +	 * we set it to 0 to turn off stripe handling code.
> +	 */
> +	if (sbi->s_stripe == 1)
> +		sbi->s_stripe = 0;

This strikes me as a weird band-aid-y place to fix this up.
Wouldn't it be better suited for the option-parsing code, and/or
in ext4_get_stripe_size()? Why let a value of 1 get this far
only to override it here?

-Eric
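
To make the suggestion above concrete, here is a minimal userspace sketch
of normalizing the value where it is first computed. It is not the
kernel's ext4_get_stripe_size(): the struct, its field names, and the
selection order are assumptions made for illustration only. The point is
that a stripe of 0 or 1 is collapsed to 0 in one place, so later code such
as ext4_mb_init() would never see a value of 1.

```c
/* Hypothetical stand-in for the few superblock fields involved. */
struct sb_model {
	unsigned long mount_stripe;     /* stripe= mount option, 0 if unset */
	unsigned long sb_stripe_width;  /* stripe width recorded on disk */
	unsigned long sb_stride;        /* stride recorded on disk */
	unsigned long blocks_per_group; /* upper bound for a usable stripe */
};

/*
 * Pick a stripe size (assumed order: mount option, then on-disk stripe
 * width, then on-disk stride) and normalize it before anyone uses it.
 */
unsigned long model_get_stripe_size(const struct sb_model *m)
{
	unsigned long ret;

	if (m->mount_stripe && m->mount_stripe <= m->blocks_per_group)
		ret = m->mount_stripe;
	else if (m->sb_stripe_width <= m->blocks_per_group)
		ret = m->sb_stripe_width;
	else if (m->sb_stride <= m->blocks_per_group)
		ret = m->sb_stride;
	else
		ret = 0;

	/* A stripe of one block carries no alignment information. */
	if (ret <= 1)
		ret = 0;

	return ret;
}
```

With this shape, the special case for 1 lives next to the code that
produces the value, and the allocator only needs to test for non-zero.
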
>  	sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC;
> +	/*
> +	 * If there is a s_stripe > 1, then we set the s_mb_group_prealloc
> +	 * to the lowest multiple of s_stripe which is bigger than
> +	 * the s_mb_group_prealloc as determined above. We want
> +	 * the preallocation size to be an exact multiple of the
> +	 * RAID stripe size so that preallocations don't fragment
> +	 * the stripes.
> +	 */
> +	if (sbi->s_stripe > 1) {
> +		sbi->s_mb_group_prealloc = roundup(
> +			sbi->s_mb_group_prealloc, sbi->s_stripe);
> +	}
>
>  	sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group);
>  	if (sbi->s_locality_groups == NULL) {
> @@ -2830,8 +2850,9 @@ out_err:
>
> /*
> * here we normalize request for locality group
> - * Group request are normalized to s_strip size if we set the same via mount
> - * option. If not we set it to s_mb_group_prealloc which can be configured via
> + * Group request are normalized to s_mb_group_prealloc, which goes to
> + * s_strip if we set the same via mount option.
> + * s_mb_group_prealloc can be configured via
> * /sys/fs/ext4/<partition>/mb_group_prealloc
> *
> * XXX: should we try to preallocate more than the group has now?
> @@ -2842,10 +2863,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
>  	struct ext4_locality_group *lg = ac->ac_lg;
>
>  	BUG_ON(lg == NULL);
> -	if (EXT4_SB(sb)->s_stripe)
> -		ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
> -	else
> -		ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> +	ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
>  	mb_debug(1, "#%u: goal %u blocks for locality group\n",
>  		current->pid, ac->ac_g_ex.fe_len);
>  }
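
The rounding step in the quoted patch can be illustrated with a small
userspace sketch. The helper names are hypothetical,
round_up_to_multiple() is a stand-in written for this example rather than
the kernel's roundup() macro, and 512 is the default group preallocation
named in the patch's comment.

```c
/* Default group preallocation in blocks, as stated in the patch comment. */
#define MODEL_DEFAULT_GROUP_PREALLOC 512u

/* Smallest multiple of y that is greater than or equal to x (y must be > 0). */
unsigned int round_up_to_multiple(unsigned int x, unsigned int y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Model of the initialization in the patch: start from the default and,
 * only when the stripe is larger than one block, round up to a multiple
 * of the stripe.
 */
unsigned int model_group_prealloc(unsigned int stripe)
{
	unsigned int prealloc = MODEL_DEFAULT_GROUP_PREALLOC;

	if (stripe > 1)
		prealloc = round_up_to_multiple(prealloc, stripe);

	return prealloc;
}
```

With a stripe of 48 blocks this gives 528. With a stripe of 64 the default
of 512 is already a multiple and is left unchanged, so the result is the
smallest multiple that is at least the default, not strictly greater.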