public inbox for linux-ext4@vger.kernel.org
From: Andreas Dilger <adilger@sun.com>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Doug Hunley <doug@hunley.homeip.net>, linux-ext4@vger.kernel.org
Subject: Re: changing stride and stripe_width post-fs-creation?
Date: Tue, 20 Oct 2009 15:30:59 -0600
Message-ID: <0ABACA66-004A-43AE-83B0-203CFD73AB86@sun.com>
In-Reply-To: <4ADE28BF.6000605@redhat.com>

On 20-Oct-09, at 15:16, Eric Sandeen wrote:
> Andreas Dilger wrote:
>> The stride is mostly used at fs creation time, but there is no problem
>> with changing it.  The stripe_width is used by the allocator to align
>> file allocations with the RAID layout.
>> One question for Eric is whether the new libdisk patches he made will set
>> the stripe_width to something ridiculous like 512 or 4096 bytes, or if it
>> just leaves that field unset in that case.  I suspect it would be bad for
>> mballoc to see the stripe_width be such a small value.
>
> well... yes, it does set it to whatever is reported:
>
> +       min_io = blkid_topology_get_minimum_io_size(tp);
> +       opt_io = blkid_topology_get_optimal_io_size(tp);
> +       blocksize = EXT2_BLOCK_SIZE(fs_param);
> +
> +       fs_param->s_raid_stride = min_io / blocksize;
> +       fs_param->s_raid_stripe_width = opt_io / blocksize;
>
>
> if mballoc can't handle certain values then maybe the kernel code should
> be changed to ignore it?  Small values could just as easily come from a
> user too

Having the kernel ignore the value probably makes the most sense.  It's
not that mballoc can't "handle" it, just that I suspect mballoc will work
poorly if it is trying to align the allocations to 1-block values (i.e. no
alignment at all).  Even with regular disks, reading in 64kB-aligned chunks
is more efficient than reading misaligned chunks because of the track buffer.

Ignoring anything below 64kB probably makes sense.  Possibly better would be
to use some multiple of the specified size, scaled up until it is larger than
64kB (in case someone formats their RAID-5 with 5 disks * 8kB chunk size or
similar).
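
To make the two options above concrete, here is a rough sketch in C.  This
is not existing kernel or e2fsprogs code: the function names, the
MIN_ALIGN_BYTES constant, and the choice to treat exactly 64kB as large
enough are all assumptions made for illustration.

```c
/* Assumed threshold taken from the 64kB figure discussed above. */
#define MIN_ALIGN_BYTES (64UL * 1024UL)

/*
 * Option 1 (hypothetical helper): ignore a stripe width that covers less
 * than MIN_ALIGN_BYTES.  stripe_blocks is s_raid_stripe_width in
 * filesystem blocks.  Returns 0 (treated as "unset") for small values.
 */
static unsigned long stripe_ignore_small(unsigned long stripe_blocks,
					 unsigned long blocksize)
{
	if (stripe_blocks * blocksize < MIN_ALIGN_BYTES)
		return 0;
	return stripe_blocks;
}

/*
 * Option 2 (hypothetical helper): keep adding the specified stripe width
 * to itself until the result covers at least MIN_ALIGN_BYTES, so the
 * result is always a whole multiple of the on-disk value.
 * Returns 0 if the stripe width is unset or the block size is invalid.
 */
static unsigned long stripe_scale_up(unsigned long stripe_blocks,
				     unsigned long blocksize)
{
	unsigned long stripe = stripe_blocks;

	if (stripe_blocks == 0 || blocksize == 0)
		return 0;

	while (stripe * blocksize < MIN_ALIGN_BYTES)
		stripe += stripe_blocks;

	return stripe;
}
```

With 4kB blocks, a stripe_width of 8 blocks (32kB) is dropped to 0 by the
first helper and scaled up to 16 blocks (64kB) by the second.  A 1-block
stripe_width also becomes 16 blocks with the second helper, and anything
already at or above 64kB is left alone by both.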

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Thread overview: 4+ messages
2009-10-20 17:32 changing stride and stripe_width post-fs-creation? Doug Hunley
2009-10-20 21:08 ` Andreas Dilger
2009-10-20 21:16   ` Eric Sandeen
2009-10-20 21:30     ` Andreas Dilger [this message]
