From: Andreas Dilger <adilger@clusterfs.com>
To: Eric <erpo41@gmail.com>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] store RAID stride in superblock
Date: Sat, 12 May 2007 08:26:37 -0700
Message-ID: <20070512152637.GS6375@schatzie.adilger.int>
In-Reply-To: <1178957506.20145.41.camel@eric-laptop>
On May 12, 2007 01:11 -0700, Eric wrote:
> The concept is really tempting. RAID is good, and not asking the user
> for information that the system can find out for itself is good too.
>
> In the unlikely event that the RAID stride were to change, I think the
> autodetect-each-time method would be superior to the store-in-superblock
> method. Doubly so if the code to detect MD and LVM stride is lean and
> clean.
I've asked the block layer folks a couple of times if it would be possible
to have an interface for this in the kernel, but so far I've had little
success in getting them to do it and I don't have time for it myself.
I agree that auto-detection is best (it would need a userspace interface too)
but a lot can be done with format-time detection. It is unlikely that
the RAID striping will change under the filesystem, and if it does then
the stripe size is usually kept the same (e.g. RAID 5 restriping to add
a disk).
Even if the striping does change, the current alignment of bitmaps is
about the worst possible case for power-of-two stride sizes because a
single disk has all of the bitmaps (using the terms "stripe = N * stride"
for N+1 RAID5 or N+2 RAID6 - if anyone knows the "more correct" terms
please speak up). It would also be possible to use tune2fs to change
the stride + stripe size in the superblock to at least tune the mballoc
allocation even if we can't move the bitmaps around very easily.
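To illustrate why power-of-two strides are the worst case, here is a rough
sketch under simplifying assumptions that are not from the message above:
4 KiB blocks (so 32768 blocks per group), each group's block bitmap at a
fixed offset from the start of its group, and plain striping across the data
disks with no parity rotation. The function names are made up for the example.

```python
BLOCKS_PER_GROUP = 32768  # assumed: 4 KiB blocks, 8 * 4096 blocks per group


def disk_of_block(block, stride, data_disks):
    """Data disk index holding `block` under simple striping (no parity rotation)."""
    return (block // stride) % data_disks


def bitmap_disks(groups, stride, data_disks, bitmap_offset=1):
    """Disk holding each group's block bitmap when the bitmap sits at the
    same fixed offset inside every group (a simplified layout)."""
    return [
        disk_of_block(g * BLOCKS_PER_GROUP + bitmap_offset, stride, data_disks)
        for g in range(groups)
    ]


# Power-of-two case: stripe = 4 * 16 = 64 blocks divides 32768 evenly,
# so every group's bitmap falls on the same disk.
print(bitmap_disks(groups=8, stride=16, data_disks=4))

# Non-power-of-two case: stripe = 3 * 16 = 48 blocks does not divide 32768,
# so the bitmaps drift across the disks by themselves.
print(bitmap_disks(groups=8, stride=16, data_disks=3))
```

Whenever the stripe size divides the group size, the group start is always
stripe-aligned and the fixed offset lands on the same disk in every group.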
> I wonder if, in a RAID 0 configuration, deliberately misaligning data
> structures smaller than (size of stride * number of disks in array)
> would yield a performance benefit.
Yes, that would definitely be something to do. If you have N-disk RAID0,
with each disk holding "stride" blocks at a time, then offsetting the bitmaps
by "stride" blocks in each successive group is exactly what
"mke2fs -E stride=" does. The mballoc "stripe" option tries to align large
allocations so that they cover whole stripes, avoiding parity
read-modify-write where possible.
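As a rough sketch of both ideas (a simplified model, not the actual mke2fs or
mballoc code): shift each group's bitmap by one more "stride" than the
previous group, and round large allocations up to a stripe boundary. The
4 KiB block size, the 32768 blocks per group, the plain striping without
parity rotation, and all function names are assumptions for illustration.

```python
BLOCKS_PER_GROUP = 32768  # assumed: 4 KiB blocks, 8 * 4096 blocks per group


def disk_of_block(block, stride, data_disks):
    """Data disk index holding `block` under simple striping (no parity rotation)."""
    return (block // stride) % data_disks


def rotated_bitmap_block(group, stride, base_offset=1):
    """Hypothetical placement: offset the bitmap by `stride` more blocks
    in each successive group, wrapping within the group."""
    offset = (base_offset + group * stride) % BLOCKS_PER_GROUP
    return group * BLOCKS_PER_GROUP + offset


def round_up_to_stripe(start, stripe):
    """First stripe-aligned block at or after `start`, so that an allocation
    of whole stripes can be written without read-modify-write."""
    return ((start + stripe - 1) // stripe) * stripe


stride, data_disks = 16, 4
stripe = stride * data_disks  # "stripe = N * stride"

# Bitmaps now rotate over the disks instead of piling up on one.
print([disk_of_block(rotated_bitmap_block(g, stride), stride, data_disks)
       for g in range(8)])

# A large allocation that would start at block 100 is moved to block 128.
print(round_up_to_stripe(100, stripe))
```

With these numbers the bitmaps of consecutive groups land on disks
0, 1, 2, 3 and then wrap around, which spreads bitmap I/O over the array.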
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.