From: John Robinson <john.robinson@anonymous.org.uk>
To: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: RAID5 reconstruction ?
Date: Sun, 31 May 2009 13:11:44 +0100
Message-ID: <4A227400.7080703@anonymous.org.uk>
In-Reply-To: <87ljod7aiy.fsf@frosties.localdomain>
On 31/05/2009 12:54, Goswin von Brederlow wrote:
> SandeepKsinha <sandeepksinha@gmail.com> writes:
>>> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote:
>>>> And just when I hit send I thought of something else.
>>>>
>>>> Instead of the initial sync when creating a raid the bitmap could just
>>>> mark all blocks as unused. Much faster raid creation.
>>
>> This really sounds like a good option. This would have a slight hit
>> for writes, which I believe will be compensated for by later
>> reconstructions, replacing a disk, mirror resync and many more operations.
>
> What hit? Currently with bitmap support a write will set the block to
> "unclean", write the data, write the parity and set the block to
> "clean". Setting the "used" bit along the way should not cost much.
>
> Only difference I see is that the bitmap would have to have finer
> granularity so one "used" bit covers one filesystem block (4k usually).
> Otherwise you could only "use" blocks but not "unuse" them again when
> the filesystem frees them in 4k chunks.
I think the whole thing probably ought to be done in such a way as to
support the pass-down and pass-through of TRIM/DISCARD commands, which I
vaguely recall from previous discussions operate at sector granularity.
The idea would be for md to be able to use a bitmap (or some other
data structure for a free/used block/sector list) when operating over
devices which don't support TRIM/DISCARD themselves, but take advantage
of the devices' own capability when it's there. Since those devices will
be SSDs, we'd want to avoid repeatedly rewriting a bitmap, because the
point of TRIM/DISCARD is to help SSDs manage wear levelling.
I am assuming that devices supporting TRIM/DISCARD are able to indicate
whether a given sector is used or free; if they can't, and just return
arbitrary data, we would have to keep a bitmap (or whatever) in md to be
able to support TRIM/DISCARD at all.
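As a rough illustration of the scheme described above (this is not md
code; Member, UsedTracker and the 512-byte/4k sizes are all made up for
the sketch), the tracking layer could pass discards down when the
device handles them and fall back to its own map otherwise. It also
assumes the optimistic case where a discard-capable device can report
used/free itself:

```python
SECTOR = 512                         # bytes per sector (assumed)
BLOCK = 4096                         # bytes covered by one "used" flag (assumed)
SECTORS_PER_BLOCK = BLOCK // SECTOR


class Member:
    """Hypothetical stand-in for one component device."""

    def __init__(self, supports_discard):
        self.supports_discard = supports_discard
        self.discards = []           # (start_sector, nr_sectors) passed down

    def discard(self, start, count):
        self.discards.append((start, count))


class UsedTracker:
    """Hypothetical used/free tracking on top of one member device."""

    def __init__(self, member, nr_sectors):
        self.member = member
        nr_blocks = -(-nr_sectors // SECTORS_PER_BLOCK)   # round up
        # Keep our own map only when the device cannot track this itself.
        # One byte per block for readability; a real map would pack bits.
        self.used = None if member.supports_discard else bytearray(nr_blocks)

    def write(self, start, count):
        if self.used is None:
            return                   # device tracks usage, nothing to rewrite
        first = start // SECTORS_PER_BLOCK
        last = (start + count - 1) // SECTORS_PER_BLOCK
        for block in range(first, last + 1):
            self.used[block] = 1     # any touched block becomes "used"

    def discard(self, start, count):
        if self.used is None:
            self.member.discard(start, count)   # pass down at sector granularity
            return
        # Discards arrive per sector, but a block may only be marked free
        # when the discard covers every sector in it.
        first = -(-start // SECTORS_PER_BLOCK)            # round up
        end = (start + count) // SECTORS_PER_BLOCK        # round down
        for block in range(first, end):
            self.used[block] = 0
```

Note that a discard covering only part of a 4k block leaves that block
marked used, which is the cost of tracking more coarsely than the
commands themselves.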
Of course any bitmap (or whatever) might still be optimised if we know
md and its clients never use anything smaller than e.g. 4k.
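To put a number on that optimisation, here is a back-of-the-envelope
calculation. It assumes one bit per tracked unit and a 1 TiB device
chosen purely as an example; bitmap_bytes is a helper invented for
this sketch:

```python
def bitmap_bytes(device_bytes, granule_bytes):
    """Bytes needed for a one-bit-per-granule map of the whole device."""
    granules = -(-device_bytes // granule_bytes)   # round up
    return -(-granules // 8)                       # 8 flags per byte, round up


TIB = 1 << 40
MIB = 1 << 20

print(bitmap_bytes(TIB, 512) // MIB, "MiB at 512-byte sector granularity")
print(bitmap_bytes(TIB, 4096) // MIB, "MiB at 4k granularity")
```

This works out to 256 MiB per TiB when tracking every sector against
32 MiB per TiB at 4k, so knowing that nothing smaller than 4k is ever
used shrinks the map by a factor of eight.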
Cheers,
John.