linux-raid.vger.kernel.org archive mirror
From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Wed, 16 Feb 2011 16:42:26 +0100
Message-ID: <ijgr9p$7v8$1@dough.gmane.org>
In-Reply-To: <20110216212751.51a294aa@notabene.brown>

On 16/02/2011 11:27, NeilBrown wrote:
>
> Hi all,
>   I wrote this today and posted it at
> http://neil.brown.name/blog/20110216044002
>
> I thought it might be worth posting it here too...
>
> NeilBrown
>


The bad block log will be a huge step up for reliability by making 
failures fine-grained.  Occasional failures are a serious risk, 
especially with very large disks.  The bad block log, especially 
combined with the "hot replace" idea, will make md raid a lot safer 
because you avoid running the array in degraded mode (except for a few 
stripes).

When a block is marked as bad on a disk, is it possible to inform the 
file system that the whole stripe is considered bad?  Then the 
filesystem will (I hope) add that stripe to its own bad block list, move 
the data out to another stripe (or block, from the fs's viewpoint), thus 
restoring the raid redundancy for that data.
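
To make the "whole stripe" idea concrete, here is a rough sketch of how a
bad sector on one member disk could be translated into the range of array
sectors that a filesystem would need to avoid.  It is only an illustration,
not md code: the function name is made up, it assumes a striped level with
a fixed chunk size (RAID 0/4/5/6 style), and it assumes the device sector
is counted from the start of the member's data area.

```python
def stripe_logical_range(dev_sector, chunk_sectors, data_disks):
    """Return (start, end) array sectors, end exclusive, for the stripe
    that contains dev_sector on a member disk.

    Assumptions (hypothetical helper, not an md interface):
      - dev_sector is relative to the start of the member's data area
      - every stripe holds data_disks chunks of chunk_sectors each
    """
    if dev_sector < 0 or chunk_sectors <= 0 or data_disks <= 0:
        raise ValueError("invalid geometry")
    stripe = dev_sector // chunk_sectors          # which row of chunks
    stripe_sectors = chunk_sectors * data_disks   # logical size of one row
    start = stripe * stripe_sectors
    return start, start + stripe_sectors


# Example: 4-disk RAID 5 (3 data chunks per stripe), 1024-sector chunks.
print(stripe_logical_range(5000, 1024, 3))
```

The filesystem would then be told about that whole range rather than about
the single failed sector.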

Can a "hot spare" automatically turn into a "hot replace" based on some 
criteria (such as a certain number of bad blocks)?  Can the replaced 
drive then become a "hot spare" again?  It may not be perfect, but it is 
still better than nothing, and useful if the admin can't replace the 
drive quickly.
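
One possible shape for such a criterion, purely as a sketch: pick the
member with the most logged bad blocks once it crosses a threshold, and
pair it with the first available spare.  The `Member` class, the function
name and the threshold are all assumptions for illustration; nothing here
reflects an existing md or mdadm policy.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    bad_blocks: int = 0   # number of entries in this disk's bad block log


def pick_hot_replace(members, spares, threshold):
    """Return (failing_member, spare) if some member has reached the
    bad-block threshold and a spare is available, otherwise None."""
    if not spares:
        return None
    candidates = [m for m in members if m.bad_blocks >= threshold]
    if not candidates:
        return None
    worst = max(candidates, key=lambda m: m.bad_blocks)
    return worst, spares[0]


members = [Member("sda"), Member("sdb", 12), Member("sdc", 40)]
print(pick_hot_replace(members, ["sdd"], threshold=10))
```

Once the replacement finishes, the old drive could be appended to the
spare list again, which is the "better than nothing" case above.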

It strikes me that "hot replace" is much like taking one of the original 
disks out of the array and replacing it with a RAID 1 pair made of the 
original disk and a missing second.  The new disk is then added to the 
pair and the two are synced.  Finally, you remove the old disk from the 
RAID 1 pair, then re-assign the remaining drive from the RAID 1 "pair" 
to the original RAID.

I may be missing something, but I think that, with the bad-block list 
and the non-sync bitmaps, the only thing needed to support hot replace 
is a way to turn a member drive into a degraded RAID 1 set in an atomic 
action, and to reverse this action afterwards.  This may also give extra 
flexibility - it is conceivable that someone would want to keep the RAID 
1 set afterwards as a reshape (turning a RAID 5 into a RAID 1+0, for 
example).
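
The sequence can be walked through with a toy model in which an array is
just a list of member names and the temporary RAID 1 is a two-slot pair.
This is only a simulation of the steps described above, with made-up
names; it does no I/O and says nothing about how md would make the swap
atomic.

```python
def hot_replace(members, old, new):
    """Simulate hot replace as a temporary RAID 1 in one member slot.

    Returns (new_members, states), where states records the member list
    after each intermediate step.  The input list is not modified.
    """
    slot = members.index(old)   # raises ValueError if old is not a member
    states = []

    def snapshot(mirror):
        row = list(members)
        row[slot] = tuple(mirror)
        states.append(row)

    mirror = [old, None]        # 1. member becomes a degraded RAID 1
    snapshot(mirror)
    mirror[1] = new             # 2. new disk added to the pair and synced
    snapshot(mirror)
    mirror[0] = None            # 3. old disk removed from the pair
    snapshot(mirror)

    result = list(members)      # 4. pair collapses back to a plain member
    result[slot] = new
    return result, states


final, steps = hot_replace(["sda", "sdb", "sdc"], "sdb", "sdd")
for step in steps:
    print(step)
print(final)
```

Stopping after step 2 instead of continuing is the "keep the RAID 1 set"
variant mentioned above.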

For your non-sync bitmap, would it make sense to have a two-level 
bitmap?  Perhaps a coarse bitmap in blocks of 32 MB, with each entry 
showing a state of in sync, out of sync, partially synced, or never 
synced.  Partially synced coarse blocks would have their own fine bitmap 
at the 4K block size (or perhaps a bit bigger - maybe 32K or 64K would 
fit well with SSD block sizes).  Partially synced and out of sync blocks 
would be gradually brought into sync when the disks are otherwise free, 
while never synced blocks would not need to be synced at all.

This would let you efficiently store the state during initial builds 
(everything is marked "never synced" until it is used), and rebuilds are 
done by marking everything as "out of sync" on the new device.  The 
two-level structure would let you keep fine-grained sync information 
from file system discards without taking up unreasonable space.
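
A minimal sketch of the two-level structure, assuming 32 MB coarse regions
and 64K fine blocks (so a partially synced region costs 512 bits, i.e. 64
bytes, for its fine bitmap).  The class and method names are invented for
illustration, the fine bitmaps are kept in a plain dictionary rather than
any on-disk format, and the last coarse region is treated as full-sized
even if the device ends part-way through it.

```python
from enum import Enum

COARSE = 32 * 1024 * 1024   # bytes per coarse region (assumed)
FINE = 64 * 1024            # bytes per fine block (assumed)


class State(Enum):
    NEVER_SYNCED = 0
    IN_SYNC = 1
    OUT_OF_SYNC = 2
    PARTIAL = 3


class TwoLevelBitmap:
    def __init__(self, size_bytes):
        regions = -(-size_bytes // COARSE)          # round up
        self.coarse = [State.NEVER_SYNCED] * regions
        self.fine = {}   # region index -> list of bool, PARTIAL regions only

    def mark_in_sync(self, offset):
        """Record that the fine block containing offset is now in sync."""
        region, within = divmod(offset, COARSE)
        if self.coarse[region] is State.IN_SYNC:
            return
        bits = self.fine.setdefault(region, [False] * (COARSE // FINE))
        bits[within // FINE] = True
        if all(bits):
            self.coarse[region] = State.IN_SYNC
            del self.fine[region]                   # fine bitmap not needed
        else:
            self.coarse[region] = State.PARTIAL

    def mark_all_out_of_sync(self):
        """Rebuild onto a new device: everything must be resynced."""
        self.coarse = [State.OUT_OF_SYNC] * len(self.coarse)
        self.fine.clear()

    def pending(self):
        """Regions the background sync still has to visit."""
        return [i for i, s in enumerate(self.coarse)
                if s in (State.OUT_OF_SYNC, State.PARTIAL)]


bm = TwoLevelBitmap(4 * COARSE)
bm.mark_in_sync(COARSE + 5 * FINE + 10)
print(bm.coarse, bm.pending())
```

Only regions in the partially synced state carry a fine bitmap, which is
what keeps the space cost down.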




