linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Thu, 17 Feb 2011 08:44:03 +1100
Message-ID: <20110217084403.46b4d8ee@notabene.brown>
In-Reply-To: <4D5C2776.2080609@turmel.org>

On Wed, 16 Feb 2011 14:37:26 -0500 Phil Turmel <philip@turmel.org> wrote:

> Hi Neil,
> 
> On 02/16/2011 05:27 AM, NeilBrown wrote:
> > 
> > Hi all,
> >  I wrote this today and posted it at
> > http://neil.brown.name/blog/20110216044002
> > 
> > I thought it might be worth posting it here too...
> > 
> > NeilBrown
> > 
> > 
> > -------------------------
> > 
> > 
> > It is about 2 years since I last published a road-map[1] for md/raid
> > so I thought it was time for another one.  Unfortunately quite a few
> > things on the previous list remain undone, but there has been some
> > progress.
> > 
> > I think one of the problems with some to-do lists is that they aren't
> > detailed enough.  High-level design, low level design, implementation,
> > and testing are all very different sorts of tasks that seem to require
> > different styles of thinking and so are best done separately.  As
> > writing up a road-map is a high-level design task it makes sense to do
> > the full high-level design at that point so that the tasks are
> > detailed enough to be addressed individually with little reference to
> > the other tasks in the list (except what is explicit in the road map).
> > 
> > A particular need I am finding for this road map is to make explicit
> > the required ordering and interdependence of certain tasks.  Hopefully
> > that will make it easier to address them in an appropriate order, and
> > mean that I waste less time saying "this is too hard, I might go read
> > some email instead".
> > 
> > So the following is a detailed road-map for md raid for the coming
> > months.
> > 
> > [1] http://neil.brown.name/blog/20090129234603
> > 
> > Bad Block Log
> > -------------
> [trim /]
> > Bitmap of non-sync regions.
> > ---------------------------
> [trim /]
> 
> It occurred to me that if you go to the trouble (and space and performance)
> to create and maintain metadata for lists of bad blocks, and separate
> metadata for sync status aka "trim", or hot-replace status, or reshape-status,
> or whatever features are dreamt up later, why not create an infrastructure to
> carry all of it efficiently?
> 
> David Brown suggested a multi-level metadata structure.  I concur, but would
> make it somewhat more generic:
> 	Level 1:  Coarse bitmap, set bit indicates 'look at level 2'
> 	Level 2:  Fine bitmap, set bit indicates 'look at level 3'
> 	Level 3:  Extent list, with starting block, length, and feature payload
> 
> The bitmap levels are purely for hot-path performance.
> 
> As an option, it should be possible to spread the detailed metadata through the
> data area, possibly in chunk-sized areas spread out at some user-defined
> interval.  "meta-span", perhaps.  Then resizing partitions that compose an
> array would be less likely to bump up against metadata size limits.  The coarse
> bitmap should stay near the superblock, of course.

This is starting to sound a lot more like a filesystem than a RAID system.

I really don't want there to be so much metadata that I am tempted to spread
it out among the data.  I think that implies too much complexity.

Maybe that is a good place to draw the line:  If some metadata doesn't fit
easily at the start or end of the devices, it has no place in RAID - you
should add it to a filesystem instead.
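For reference, the three-level layout Phil describes (coarse bitmap, fine bitmap, extent list with payload) can be sketched roughly as below. This is a hypothetical in-memory illustration, not md code: the names `ExtentMetadata`, `Extent`, `COARSE_SPAN` and `FINE_SPAN`, the chosen granularities, and the use of Python sets in place of on-disk bitmaps are all assumptions. It shows why the bitmap levels are "purely for hot-path performance": most lookups are answered at level 1 or 2 without touching the extent list.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical granularities; the proposal does not specify any sizes.
COARSE_SPAN = 1 << 20   # sectors summarised by one level-1 bit
FINE_SPAN = 1 << 12     # sectors summarised by one level-2 bit


@dataclass
class Extent:
    start: int      # first sector
    length: int     # number of sectors
    payload: str    # feature tag, e.g. "badblock" or "unsynced"


@dataclass
class ExtentMetadata:
    coarse: Set[int] = field(default_factory=set)         # level 1: indices of set bits
    fine: Set[int] = field(default_factory=set)           # level 2: indices of set bits
    extents: List[Extent] = field(default_factory=list)   # level 3, sorted by start

    def add(self, ext: Extent) -> None:
        """Store an extent and set every level-1 and level-2 bit it overlaps."""
        last = ext.start + ext.length - 1
        self.coarse.update(range(ext.start // COARSE_SPAN, last // COARSE_SPAN + 1))
        self.fine.update(range(ext.start // FINE_SPAN, last // FINE_SPAN + 1))
        starts = [e.start for e in self.extents]
        self.extents.insert(bisect_right(starts, ext.start), ext)

    def lookup(self, sector: int) -> List[str]:
        """Return the payloads of all extents covering `sector`."""
        if sector // COARSE_SPAN not in self.coarse:
            return []   # hot path: level 1 says nothing is recorded nearby
        if sector // FINE_SPAN not in self.fine:
            return []   # level 2 rules the sector out
        # Level 3: only extents starting at or before `sector` can cover it.
        starts = [e.start for e in self.extents]
        candidates = self.extents[:bisect_right(starts, sector)]
        return [e.payload for e in candidates
                if e.start <= sector < e.start + e.length]
```

Removing an extent would also require clearing bitmap bits that no longer cover any extent; that bookkeeping is left out of the sketch.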


> 
> Personally, I'd like to see the bad-block feature actually perform block
> remapping, much like hard drives themselves do, but with the option to unmap the
> block if a later write succeeds.  Using one retry per array restart as you
> described makes a lot of sense.  In any case, remapping would retain redundancy
> where applicable short of full drive failure or remap overflow.

If the hard drives already do this, why should md try to do it as well??
If a hard drive has had so many write errors that it has used up all of its
spare space, then it is long past time to replace it.
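To make the remapping proposal concrete, here is a minimal sketch of what md-level remapping might look like. Everything in it is hypothetical (md does not implement this): the `RemapTable` class, the pool of spare sectors, and the `dev_write` callback that stands in for a device write returning success or failure. It assumes one retry of the original location per array start, unmaps the sector when that retry succeeds, and raises an error once the spare pool is exhausted, the point at which the device should be replaced.

```python
from typing import Callable, Dict, List, Set

# Hypothetical device write: returns True on success, False on a write error.
WriteFn = Callable[[int, bytes], bool]


class RemapTable:
    """Hypothetical md-level bad-block remapping; not part of md."""

    def __init__(self, spare_start: int, spare_count: int) -> None:
        self.free: List[int] = list(range(spare_start, spare_start + spare_count))
        self.map: Dict[int, int] = {}     # bad sector -> spare sector
        self.retried: Set[int] = set()    # sectors retried since array start

    def resolve(self, sector: int) -> int:
        """Physical sector to read for a logical sector."""
        return self.map.get(sector, sector)

    def restart(self) -> None:
        """Array restart: every remapped sector earns one more retry."""
        self.retried.clear()

    def write(self, sector: int, data: bytes, dev_write: WriteFn) -> int:
        """Write `data` for `sector`; return the physical sector used."""
        # One retry of the original location per array start.
        if sector in self.map and sector not in self.retried:
            self.retried.add(sector)
            if dev_write(sector, data):
                # The original sector works again: unmap it and recycle the spare.
                self.free.append(self.map.pop(sector))
                return sector
        target = self.resolve(sector)
        if dev_write(target, data):
            return target
        # Write error: move to a fresh spare. A failed spare is simply dropped.
        if not self.free:
            raise RuntimeError("remap overflow: replace the device")
        spare = self.free.pop(0)
        self.map[sector] = spare
        if not dev_write(spare, data):
            raise RuntimeError(f"write to spare sector {spare} failed")
        return spare
```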


> 
> My $0.02, of course.

Here in .au, the smallest legal tender is $0.05 - but thanks anyway :-)

NeilBrown

> 
> Phil



Thread overview: 52+ messages
2011-02-16 10:27 md road-map: 2011 NeilBrown
2011-02-16 11:28 ` Giovanni Tessore
2011-02-16 13:40   ` Roberto Spadim
2011-02-16 14:00     ` Robin Hill
2011-02-16 14:09       ` Roberto Spadim
2011-02-16 14:21         ` Roberto Spadim
2011-02-16 21:55           ` NeilBrown
2011-02-17  1:30             ` Roberto Spadim
2011-02-16 14:13 ` Joe Landman
2011-02-16 21:24   ` NeilBrown
2011-02-16 21:44     ` Roman Mamedov
2011-02-16 21:59       ` NeilBrown
2011-02-17  0:48         ` Phil Turmel
2011-02-16 22:12       ` Joe Landman
2011-02-16 15:42 ` David Brown
2011-02-16 21:35   ` NeilBrown
2011-02-16 22:34     ` David Brown
2011-02-16 23:01       ` NeilBrown
2011-02-17  0:30         ` David Brown
2011-02-17  0:55           ` NeilBrown
2011-02-17  1:04           ` Keld Jørn Simonsen
2011-02-17 10:45             ` David Brown
2011-02-17 10:58               ` Keld Jørn Simonsen
2011-02-17 11:45                 ` Giovanni Tessore
2011-02-17 15:44                   ` Keld Jørn Simonsen
2011-02-17 16:22                     ` Roberto Spadim
2011-02-18  0:13                     ` Giovanni Tessore
2011-02-18  2:56                       ` Keld Jørn Simonsen
2011-02-18  4:27                         ` Roberto Spadim
2011-02-18  9:47                         ` Giovanni Tessore
2011-02-18 18:43                           ` Keld Jørn Simonsen
2011-02-18 19:00                             ` Roberto Spadim
2011-02-18 19:18                               ` Keld Jørn Simonsen
2011-02-18 19:22                                 ` Roberto Spadim
2011-02-16 17:20 ` Joe Landman
2011-02-16 21:36   ` NeilBrown
2011-02-16 19:37 ` Phil Turmel
2011-02-16 21:44   ` NeilBrown [this message]
2011-02-17  0:11     ` Phil Turmel
2011-02-16 20:29 ` Piergiorgio Sartor
2011-02-16 21:48   ` NeilBrown
2011-02-16 22:53     ` Piergiorgio Sartor
2011-02-17  0:24     ` Phil Turmel
2011-02-17  0:52       ` NeilBrown
2011-02-17  1:14         ` Phil Turmel
2011-02-17  3:10           ` NeilBrown
2011-02-17 18:46             ` Phil Turmel
2011-02-17 21:04             ` Mr. James W. Laferriere
2011-02-18  1:48               ` NeilBrown
2011-02-17 19:56           ` Piergiorgio Sartor
2011-02-16 22:50 ` Keld Jørn Simonsen
2011-02-23  5:06 ` Daniel Reurich
