Re: md road-map: 2011 - Keld Jørn Simonsen

From: "Keld Jørn Simonsen" <keld@keldix.com>
To: David Brown <david@westcontrol.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Thu, 17 Feb 2011 11:58:15 +0100
Message-ID: <20110217105815.GA24580@www2.open-std.org>
In-Reply-To: <ijiu99$ill$1@dough.gmane.org>

On Thu, Feb 17, 2011 at 11:45:35AM +0100, David Brown wrote:
> On 17/02/2011 02:04, Keld Jørn Simonsen wrote:
> >On Thu, Feb 17, 2011 at 01:30:49AM +0100, David Brown wrote:
> >>On 17/02/11 00:01, NeilBrown wrote:
> >>>On Wed, 16 Feb 2011 23:34:43 +0100 David Brown<david.brown@hesbynett.no>
> >>>wrote:
> >>>
> >>>>I thought there was some mechanism for block devices to report bad
> >>>>blocks back to the file system, and that file systems tracked bad block
> >>>>lists.  Modern drives automatically relocate bad blocks (at least, they
> >>>>do if they can), but there was a time when they did not and it was up to
> >>>>the file system to track these.  Whether that still applies to modern
> >>>>file systems, I do not know - the only file system I have studied in
> >>>>low-level detail is FAT16.
> >>>
> >>>When the block device reports an error the filesystem can certainly
> >>>record that information in a bad-block list, and possibly does.
> >>>
> >>>However I thought you were suggesting a situation where the block device
> >>>could succeed with the request, but knew that area of the device was of
> >>>low quality.
> >>
> >>I guess that is what I was trying to suggest, though not very clearly.
> >>
> >>>e.g. IO to a block on a stripe which had one 'bad block'.  The IO should
> >>>succeed, but the data isn't as safe as elsewhere.  It would be nice if we
> >>>could tell the filesystem that fact, and if it could make use of it. But
> >>>we currently cannot.  We can say "success" or "failure", but we cannot say
> >>>"success, but you might not be so lucky next time".
> >>>
> >>
> >>Do filesystems re-try reads when there is a failure?  Could you return
> >>fail on one read, then success on a re-read, which could be interpreted
> >>as "dying, but not yet dead" by the file system?
> >
> >This should not be a file system feature. The file system is built upon
> >the raid, and in mirrored raid types like raid1 and raid10, and also
> >other raid types, you cannot be sure which specific drive and sector the
> >data was read from - it could be one out of many (typically two) places.
> >So the bad blocks of a raid are a feature of the raid and its individual
> >drives, not the file system. If they were a property of the file system,
> >then the fs would have to be aware of the underlying raid topology, and
> >know whether this was a parity block or data block of raid5 or raid6, or
> >which mirror instance of a raid1/10 type was involved.
> >
> 
> Thanks for the explanation.
> 
> I guess my worry is that if the md layer has tracked a bad block on a
> disk, then that stripe will be in a degraded mode.  It's great that it
> will still work, and it's great that the bad block list means that it is
> /only/ that stripe that is degraded - not the whole raid.

I am proposing that the stripe not be degraded: md would use a recovery
area for bad blocks on the disk, kept together with the metadata area.
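
To make the idea concrete, here is a minimal sketch in Python. It is not md
code. It models one member device that has a reserved recovery area and a
table mapping bad sectors to spare sectors. The class name MemberDevice, its
methods, and the choice to place the spare sectors directly after the data
area are all assumptions made for illustration only; nothing in this thread
specifies an on-disk layout.

```python
class MemberDevice:
    """Hypothetical model of one raid member with a reserved recovery area."""

    def __init__(self, data_sectors, spare_sectors):
        self.data_sectors = data_sectors
        # Assumption: spare sectors sit right after the data area,
        # alongside the metadata.
        self.free_spares = list(range(data_sectors, data_sectors + spare_sectors))
        # Remap table: bad data sector -> spare sector that replaces it.
        self.remap = {}

    def mark_bad(self, sector):
        """Relocate a bad sector. Return the spare used, or None if none is left."""
        if not 0 <= sector < self.data_sectors:
            raise ValueError("sector outside the data area")
        if sector in self.remap:
            return self.remap[sector]      # already relocated
        if not self.free_spares:
            return None                    # recovery area exhausted
        spare = self.free_spares.pop(0)
        self.remap[sector] = spare
        return spare

    def resolve(self, sector):
        """Translate a logical sector to the physical sector to read or write."""
        return self.remap.get(sector, sector)
```

In this sketch, once mark_bad() has relocated a sector and the data has been
rewritten to the spare (from the other mirror or from parity), every later
access goes through resolve() and the stripe has full redundancy again. When
mark_bad() returns None the recovery area is full, and the stripe would have
to stay degraded, as in the plain bad-block-list scheme.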

> But I'm hoping there can be some sort of relocation somewhere 
> (ultimately it doesn't matter if it is handled by the file system, or by 
> md for the whole stripe, or by md for just that disk block, or by the 
> disk itself), so that you can get raid protection again for that stripe.

I think we agree in hoping:-)

best regards
keld