linux-raid.vger.kernel.org archive mirror
From: Brett Russ <bruss@netezza.com>
To: linux-raid@vger.kernel.org
Subject: Re: [md PATCH 00/16] bad block list management for md and RAID1
Date: Thu, 17 Jun 2010 08:48:07 -0400
Message-ID: <hvd5iq$u83$1@dough.gmane.org>
In-Reply-To: <20100606235833.13302.60932.stgit@notabene.brown>

On 06/06/2010 08:07 PM, NeilBrown wrote:
> The goal of these patches is to add a 'bad block list' to each device
> and use it to allow us to fail single blocks rather than whole
> devices.

Hi Neil,

This is a worthwhile addition, I think.  However, one concern we have is
that there appears to be no distinction between media errors (i.e. bad
blocks) and other SCSI errors.  One situation we commonly see in the
enterprise is non-media SCSI errors caused by, e.g., path failure.  We've
tested dm multipath as a solution for that, but it has its own problems,
primarily performance, apparently because it decomposes large contiguous
I/Os into smaller ones; we're investigating that.  Until that is fixed,
we have patched md to retry failed writes (md already has a mechanism
for failed reads).  Commonly these retries succeed, as many of the path
failures we've seen have been transient (e.g. a SAS expander undergoing
a reset).  Today, in the vanilla md code, such an error would cause a
drive failure.  With this patch set, it would instead mark a range of
blocks as bad.  Presumably those blocks might later be revalidated and
removed from the bad block list if the original error(s) were in fact
transient, but in the meantime we lose that member for any reads of
those blocks.
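
To make the retry idea concrete, here is a minimal userspace sketch in
Python.  It is not the md patch itself: Member, write_with_retry, the
do_write callback and max_retries are all hypothetical names made up
for illustration.  The point is only that a range gets recorded as bad
after every retry has failed, so a transient path error never reaches
the bad block list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Member:
    """Hypothetical stand-in for one md component device."""
    name: str
    # Recorded bad ranges as (start_sector, length_in_sectors).
    bad_blocks: List[Tuple[int, int]] = field(default_factory=list)


def write_with_retry(
    member: Member,
    sector: int,
    length: int,
    do_write: Callable[[int, int], bool],
    max_retries: int = 3,
) -> bool:
    """Attempt a write up to 1 + max_retries times.

    do_write is an assumed callback returning True on success.
    The range is recorded as bad only if every attempt fails.
    """
    for _ in range(1 + max_retries):
        if do_write(sector, length):
            return True  # transient error (if any) cleared; nothing recorded
    member.bad_blocks.append((sector, length))
    return False
```

A write that fails twice and then succeeds leaves bad_blocks empty; a
write that never succeeds adds exactly one (sector, length) entry.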

As an aside, it would be handy to have mechanisms exposed to userspace
(via mdadm) to display, test, and possibly override the record of these
bad blocks, so that in instances where md has (possibly incorrectly)
marked a range of blocks unavailable on a member, we can still recover
data if the automated recovery doesn't succeed.
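
As a rough illustration of what "display, test, and override" could
mean, here is a small Python model of a per-member bad block list.  It
does not reflect the data structure or the sysfs interface in the patch
set; BadBlockList and its method names are hypothetical.

```python
from typing import List, Tuple


class BadBlockList:
    """Hypothetical model of a per-member list of bad sector ranges."""

    def __init__(self) -> None:
        self._ranges: List[Tuple[int, int]] = []  # (start, length), sorted

    def add(self, start: int, length: int) -> None:
        """Record [start, start + length) as bad."""
        self._ranges.append((start, length))
        self._ranges.sort()

    def entries(self) -> List[Tuple[int, int]]:
        """Display: return a copy of the recorded ranges."""
        return list(self._ranges)

    def is_bad(self, sector: int) -> bool:
        """Test: is this sector inside any recorded range?"""
        return any(s <= sector < s + l for s, l in self._ranges)

    def clear(self, start: int, length: int) -> None:
        """Override: forget [start, start + length), splitting ranges
        that only partly overlap it."""
        end = start + length
        kept: List[Tuple[int, int]] = []
        for s, l in self._ranges:
            e = s + l
            if e <= start or s >= end:
                kept.append((s, l))  # no overlap, keep unchanged
                continue
            if s < start:
                kept.append((s, start - s))  # piece before the cleared span
            if e > end:
                kept.append((end, e - end))  # piece after the cleared span
        self._ranges = sorted(kept)
```

Clearing the middle of a recorded range leaves the two outer pieces
marked bad, so an administrator could override only the sectors they
have verified by hand.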

Do you have thoughts or plans to behave differently based on the type of
error?  I believe that today the SCSI layer only provides pass/fail; is
that correct?  If so, plumbing would need to be added to make the upper
layer aware of the nature of the failure.  It seems that the bad block
management in md should only take effect for media errors and that there
should be more intelligent handling of other types of errors.  We would
be happy to help in this area if it aligns with your/the community's
longer-term view of things.
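
The kind of policy I have in mind could be sketched as below.  This
assumes the lower layers could report an error class at all, which (per
the question above) they may not do today.  ErrorKind, Action and
choose_action are hypothetical names, and the three-way split is only
one possible choice.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    """Hypothetical error classes the lower layers would need to report."""
    MEDIA = auto()      # the medium itself is bad at this location
    TRANSPORT = auto()  # path problem, e.g. an expander reset
    UNKNOWN = auto()    # only pass/fail is known


class Action(Enum):
    RECORD_BAD_BLOCK = auto()
    RETRY = auto()
    FAIL_DEVICE = auto()


def choose_action(kind: ErrorKind, retries_left: int) -> Action:
    """Pick a response to a failed I/O based on its error class."""
    if kind is ErrorKind.MEDIA:
        # Only genuine media errors feed the bad block list.
        return Action.RECORD_BAD_BLOCK
    if kind is ErrorKind.TRANSPORT and retries_left > 0:
        # Transient path failures are retried first.
        return Action.RETRY
    # Anything else falls back to failing the whole device.
    return Action.FAIL_DEVICE
```

With this split, a transport error that exhausts its retries fails the
device rather than polluting the bad block list with ranges that were
never bad on the medium.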

Thanks,
Brett
