[md PATCH 00/36] md patches for 3.1 - part 2: bad block logs

From: NeilBrown <neilb@suse.de>
To: linux-raid@vger.kernel.org
Subject: [md PATCH 00/36] md patches for 3.1 - part 2: bad block logs
Date: Thu, 21 Jul 2011 12:58:47 +1000
Message-ID: <20110721024556.8422.99443.stgit@notabene.brown>

As promised, this is the second of two patch-bombs full of patches
that I plan to submit for linux-3.1.

While the first set was a varied assortment, these all have a very
strong theme.
This patch set implements a bad-block log for RAID1, RAID456 and
RAID10, i.e. the first item on my "TODO list":
           http://neil.brown.name/blog/20110216044002


On v1.x metadata arrays created with a patched mdadm (which I'll post
a pointer to later) 4K of space is reserved to store a list of
known bad blocks.  When md hits an error, it can now fail just the
block instead of failing the whole device.  This should mean more
graceful failure modes when devices are producing bad blocks.
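
As a rough feel for how much the 4K reservation can hold, here is a
back-of-the-envelope sketch.  The 8-byte entry size is an assumption
made for illustration (the text above only states the 4K figure), and
bbl_capacity is a hypothetical helper name, not part of mdadm or md.

```shell
#!/bin/sh
# Estimate how many bad-block entries fit in the reserved area.
# ASSUMPTION: each entry occupies 8 bytes (one 64-bit word); pass a
# different entry size as the second argument to try other layouts.
bbl_capacity() {
    area_bytes=$1
    entry_bytes=${2:-8}
    echo $(( area_bytes / entry_bytes ))
}

# Example: bbl_capacity 4096
```

Under that assumption the log is small, so a device that keeps
producing new bad blocks would eventually fill it.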

I have tested these a reasonable amount (and found a few bugs in the
process) but more testing is needed.  One difficulty with testing is
that you need the device to fail occasionally to exercise some of this
code.

One of my tests is below.  It inserts a 'faulty' md device between
the RAID5 and each real device and configures two of them to generate
persistent write errors at different rates.  The first "mkfs" causes
lots of bad blocks to get logged.  The second "mkfs" (after the
'faulty' targets are cleared and flushed) results in all those bad
blocks being successfully repaired and forgotten.
There are obviously lots of other combinations worth testing.

Testing both with the new mdadm and with the old one (or with 0.90
metadata which won't store bad-block lists) would be helpful.

Again, genuine "Reviewed-by" lines are very welcome and will be added
if received before I submit this to Linus.

Thanks,
NeilBrown

(from the mdadm man page for "--grow" for faulty arrays:

              When setting the failure mode for level faulty, the options are:
              write-transient, wt, read-transient, rt,  write-persistent,  wp,
              read-persistent,  rp, write-all, read-fixable, rf, clear, flush,
              none.

              Each failure mode can be followed by a number, which is used  as
              a  period between fault generation.  Without a number, the fault
              is generated once on the first relevant request.  With a number,
              the  fault  will be generated after that many requests, and will
              continue to be generated every time the period elapses.

              Multiple failure modes can be current  simultaneously  by  using
              the --grow option to set subsequent failure modes.

              "clear"  or  "none"  will remove any pending or periodic failure
              modes, and "flush" will clear any persistent faults.
)
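
To make the period semantics quoted above concrete, the following is
a simplified model of how many faults a single mode generates over a
run of relevant requests.  fault_count is a hypothetical helper written
for this note, not an mdadm command, and it only counts newly generated
faults: it does not model repeated failures on sectors that a
persistent mode has already marked.

```shell
#!/bin/sh
# Simplified model of the 'faulty' period rule from the man page text:
#   no number -> one fault, on the first relevant request
#   number N  -> a fault after every N relevant requests
# Usage: fault_count PERIOD REQUESTS   (pass "" for "no number")
fault_count() {
    period=$1
    requests=$2
    if [ -z "$period" ]; then
        # No period given: a single fault once any request arrives.
        if [ "$requests" -ge 1 ]; then echo 1; else echo 0; fi
    else
        # Periodic: one fault each time the period elapses.
        echo $(( requests / period ))
    fi
}

# Example: fault_count 8000 20000   and   fault_count 7000 20000
```

With two devices on different periods, as in the test script, the
injected faults land on different requests for each device.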

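# test badblock code

# Stop any arrays that are currently running.
mdadm -Ss
# Wrap each real device in a single-device 'faulty' array.
mdadm -B /dev/md10 -l faulty -n 1 /dev/sda
mdadm -B /dev/md11 -l faulty -n 1 /dev/sdb
mdadm -B /dev/md12 -l faulty -n 1 /dev/sdc
mdadm -B /dev/md13 -l faulty -n 1 /dev/sdd
# Create the RAID5 on top of the wrappers with the patched mdadm.
./mdadm -CR /dev/md0 -l5 -n4 /dev/md1[0123] --assume-clean

# Generate persistent write errors on two members at different rates.
mdadm -G /dev/md10 -l faulty -p wp8000
mdadm -G /dev/md11 -l faulty -p wp7000

# First mkfs: the write errors should get logged as bad blocks.
mkfs /dev/md0

# Show the bad blocks recorded for each member.
grep . /sys/block/md0/md/rd?/bad*

# Stop the array, then clear the failure modes and flush the
# persistent faults on both faulty devices.
mdadm -S /dev/md0
mdadm -G /dev/md10 -l faulty -p clear
mdadm -G /dev/md10 -l faulty -p flush
mdadm -G /dev/md11 -l faulty -p clear
mdadm -G /dev/md11 -l faulty -p flush

# Reassemble; the second mkfs should repair the logged bad blocks,
# so the final grep is expected to show none.
mdadm -A /dev/md0 /dev/md1[0123]
mkfs /dev/md0
grep . /sys/block/md0/md/rd?/bad*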
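
When the first mkfs logs a lot of bad blocks, the raw grep output can
be long.  The helper below is a small sketch for totalling it per
file.  It assumes each grep line looks like "path:start length" with
both numbers in sectors; that format and the name sum_bad_sectors are
assumptions made for this example, so check them against the sysfs
files on your kernel before relying on the totals.

```shell
#!/bin/sh
# Sum the bad-sector counts per sysfs file from "grep ." style output.
# ASSUMED input line format:  <path>:<start-sector> <length-in-sectors>
sum_bad_sectors() {
    awk 'NF == 2 {
             file = $1
             sub(/:[0-9]+$/, "", file)   # strip ":<start>" to get the path
             total[file] += $2           # accumulate lengths per file
         }
         END { for (f in total) print f, total[f] }' | sort
}

# Example:
#   grep . /sys/block/md0/md/rd?/bad* | sum_bad_sectors
```

Each file matched by the glob gets its own total, so the summary keeps
them apart instead of adding everything together.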



---

NeilBrown (36):
      md/raid10: handle further errors during fix_read_error better.
      md/raid10: Handle read errors during recovery better.
      md/raid10: simplify read error handling during recovery.
      md/raid10: record bad blocks due to write errors during resync/recovery.
      md/raid10:  attempt to fix read errors during resync/check
      md/raid10:  Handle write errors by updating badblock log.
      md/raid10: clear bad-block record when write succeeds.
      md/raid10: avoid writing to known bad blocks on known bad drives.
      md/raid10 record bad blocks as needed during recovery.
      md/raid10: avoid reading known bad blocks during resync/recovery.
      md/raid10 - avoid reading from known bad blocks - part 3
      md/raid10: avoid reading from known bad blocks - part 2
      md/raid10: avoid reading from known bad blocks - part 1
      md/raid10: Split handle_read_error out from raid10d.
      md/raid10: simplify/reindent some loops.
      md/raid5: Clear bad blocks on successful write.
      md/raid5.  Don't write to known bad block on doubtful devices.
      md/raid5: write errors should be recorded as bad blocks if possible.
      md/raid5: use bad-block log to improve handling of uncorrectable read errors.
      md/raid5: avoid reading from known bad blocks.
      md/raid1: factor several functions out or raid1d()
      md/raid1: improve handling of read failure during recovery.
      md/raid1: record badblocks found during resync etc.
      md/raid1:  Handle write errors by updating badblock log.
      md/raid1: store behind-write pages in bi_vecs.
      md/raid1: clear bad-block record when write succeeds.
      md/raid1: avoid writing to known-bad blocks on known-bad drives.
      md: make it easier to wait for bad blocks to be acknowledged.
      md: add 'write_error' flag to component devices.
      md/raid1: avoid reading known bad blocks during resync
      md/raid1: avoid reading from known bad blocks.
      md: Disable bad blocks and v0.90 metadata.
      md: load/store badblock list from v1.x metadata
      md: don't allow arrays to contain devices with bad blocks.
      md/bad-block-log: add sysfs interface for accessing bad-block-log.
      md: beginnings of bad block management.


 drivers/md/md.c           |  838 ++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h           |   83 ++++
 drivers/md/raid1.c        |  923 ++++++++++++++++++++++++++++++++---------
 drivers/md/raid1.h        |   20 +
 drivers/md/raid10.c       | 1015 ++++++++++++++++++++++++++++++++++++---------
 drivers/md/raid10.h       |   16 +
 drivers/md/raid5.c        |  183 +++++++-
 drivers/md/raid5.h        |   21 +
 include/linux/raid/md_p.h |   14 -
 9 files changed, 2637 insertions(+), 476 deletions(-)

-- 
Signature

Thread overview: 65+ messages
2011-07-21  2:58 NeilBrown [this message]
2011-07-21  2:58 ` [md PATCH 03/36] md: don't allow arrays to contain devices with bad blocks NeilBrown
2011-07-22 15:47   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 04/36] md: load/store badblock list from v1.x metadata NeilBrown
2011-07-22 16:34   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 01/36] md: beginnings of bad block management NeilBrown
2011-07-22 15:03   ` Namhyung Kim
2011-07-26  2:26     ` NeilBrown
2011-07-26  5:17       ` Namhyung Kim
2011-07-22 16:52   ` Namhyung Kim
2011-07-26  3:20     ` NeilBrown
2011-07-21  2:58 ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-22 15:43   ` Namhyung Kim
2011-07-26  2:29     ` NeilBrown
2011-07-26  5:17       ` Namhyung Kim
2011-07-26  8:48   ` Namhyung Kim
2011-07-26 15:03     ` [PATCH v2] md: add documentation for bad block log Namhyung Kim
2011-07-27  1:05     ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-21  2:58 ` [md PATCH 09/36] md: make it easier to wait for bad blocks to be acknowledged NeilBrown
2011-07-26 16:04   ` Namhyung Kim
2011-07-27  1:18     ` NeilBrown
2011-07-21  2:58 ` [md PATCH 12/36] md/raid1: store behind-write pages in bi_vecs NeilBrown
2011-07-27 15:16   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 10/36] md/raid1: avoid writing to known-bad blocks on known-bad drives NeilBrown
2011-07-27  4:09   ` Namhyung Kim
2011-07-27  4:19     ` NeilBrown
2011-07-21  2:58 ` [md PATCH 06/36] md/raid1: avoid reading from known bad blocks NeilBrown
2011-07-26 14:06   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 11/36] md/raid1: clear bad-block record when write succeeds NeilBrown
2011-07-27  5:05   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 07/36] md/raid1: avoid reading known bad blocks during resync NeilBrown
2011-07-26 14:25   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 14/36] md/raid1: record badblocks found during resync etc NeilBrown
2011-07-27 15:39   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 08/36] md: add 'write_error' flag to component devices NeilBrown
2011-07-26 15:22   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 13/36] md/raid1: Handle write errors by updating badblock log NeilBrown
2011-07-27 15:28   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 05/36] md: Disable bad blocks and v0.90 metadata NeilBrown
2011-07-22 17:02   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 20/36] md/raid5. Don't write to known bad block on doubtful devices NeilBrown
2011-07-21  2:58 ` [md PATCH 22/36] md/raid10: simplify/reindent some loops NeilBrown
2011-07-21  2:58 ` [md PATCH 18/36] md/raid5: use bad-block log to improve handling of uncorrectable read errors NeilBrown
2011-07-21  2:58 ` [md PATCH 17/36] md/raid5: avoid reading from known bad blocks NeilBrown
2011-07-21  2:58 ` [md PATCH 21/36] md/raid5: Clear bad blocks on successful write NeilBrown
2011-07-21  2:58 ` [md PATCH 15/36] md/raid1: improve handling of read failure during recovery NeilBrown
2011-07-27 15:45   ` Namhyung Kim
2011-07-21  2:58 ` [md PATCH 24/36] md/raid10: avoid reading from known bad blocks - part 1 NeilBrown
2011-07-21  2:58 ` [md PATCH 16/36] md/raid1: factor several functions out or raid1d() NeilBrown
2011-07-27 15:55   ` Namhyung Kim
2011-07-28  1:39     ` NeilBrown
2011-07-21  2:58 ` [md PATCH 23/36] md/raid10: Split handle_read_error out from raid10d NeilBrown
2011-07-21  2:58 ` [md PATCH 19/36] md/raid5: write errors should be recorded as bad blocks if possible NeilBrown
2011-07-21  2:58 ` [md PATCH 28/36] md/raid10 record bad blocks as needed during recovery NeilBrown
2011-07-21  2:58 ` [md PATCH 34/36] md/raid10: simplify read error handling " NeilBrown
2011-07-21  2:58 ` [md PATCH 25/36] md/raid10: avoid reading from known bad blocks - part 2 NeilBrown
2011-07-21  2:58 ` [md PATCH 32/36] md/raid10: attempt to fix read errors during resync/check NeilBrown
2011-07-21  2:58 ` [md PATCH 26/36] md/raid10 - avoid reading from known bad blocks - part 3 NeilBrown
2011-07-21  2:58 ` [md PATCH 31/36] md/raid10: Handle write errors by updating badblock log NeilBrown
2011-07-21  2:58 ` [md PATCH 30/36] md/raid10: clear bad-block record when write succeeds NeilBrown
2011-07-21  2:58 ` [md PATCH 29/36] md/raid10: avoid writing to known bad blocks on known bad drives NeilBrown
2011-07-21  2:58 ` [md PATCH 33/36] md/raid10: record bad blocks due to write errors during resync/recovery NeilBrown
2011-07-21  2:58 ` [md PATCH 27/36] md/raid10: avoid reading known bad blocks " NeilBrown
2011-07-21  2:58 ` [md PATCH 35/36] md/raid10: Handle read errors during recovery better NeilBrown
2011-07-21  2:58 ` [md PATCH 36/36] md/raid10: handle further errors during fix_read_error better NeilBrown