From: NeilBrown <neilb@suse.de>
To: linux-raid@vger.kernel.org
Subject: [md PATCH 17/36] md/raid5: avoid reading from known bad blocks.
Date: Thu, 21 Jul 2011 12:58:49 +1000 [thread overview]
Message-ID: <20110721025849.8422.92268.stgit@notabene.brown> (raw)
In-Reply-To: <20110721024556.8422.99443.stgit@notabene.brown>
There are two times that we might read in raid5:
1/ when a read request fits within a chunk on a single
working device.
In this case, if there is any bad block in the range of
the read, we simply fail the cache-bypass read and
perform the read though the stripe cache.
2/ when reading into the stripe cache. In this case we
mark as failed any device which has a bad block in that
strip (1 page wide).
Note that we will both avoid reading and avoid writing.
This is correct (as we will never read from the block, there
is no point writing), but not optimal (as writing could 'fix'
the error) - that will be addressed later.
If we have not seen any write errors on the device yet, we treat a bad
block like a recent read error. This will encourage an attempt to fix
the read error which will either generate a write error, or will
ensure good data is stored there. We don't yet forget the bad block
in that case. That comes later.
Now that we honour bad blocks when reading we can allow devices with
bad blocks into the array.
Signed-off-by: NeilBrown <neilb@suse.de>
---
drivers/md/raid5.c | 46 ++++++++++++++++++++++++++++++++--------------
1 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7a3fb72..52bedc6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2916,6 +2916,9 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
spin_lock_irq(&conf->device_lock);
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
+ sector_t first_bad;
+ int bad_sectors;
+ int is_bad = 0;
dev = &sh->dev[i];
@@ -2952,15 +2955,32 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
if (dev->written)
s->written++;
rdev = rcu_dereference(conf->disks[i].rdev);
- if (s->blocked_rdev == NULL &&
- rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
- s->blocked_rdev = rdev;
- atomic_inc(&rdev->nr_pending);
+ if (rdev) {
+ is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
+ &first_bad, &bad_sectors);
+ if (s->blocked_rdev == NULL
+ && (test_bit(Blocked, &rdev->flags)
+ || is_bad < 0)) {
+ if (is_bad < 0)
+ set_bit(BlockedBadBlocks,
+ &rdev->flags);
+ s->blocked_rdev = rdev;
+ atomic_inc(&rdev->nr_pending);
+ }
}
clear_bit(R5_Insync, &dev->flags);
if (!rdev)
/* Not in-sync */;
- else if (test_bit(In_sync, &rdev->flags))
+ else if (is_bad) {
+ /* also not in-sync */
+ if (!test_bit(WriteErrorSeen, &rdev->flags)) {
+ /* treat as in-sync, but with a read error
+ * which we can now try to correct
+ */
+ set_bit(R5_Insync, &dev->flags);
+ set_bit(R5_ReadError, &dev->flags);
+ }
+ } else if (test_bit(In_sync, &rdev->flags))
set_bit(R5_Insync, &dev->flags);
else {
/* in sync if before recovery_offset */
@@ -3471,6 +3491,9 @@ static int chunk_aligned_read(mddev_t *mddev, struct bio * raid_bio)
rcu_read_lock();
rdev = rcu_dereference(conf->disks[dd_idx].rdev);
if (rdev && test_bit(In_sync, &rdev->flags)) {
+ sector_t first_bad;
+ int bad_sectors;
+
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
raid_bio->bi_next = (void*)rdev;
@@ -3478,8 +3501,10 @@ static int chunk_aligned_read(mddev_t *mddev, struct bio * raid_bio)
align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
align_bi->bi_sector += rdev->data_offset;
- if (!bio_fits_rdev(align_bi)) {
- /* too big in some way */
+ if (!bio_fits_rdev(align_bi) ||
+ is_badblock(rdev, align_bi->bi_sector, align_bi->bi_size>>9,
+ &first_bad, &bad_sectors)) {
+ /* too big in some way, or has a known bad block */
bio_put(align_bi);
rdev_dec_pending(rdev, mddev);
return 0;
@@ -4671,10 +4696,6 @@ static int run(mddev_t *mddev)
* 0 for a fully functional array, 1 or 2 for a degraded array.
*/
list_for_each_entry(rdev, &mddev->disks, same_set) {
- if (rdev->badblocks.count) {
- printk(KERN_ERR "md/raid5: cannot handle bad blocks yet\n");
- goto abort;
- }
if (rdev->raid_disk < 0)
continue;
if (test_bit(In_sync, &rdev->flags)) {
@@ -4983,9 +5004,6 @@ static int raid5_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
int first = 0;
int last = conf->raid_disks - 1;
- if (rdev->badblocks.count)
- return -EINVAL;
-
if (has_failed(conf))
/* no point adding a device */
return -EINVAL;
next prev parent reply other threads:[~2011-07-21 2:58 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-21 2:58 [md PATCH 00/36] md patches for 3.1 - part 2: bad block logs NeilBrown
2011-07-21 2:58 ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-22 15:43 ` Namhyung Kim
2011-07-26 2:29 ` NeilBrown
2011-07-26 5:17 ` Namhyung Kim
2011-07-26 8:48 ` Namhyung Kim
2011-07-26 15:03 ` [PATCH v2] md: add documentation for bad block log Namhyung Kim
2011-07-27 1:05 ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-21 2:58 ` [md PATCH 01/36] md: beginnings of bad block management NeilBrown
2011-07-22 15:03 ` Namhyung Kim
2011-07-26 2:26 ` NeilBrown
2011-07-26 5:17 ` Namhyung Kim
2011-07-22 16:52 ` Namhyung Kim
2011-07-26 3:20 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 03/36] md: don't allow arrays to contain devices with bad blocks NeilBrown
2011-07-22 15:47 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 04/36] md: load/store badblock list from v1.x metadata NeilBrown
2011-07-22 16:34 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 08/36] md: add 'write_error' flag to component devices NeilBrown
2011-07-26 15:22 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 13/36] md/raid1: Handle write errors by updating badblock log NeilBrown
2011-07-27 15:28 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 07/36] md/raid1: avoid reading known bad blocks during resync NeilBrown
2011-07-26 14:25 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 14/36] md/raid1: record badblocks found during resync etc NeilBrown
2011-07-27 15:39 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 05/36] md: Disable bad blocks and v0.90 metadata NeilBrown
2011-07-22 17:02 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 10/36] md/raid1: avoid writing to known-bad blocks on known-bad drives NeilBrown
2011-07-27 4:09 ` Namhyung Kim
2011-07-27 4:19 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 09/36] md: make it easier to wait for bad blocks to be acknowledged NeilBrown
2011-07-26 16:04 ` Namhyung Kim
2011-07-27 1:18 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 12/36] md/raid1: store behind-write pages in bi_vecs NeilBrown
2011-07-27 15:16 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 11/36] md/raid1: clear bad-block record when write succeeds NeilBrown
2011-07-27 5:05 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 06/36] md/raid1: avoid reading from known bad blocks NeilBrown
2011-07-26 14:06 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 24/36] md/raid10: avoid reading from known bad blocks - part 1 NeilBrown
2011-07-21 2:58 ` [md PATCH 16/36] md/raid1: factor several functions out or raid1d() NeilBrown
2011-07-27 15:55 ` Namhyung Kim
2011-07-28 1:39 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 19/36] md/raid5: write errors should be recorded as bad blocks if possible NeilBrown
2011-07-21 2:58 ` [md PATCH 23/36] md/raid10: Split handle_read_error out from raid10d NeilBrown
2011-07-21 2:58 ` [md PATCH 18/36] md/raid5: use bad-block log to improve handling of uncorrectable read errors NeilBrown
2011-07-21 2:58 ` [md PATCH 22/36] md/raid10: simplify/reindent some loops NeilBrown
2011-07-21 2:58 ` [md PATCH 20/36] md/raid5. Don't write to known bad block on doubtful devices NeilBrown
2011-07-21 2:58 ` [md PATCH 15/36] md/raid1: improve handling of read failure during recovery NeilBrown
2011-07-27 15:45 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 21/36] md/raid5: Clear bad blocks on successful write NeilBrown
2011-07-21 2:58 ` NeilBrown [this message]
2011-07-21 2:58 ` [md PATCH 33/36] md/raid10: record bad blocks due to write errors during resync/recovery NeilBrown
2011-07-21 2:58 ` [md PATCH 29/36] md/raid10: avoid writing to known bad blocks on known bad drives NeilBrown
2011-07-21 2:58 ` [md PATCH 27/36] md/raid10: avoid reading known bad blocks during resync/recovery NeilBrown
2011-07-21 2:58 ` [md PATCH 31/36] md/raid10: Handle write errors by updating badblock log NeilBrown
2011-07-21 2:58 ` [md PATCH 32/36] md/raid10: attempt to fix read errors during resync/check NeilBrown
2011-07-21 2:58 ` [md PATCH 26/36] md/raid10 - avoid reading from known bad blocks - part 3 NeilBrown
2011-07-21 2:58 ` [md PATCH 28/36] md/raid10 record bad blocks as needed during recovery NeilBrown
2011-07-21 2:58 ` [md PATCH 25/36] md/raid10: avoid reading from known bad blocks - part 2 NeilBrown
2011-07-21 2:58 ` [md PATCH 34/36] md/raid10: simplify read error handling during recovery NeilBrown
2011-07-21 2:58 ` [md PATCH 30/36] md/raid10: clear bad-block record when write succeeds NeilBrown
2011-07-21 2:58 ` [md PATCH 36/36] md/raid10: handle further errors during fix_read_error better NeilBrown
2011-07-21 2:58 ` [md PATCH 35/36] md/raid10: Handle read errors during recovery better NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110721025849.8422.92268.stgit@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).