From: NeilBrown <neilb@suse.de>
To: linux-raid@vger.kernel.org
Subject: [md PATCH 29/36] md/raid10: avoid writing to known bad blocks on known bad drives.
Date: Thu, 21 Jul 2011 12:58:50 +1000 [thread overview]
Message-ID: <20110721025850.8422.33853.stgit@notabene.brown> (raw)
In-Reply-To: <20110721024556.8422.99443.stgit@notabene.brown>
Writing to known bad blocks on drives that have seen a write error
is asking for trouble. So try to avoid these blocks.
Signed-off-by: NeilBrown <neilb@suse.de>
---
drivers/md/raid10.c | 105 +++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 93 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7f11924..7dcd318 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -807,6 +807,8 @@ static int make_request(mddev_t *mddev, struct bio * bio)
unsigned long flags;
mdk_rdev_t *blocked_rdev;
int plugged;
+ int sectors_handled;
+ int max_sectors;
if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
@@ -895,7 +897,6 @@ static int make_request(mddev_t *mddev, struct bio * bio)
/*
* read balancing logic:
*/
- int max_sectors;
int disk;
int slot;
@@ -925,8 +926,6 @@ read_again:
/* Could not read all from this device, so we will
* need another r10_bio.
*/
- int sectors_handled;
-
sectors_handled = (r10_bio->sectors + max_sectors
- bio->bi_sector);
r10_bio->sectors = max_sectors;
@@ -963,13 +962,22 @@ read_again:
/* first select target devices under rcu_lock and
* inc refcount on their rdev. Record them by setting
* bios[x] to bio
+ * If there are known/acknowledged bad blocks on any device
+ * on which we have seen a write error, we want to avoid
+ * writing to those blocks. This potentially requires several
+ * writes to write around the bad blocks. Each set of writes
+ * gets its own r10_bio with a set of bios attached. The number
+ * of r10_bios is recored in bio->bi_phys_segments just as with
+ * the read case.
*/
plugged = mddev_check_plugged(mddev);
raid10_find_phys(conf, r10_bio);
- retry_write:
+retry_write:
blocked_rdev = NULL;
rcu_read_lock();
+ max_sectors = r10_bio->sectors;
+
for (i = 0; i < conf->copies; i++) {
int d = r10_bio->devs[i].devnum;
mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[d].rdev);
@@ -978,13 +986,55 @@ read_again:
blocked_rdev = rdev;
break;
}
- if (rdev && !test_bit(Faulty, &rdev->flags)) {
- atomic_inc(&rdev->nr_pending);
- r10_bio->devs[i].bio = bio;
- } else {
- r10_bio->devs[i].bio = NULL;
+ r10_bio->devs[i].bio = NULL;
+ if (!rdev || test_bit(Faulty, &rdev->flags)) {
set_bit(R10BIO_Degraded, &r10_bio->state);
+ continue;
+ }
+ if (test_bit(WriteErrorSeen, &rdev->flags)) {
+ sector_t first_bad;
+ sector_t dev_sector = r10_bio->devs[i].addr;
+ int bad_sectors;
+ int is_bad;
+
+ is_bad = is_badblock(rdev, dev_sector,
+ max_sectors,
+ &first_bad, &bad_sectors);
+ if (is_bad < 0) {
+ /* Mustn't write here until the bad block
+ * is acknowledged
+ */
+ atomic_inc(&rdev->nr_pending);
+ set_bit(BlockedBadBlocks, &rdev->flags);
+ blocked_rdev = rdev;
+ break;
+ }
+ if (is_bad && first_bad <= dev_sector) {
+ /* Cannot write here at all */
+ bad_sectors -= (dev_sector - first_bad);
+ if (bad_sectors < max_sectors)
+ /* Mustn't write more than bad_sectors
+ * to other devices yet
+ */
+ max_sectors = bad_sectors;
+ /* We don't set R10BIO_Degraded as that
+ * only applies if the disk is missing,
+ * so it might be re-added, and we want to
+ * know to recover this chunk.
+ * In this case the device is here, and the
+ * fact that this chunk is not in-sync is
+ * recorded in the bad block log.
+ */
+ continue;
+ }
+ if (is_bad) {
+ int good_sectors = first_bad - dev_sector;
+ if (good_sectors < max_sectors)
+ max_sectors = good_sectors;
+ }
}
+ r10_bio->devs[i].bio = bio;
+ atomic_inc(&rdev->nr_pending);
}
rcu_read_unlock();
@@ -1004,8 +1054,22 @@ read_again:
goto retry_write;
}
+ if (max_sectors < r10_bio->sectors) {
+ /* We are splitting this into multiple parts, so
+ * we need to prepare for allocating another r10_bio.
+ */
+ r10_bio->sectors = max_sectors;
+ spin_lock_irq(&conf->device_lock);
+ if (bio->bi_phys_segments == 0)
+ bio->bi_phys_segments = 2;
+ else
+ bio->bi_phys_segments++;
+ spin_unlock_irq(&conf->device_lock);
+ }
+ sectors_handled = r10_bio->sector + max_sectors - bio->bi_sector;
+
atomic_set(&r10_bio->remaining, 1);
- bitmap_startwrite(mddev->bitmap, bio->bi_sector, r10_bio->sectors, 0);
+ bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0);
for (i = 0; i < conf->copies; i++) {
struct bio *mbio;
@@ -1014,10 +1078,12 @@ read_again:
continue;
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
+ max_sectors);
r10_bio->devs[i].bio = mbio;
- mbio->bi_sector = r10_bio->devs[i].addr+
- conf->mirrors[d].rdev->data_offset;
+ mbio->bi_sector = (r10_bio->devs[i].addr+
+ conf->mirrors[d].rdev->data_offset);
mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
mbio->bi_end_io = raid10_end_write_request;
mbio->bi_rw = WRITE | do_sync | do_fua;
@@ -1042,6 +1108,21 @@ read_again:
/* In case raid10d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
+ if (sectors_handled < (bio->bi_size >> 9)) {
+ /* We need another r1_bio. It has already been counted
+ * in bio->bi_phys_segments.
+ */
+ r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
+
+ r10_bio->master_bio = bio;
+ r10_bio->sectors = (bio->bi_size >> 9) - sectors_handled;
+
+ r10_bio->mddev = mddev;
+ r10_bio->sector = bio->bi_sector + sectors_handled;
+ r10_bio->state = 0;
+ goto retry_write;
+ }
+
if (do_sync || !mddev->bitmap || !plugged)
md_wakeup_thread(mddev->thread);
return 0;
next prev parent reply other threads:[~2011-07-21 2:58 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-21 2:58 [md PATCH 00/36] md patches for 3.1 - part 2: bad block logs NeilBrown
2011-07-21 2:58 ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-22 15:43 ` Namhyung Kim
2011-07-26 2:29 ` NeilBrown
2011-07-26 5:17 ` Namhyung Kim
2011-07-26 8:48 ` Namhyung Kim
2011-07-26 15:03 ` [PATCH v2] md: add documentation for bad block log Namhyung Kim
2011-07-27 1:05 ` [md PATCH 02/36] md/bad-block-log: add sysfs interface for accessing bad-block-log NeilBrown
2011-07-21 2:58 ` [md PATCH 01/36] md: beginnings of bad block management NeilBrown
2011-07-22 15:03 ` Namhyung Kim
2011-07-26 2:26 ` NeilBrown
2011-07-26 5:17 ` Namhyung Kim
2011-07-22 16:52 ` Namhyung Kim
2011-07-26 3:20 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 03/36] md: don't allow arrays to contain devices with bad blocks NeilBrown
2011-07-22 15:47 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 04/36] md: load/store badblock list from v1.x metadata NeilBrown
2011-07-22 16:34 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 08/36] md: add 'write_error' flag to component devices NeilBrown
2011-07-26 15:22 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 13/36] md/raid1: Handle write errors by updating badblock log NeilBrown
2011-07-27 15:28 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 14/36] md/raid1: record badblocks found during resync etc NeilBrown
2011-07-27 15:39 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 07/36] md/raid1: avoid reading known bad blocks during resync NeilBrown
2011-07-26 14:25 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 05/36] md: Disable bad blocks and v0.90 metadata NeilBrown
2011-07-22 17:02 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 10/36] md/raid1: avoid writing to known-bad blocks on known-bad drives NeilBrown
2011-07-27 4:09 ` Namhyung Kim
2011-07-27 4:19 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 12/36] md/raid1: store behind-write pages in bi_vecs NeilBrown
2011-07-27 15:16 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 09/36] md: make it easier to wait for bad blocks to be acknowledged NeilBrown
2011-07-26 16:04 ` Namhyung Kim
2011-07-27 1:18 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 11/36] md/raid1: clear bad-block record when write succeeds NeilBrown
2011-07-27 5:05 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 06/36] md/raid1: avoid reading from known bad blocks NeilBrown
2011-07-26 14:06 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 16/36] md/raid1: factor several functions out or raid1d() NeilBrown
2011-07-27 15:55 ` Namhyung Kim
2011-07-28 1:39 ` NeilBrown
2011-07-21 2:58 ` [md PATCH 24/36] md/raid10: avoid reading from known bad blocks - part 1 NeilBrown
2011-07-21 2:58 ` [md PATCH 19/36] md/raid5: write errors should be recorded as bad blocks if possible NeilBrown
2011-07-21 2:58 ` [md PATCH 23/36] md/raid10: Split handle_read_error out from raid10d NeilBrown
2011-07-21 2:58 ` [md PATCH 18/36] md/raid5: use bad-block log to improve handling of uncorrectable read errors NeilBrown
2011-07-21 2:58 ` [md PATCH 22/36] md/raid10: simplify/reindent some loops NeilBrown
2011-07-21 2:58 ` [md PATCH 20/36] md/raid5. Don't write to known bad block on doubtful devices NeilBrown
2011-07-21 2:58 ` [md PATCH 15/36] md/raid1: improve handling of read failure during recovery NeilBrown
2011-07-27 15:45 ` Namhyung Kim
2011-07-21 2:58 ` [md PATCH 21/36] md/raid5: Clear bad blocks on successful write NeilBrown
2011-07-21 2:58 ` [md PATCH 17/36] md/raid5: avoid reading from known bad blocks NeilBrown
2011-07-21 2:58 ` [md PATCH 33/36] md/raid10: record bad blocks due to write errors during resync/recovery NeilBrown
2011-07-21 2:58 ` NeilBrown [this message]
2011-07-21 2:58 ` [md PATCH 27/36] md/raid10: avoid reading known bad blocks " NeilBrown
2011-07-21 2:58 ` [md PATCH 31/36] md/raid10: Handle write errors by updating badblock log NeilBrown
2011-07-21 2:58 ` [md PATCH 32/36] md/raid10: attempt to fix read errors during resync/check NeilBrown
2011-07-21 2:58 ` [md PATCH 26/36] md/raid10 - avoid reading from known bad blocks - part 3 NeilBrown
2011-07-21 2:58 ` [md PATCH 25/36] md/raid10: avoid reading from known bad blocks - part 2 NeilBrown
2011-07-21 2:58 ` [md PATCH 34/36] md/raid10: simplify read error handling during recovery NeilBrown
2011-07-21 2:58 ` [md PATCH 28/36] md/raid10 record bad blocks as needed " NeilBrown
2011-07-21 2:58 ` [md PATCH 30/36] md/raid10: clear bad-block record when write succeeds NeilBrown
2011-07-21 2:58 ` [md PATCH 36/36] md/raid10: handle further errors during fix_read_error better NeilBrown
2011-07-21 2:58 ` [md PATCH 35/36] md/raid10: Handle read errors during recovery better NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110721025850.8422.33853.stgit@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).