From: Michael Tokarev <mjt@tls.msk.ru>
To: linux-raid@vger.kernel.org
Subject: "Robust Read" (was: [PATCH 1/2] md bitmap bug fixes)
Date: Sat, 19 Mar 2005 19:08:02 +0300 [thread overview]
Message-ID: <423C4E62.4070704@tls.msk.ru> (raw)
In-Reply-To: <vvjtg2-t16.ln1@news.it.uc3m.es>
[-- Attachment #1: Type: text/plain, Size: 2758 bytes --]
Peter T. Breuer wrote:
[]
> The patch was originally developed for 2.4, then ported to 2.6.3, and
> then to 2.6.8.1. Neil has recently been doing stuff, so I don't
> think it applies cleanly to 2.6.10, but somebody WAS porting it for me
> until they found that 2.6.10 didn't support their hardware ... and I
> recall discussing with him what to do about the change of map() to
> read_balance() in the code (essentially, put map() back). And finding
> that the spinlocks have changed too.
Well, it turns out current code is easier to modify, including spinlocks.
But now, with read_balance() in place, which can pick a "random" disk
to read from, we have to keep some sort of bitmap to mark disks which
we tried to read from.
For the hack below I've added r1_bio->tried_disk of type unsigned long
which has one bit per disk, so current scheme is limited to 32 disks
in array. This is really a hack for now -- I don't know much about
kernel programming rules... ;)
> @@ -266,9 +368,19 @@
> /*
> * this branch is our 'one mirror IO has finished' event handler:
> */
> - if (!uptodate)
> - md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
> - else
> + if (!uptodate) {
> +#ifdef CONFIG_MD_RAID1_ROBUST_READ
> + /*
> + * Only fault disk out of array on write error, not read.
> + */
> + if (0)
> +#endif /* CONFIG_MD_RAID1_ROBUST_READ */
> + md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
> +#ifdef DO_ADD_READ_WRITE_CORRECT
What's this?
Was it an attempt (incomplete) to do rewrite-after-failed-read?
> + else /* tell next time we're here that we're a retry */
> + set_bit(R1BIO_ReadRetry, &r1_bio->state);
> +#endif /* DO_ADD_READ_WRITE_CORRECT */
> + } else
> /*
> * Set R1BIO_Uptodate in our master bio, so that
> * we will return a good error code for to the higher
> @@ -285,8 +397,20 @@
> /*
> * we have only one bio on the read side
> */
> - if (uptodate)
> - raid_end_bio_io(r1_bio);
> + if (uptodate
> +#ifdef CONFIG_MD_RAID1_ROBUST_READ
> + /* Give up and error if we're last */
> + || (atomic_dec_and_test(&r1_bio->remaining))
> +#endif /* CONFIG_MD_RAID1_ROBUST_READ */
> + )
> +#ifdef DO_ADD_READ_WRITE_CORRECT
> + if (uptodate && test_bit(R1BIO_ReadRetry, &r1_bio->state)) {
> + /* Success at last - rewrite failed reads */
> + set_bit(R1BIO_IsSync, &r1_bio->state);
> + reschedule_retry(r1_bio);
> + } else
> +#endif /* DO_ADD_READ_WRITE_CORRECT */
> + raid_end_bio_io(r1_bio);
Hmm. Should we do actual write here, instead of rescheduling a
successeful read further, after finishing the original request?
/mjt
[-- Attachment #2: raid1_robust_read-2.6.11.diff --]
[-- Type: text/plain, Size: 3107 bytes --]
--- ./include/linux/raid/raid1.h.orig Wed Mar 2 10:38:10 2005
+++ ./include/linux/raid/raid1.h Sat Mar 19 18:53:42 2005
@@ -83,6 +83,7 @@ struct r1bio_s {
* if the IO is in READ direction, then this is where we read
*/
int read_disk;
+ unsigned long tried_disks; /* bitmap, disks we had tried to read from */
struct list_head retry_list;
/*
--- ./drivers/md/raid1.c.orig Wed Mar 2 10:38:10 2005
+++ ./drivers/md/raid1.c Sat Mar 19 18:57:16 2005
@@ -234,9 +234,13 @@ static int raid1_end_read_request(struct
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
- if (!uptodate)
- md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
- else
+
+ update_head_pos(mirror, r1_bio);
+
+ /*
+ * we have only one bio on the read side
+ */
+ if (uptodate) {
/*
* Set R1BIO_Uptodate in our master bio, so that
* we will return a good error code for to the higher
@@ -247,14 +251,8 @@ static int raid1_end_read_request(struct
* wait for the 'master' bio.
*/
set_bit(R1BIO_Uptodate, &r1_bio->state);
-
- update_head_pos(mirror, r1_bio);
-
- /*
- * we have only one bio on the read side
- */
- if (uptodate)
raid_end_bio_io(r1_bio);
+ }
else {
/*
* oops, read error:
@@ -332,6 +330,10 @@ static int raid1_end_write_request(struc
*
* The rdev for the device selected will have nr_pending incremented.
*/
+
+#define disk_tried_before(r1_bio, disk) ((r1_bio)->tried_disks & (1<<(disk)))
+#define mark_disk_tried(r1_bio, disk) ((r1_bio)->tried_disks |= 1<<(disk))
+
static int read_balance(conf_t *conf, r1bio_t *r1_bio)
{
const unsigned long this_sector = r1_bio->sector;
@@ -351,7 +353,8 @@ static int read_balance(conf_t *conf, r1
new_disk = 0;
while (!conf->mirrors[new_disk].rdev ||
- !conf->mirrors[new_disk].rdev->in_sync) {
+ !conf->mirrors[new_disk].rdev->in_sync ||
+ disk_tried_before(r1_bio, new_disk)) {
new_disk++;
if (new_disk == conf->raid_disks) {
new_disk = -1;
@@ -364,7 +367,8 @@ static int read_balance(conf_t *conf, r1
/* make sure the disk is operational */
while (!conf->mirrors[new_disk].rdev ||
- !conf->mirrors[new_disk].rdev->in_sync) {
+ !conf->mirrors[new_disk].rdev->in_sync ||
+ disk_tried_before(r1_bio, new_disk)) {
if (new_disk <= 0)
new_disk = conf->raid_disks;
new_disk--;
@@ -394,7 +398,8 @@ static int read_balance(conf_t *conf, r1
disk--;
if (!conf->mirrors[disk].rdev ||
- !conf->mirrors[disk].rdev->in_sync)
+ !conf->mirrors[disk].rdev->in_sync ||
+ disk_tried_before(r1_bio, new_disk))
continue;
if (!atomic_read(&conf->mirrors[disk].rdev->nr_pending)) {
@@ -415,6 +420,7 @@ rb_out:
conf->next_seq_sect = this_sector + sectors;
conf->last_used = new_disk;
atomic_inc(&conf->mirrors[new_disk].rdev->nr_pending);
+ mark_disk_tried(r1_bio, new_disk);
}
rcu_read_unlock();
@@ -545,6 +551,7 @@ static int make_request(request_queue_t
r1_bio->sector = bio->bi_sector;
r1_bio->state = 0;
+ r1_bio->tried_disks = 0;
if (bio_data_dir(bio) == READ) {
/*
next prev parent reply other threads:[~2005-03-19 16:08 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-09 22:18 [PATCH 1/2] md bitmap bug fixes Paul Clements
2005-03-09 22:19 ` [PATCH 2/2] " Paul Clements
2005-03-14 4:43 ` [PATCH 1/2] " Neil Brown
2005-03-14 9:44 ` Lars Marowsky-Bree
2005-03-14 10:22 ` Neil Brown
2005-03-14 11:24 ` Lars Marowsky-Bree
2005-03-14 22:54 ` Neil Brown
2005-03-18 10:33 ` Lars Marowsky-Bree
2005-03-18 12:52 ` Peter T. Breuer
2005-03-18 13:42 ` Lars Marowsky-Bree
2005-03-18 14:50 ` Peter T. Breuer
2005-03-18 17:03 ` Paul Clements
2005-03-18 18:43 ` Peter T. Breuer
2005-03-18 19:01 ` Mario Holbe
2005-03-18 19:33 ` Peter T. Breuer
2005-03-18 20:24 ` Mario Holbe
2005-03-18 21:01 ` Andy Smith
2005-03-19 11:43 ` Peter T. Breuer
2005-03-19 12:58 ` Lars Marowsky-Bree
2005-03-19 13:27 ` Peter T. Breuer
2005-03-19 14:07 ` Lars Marowsky-Bree
2005-03-19 15:06 ` Peter T. Breuer
2005-03-19 15:24 ` Mario Holbe
2005-03-19 15:58 ` Peter T. Breuer
2005-03-19 16:24 ` Lars Marowsky-Bree
2005-03-19 17:19 ` Peter T. Breuer
2005-03-19 17:36 ` Lars Marowsky-Bree
2005-03-19 17:44 ` Guy
2005-03-19 17:54 ` Lars Marowsky-Bree
2005-03-19 18:05 ` Guy
2005-03-19 20:29 ` berk walker
2005-03-19 18:11 ` Peter T. Breuer
2005-03-18 19:43 ` Paul Clements
2005-03-19 12:10 ` Peter T. Breuer
2005-03-21 16:07 ` Paul Clements
2005-03-21 18:56 ` Luca Berra
2005-03-21 19:58 ` Paul Clements
2005-03-21 20:45 ` Peter T. Breuer
2005-03-21 21:09 ` Gil
2005-03-21 21:19 ` Paul Clements
2005-03-21 22:15 ` Peter T. Breuer
2005-03-22 22:35 ` Peter T. Breuer
2005-03-21 21:32 ` Guy
2005-03-22 9:35 ` Luca Berra
2005-03-22 10:02 ` Peter T. Breuer
2005-03-23 20:31 ` Luca Berra
2005-03-25 18:51 ` Peter T. Breuer
2005-03-25 20:54 ` berk walker
2005-03-25 20:56 ` berk walker
2005-03-18 17:16 ` Luca Berra
2005-03-18 17:57 ` Lars Marowsky-Bree
2005-03-18 21:46 ` Michael Tokarev
2005-03-19 9:05 ` Lars Marowsky-Bree
2005-03-19 12:16 ` Peter T. Breuer
2005-03-19 12:34 ` Michael Tokarev
2005-03-19 12:53 ` Peter T. Breuer
2005-03-19 16:08 ` Michael Tokarev [this message]
2005-03-19 17:03 ` "Robust Read" Peter T. Breuer
2005-03-19 20:20 ` Michael Tokarev
2005-03-19 20:56 ` Peter T. Breuer
2005-03-19 22:05 ` Michael Tokarev
2005-03-19 22:30 ` Peter T. Breuer
2005-03-15 4:24 ` [PATCH 1/2] md bitmap bug fixes Paul Clements
2005-03-17 20:51 ` [PATCH 0/3] md bitmap-based asynchronous writes Paul Clements
2005-03-17 20:53 ` [PATCH 1/3] md bitmap async write enabling Paul Clements
2005-03-17 20:55 ` [PATCH 2/3] md bitmap async writes for raid1 Paul Clements
2005-03-17 20:56 ` [PATCH 3/3] mdadm: bitmap async writes Paul Clements
2005-03-21 4:21 ` [PATCH 0/3] md bitmap-based asynchronous writes Neil Brown
2005-03-21 16:31 ` Paul Clements
2005-03-21 22:09 ` Neil Brown
2005-03-22 8:35 ` Peter T. Breuer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=423C4E62.4070704@tls.msk.ru \
--to=mjt@tls.msk.ru \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.