* [PATCH 0/3] count rdev->corrected_errors after final re-read
@ 2011-06-28 16:47 Namhyung Kim
2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Hello,
This patchset tries fix rdev->corrected_errors counting. Read errors
are considered to corrected if write-back and re-read cycle is finished
without further problems. Thus moving the rdev->corrected_errors
counting after the re-reading looks more reasonable IMHO.
For resync case in RAID10, rdev->corrected_errors are increased right
after the first read failure. This seems need an improvement though
it's not handled in this series.
The patches are against 'for-next' branch on git://neil.brown.name/md.
Thanks.
Namhyung Kim (3):
md/raid1: move rdev->corrected_errors counting
md/raid5: move rdev->corrected_errors counting
md/raid10: move rdev->corrected_errors counting
drivers/md/raid1.c | 17 ++++++-----------
drivers/md/raid10.c | 2 +-
drivers/md/raid5.c | 5 +----
3 files changed, 8 insertions(+), 16 deletions(-)
--
1.7.6
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] md/raid1: move rdev->corrected_errors counting
2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
2 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO. Also included a couple of whitespace fixes on sync_page_io().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid1.c | 17 ++++++-----------
1 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6b7f5fdb35c0..4ed381488925 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1222,9 +1222,7 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
* active, and resync is currently active
*/
rdev = conf->mirrors[d].rdev;
- if (sync_page_io(rdev,
- sect,
- s<<9,
+ if (sync_page_io(rdev, sect, s<<9,
bio->bi_io_vec[idx].bv_page,
READ, false)) {
success = 1;
@@ -1259,16 +1257,13 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
if (r1_bio->bios[d]->bi_end_io != end_sync_read)
continue;
rdev = conf->mirrors[d].rdev;
- if (sync_page_io(rdev,
- sect,
- s<<9,
+ if (sync_page_io(rdev, sect, s<<9,
bio->bi_io_vec[idx].bv_page,
WRITE, false) == 0) {
r1_bio->bios[d]->bi_end_io = NULL;
rdev_dec_pending(rdev, mddev);
md_error(mddev, rdev);
- } else
- atomic_add(s, &rdev->corrected_errors);
+ }
}
d = start;
while (d != r1_bio->read_disk) {
@@ -1278,12 +1273,12 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
if (r1_bio->bios[d]->bi_end_io != end_sync_read)
continue;
rdev = conf->mirrors[d].rdev;
- if (sync_page_io(rdev,
- sect,
- s<<9,
+ if (sync_page_io(rdev, sect, s<<9,
bio->bi_io_vec[idx].bv_page,
READ, false) == 0)
md_error(mddev, rdev);
+ else
+ atomic_add(s, &rdev->corrected_errors);
}
sectors -= s;
sect += s;
--
1.7.6
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/3] md/raid5: move rdev->corrected_errors counting
2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
2 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid5.c | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c9eb0555f321..59acc9b4deb3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -547,10 +547,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
bi->bi_io_vec[0].bv_offset = 0;
bi->bi_size = STRIPE_SIZE;
bi->bi_next = NULL;
- if ((rw & WRITE) &&
- test_bit(R5_ReWrite, &sh->dev[i].flags))
- atomic_add(STRIPE_SECTORS,
- &rdev->corrected_errors);
generic_make_request(bi);
} else {
if (rw & WRITE)
@@ -1588,6 +1584,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
(unsigned long long)(sh->sector
+ rdev->data_offset),
bdevname(rdev->bdev, b));
+ atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
clear_bit(R5_ReadError, &sh->dev[i].flags);
clear_bit(R5_ReWrite, &sh->dev[i].flags);
}
--
1.7.6
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/3] md/raid10: move rdev->corrected_errors counting
2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
2011-07-14 5:54 ` NeilBrown
2 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
For end_sync_read(), I'm not sure what should I do. But I think, at
least, the counting should be moved to end_sync_write() and that
might require additional field(s) in r10_bio. So I leave it as is
for simplicity now. How do you think?
drivers/md/raid10.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8628a62a02f0..0660bc9597d8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1524,7 +1524,6 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
test_bit(In_sync, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
- atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev,
r10_bio->devs[sl].addr +
sect,
@@ -1589,6 +1588,7 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
(unsigned long long)(
sect + rdev->data_offset),
bdevname(rdev->bdev, b));
+ atomic_add(s, &rdev->corrected_errors);
}
rdev_dec_pending(rdev, mddev);
--
1.7.6
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 3/3] md/raid10: move rdev->corrected_errors counting
2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
@ 2011-07-14 5:54 ` NeilBrown
0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2011-07-14 5:54 UTC (permalink / raw)
To: Namhyung Kim; +Cc: linux-raid
On Wed, 29 Jun 2011 01:47:26 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> Read errors are considered to corrected if write-back and re-read
> cycle is finished without further problems. Thus moving the rdev->
> corrected_errors counting after the re-reading looks more reasonable
> IMHO.
>
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
> For end_sync_read(), I'm not sure what should I do. But I think, at
> least, the counting should be moved to end_sync_write() and that
> might require additional field(s) in r10_bio. So I leave it as is
> for simplicity now. How do you think?
I don't think it really matters.
If the read error doesn't get corrected, then the device will be failed
and the 'corrected_errors' count won't mean anything any more...
Maybe we should just change the field to 'read_errors' and adjust it whenever
we get a read error. If the device is still in the array, then the error
must have been corrected, if not, it count will be ignored.
There is a corner case where we don't evict the last mirror from an array due
to a read error - we would need to handle that some how.
But otherwise I don't think this is an issue.
I have applied all three patches because they do make the code a bit cleaner
and clear, but I don't think we really need to worry further about RAID10.
Thanks,
NeilBrown
>
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 8628a62a02f0..0660bc9597d8 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1524,7 +1524,6 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
> test_bit(In_sync, &rdev->flags)) {
> atomic_inc(&rdev->nr_pending);
> rcu_read_unlock();
> - atomic_add(s, &rdev->corrected_errors);
> if (sync_page_io(rdev,
> r10_bio->devs[sl].addr +
> sect,
> @@ -1589,6 +1588,7 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
> (unsigned long long)(
> sect + rdev->data_offset),
> bdevname(rdev->bdev, b));
> + atomic_add(s, &rdev->corrected_errors);
> }
>
> rdev_dec_pending(rdev, mddev);
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-07-14 5:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
2011-07-14 5:54 ` NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).