[PATCH 0/3] count rdev->corrected_errors after final re-read

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] count rdev->corrected_errors after final re-read
@ 2011-06-28 16:47 Namhyung Kim
  2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Hello,

This patchset tries fix rdev->corrected_errors counting. Read errors
are considered to corrected if write-back and re-read cycle is finished
without further problems. Thus moving the rdev->corrected_errors
counting after the re-reading looks more reasonable IMHO.

For resync case in RAID10, rdev->corrected_errors are increased right
after the first read failure. This seems need an improvement though
it's not handled in this series.

The patches are against 'for-next' branch on git://neil.brown.name/md.

Thanks.

Namhyung Kim (3):
  md/raid1: move rdev->corrected_errors counting
  md/raid5: move rdev->corrected_errors counting
  md/raid10: move rdev->corrected_errors counting

 drivers/md/raid1.c  |   17 ++++++-----------
 drivers/md/raid10.c |    2 +-
 drivers/md/raid5.c  |    5 +----
 3 files changed, 8 insertions(+), 16 deletions(-)

-- 
1.7.6

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] md/raid1: move rdev->corrected_errors counting
  2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
  2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
  2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO. Also included a couple of whitespace fixes on sync_page_io().

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
 drivers/md/raid1.c |   17 ++++++-----------
 1 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6b7f5fdb35c0..4ed381488925 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1222,9 +1222,7 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 				 * active, and resync is currently active
 				 */
 				rdev = conf->mirrors[d].rdev;
-				if (sync_page_io(rdev,
-						 sect,
-						 s<<9,
+				if (sync_page_io(rdev, sect, s<<9,
 						 bio->bi_io_vec[idx].bv_page,
 						 READ, false)) {
 					success = 1;
@@ -1259,16 +1257,13 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (sync_page_io(rdev,
-					 sect,
-					 s<<9,
+			if (sync_page_io(rdev, sect, s<<9,
 					 bio->bi_io_vec[idx].bv_page,
 					 WRITE, false) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
 				md_error(mddev, rdev);
-			} else
-				atomic_add(s, &rdev->corrected_errors);
+			}
 		}
 		d = start;
 		while (d != r1_bio->read_disk) {
@@ -1278,12 +1273,12 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (sync_page_io(rdev,
-					 sect,
-					 s<<9,
+			if (sync_page_io(rdev, sect, s<<9,
 					 bio->bi_io_vec[idx].bv_page,
 					 READ, false) == 0)
 				md_error(mddev, rdev);
+			else
+				atomic_add(s, &rdev->corrected_errors);
 		}
 		sectors -= s;
 		sect += s;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] md/raid5: move rdev->corrected_errors counting
  2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
  2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
  2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
 drivers/md/raid5.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c9eb0555f321..59acc9b4deb3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -547,10 +547,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			bi->bi_io_vec[0].bv_offset = 0;
 			bi->bi_size = STRIPE_SIZE;
 			bi->bi_next = NULL;
-			if ((rw & WRITE) &&
-			    test_bit(R5_ReWrite, &sh->dev[i].flags))
-				atomic_add(STRIPE_SECTORS,
-					&rdev->corrected_errors);
 			generic_make_request(bi);
 		} else {
 			if (rw & WRITE)
@@ -1588,6 +1584,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
 					   (unsigned long long)(sh->sector
 								+ rdev->data_offset),
 					   bdevname(rdev->bdev, b));
+			atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
 		}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] md/raid10: move rdev->corrected_errors counting
  2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
  2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
  2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
@ 2011-06-28 16:47 ` Namhyung Kim
  2011-07-14  5:54   ` NeilBrown
  2 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2011-06-28 16:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
For end_sync_read(), I'm not sure what should I do. But I think, at
least, the counting should be moved to end_sync_write() and that
might require additional field(s) in r10_bio. So I leave it as is
for simplicity now. How do you think?

 drivers/md/raid10.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8628a62a02f0..0660bc9597d8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1524,7 +1524,6 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 			    test_bit(In_sync, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				rcu_read_unlock();
-				atomic_add(s, &rdev->corrected_errors);
 				if (sync_page_io(rdev,
 						 r10_bio->devs[sl].addr +
 						 sect,
@@ -1589,6 +1588,7 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 					       (unsigned long long)(
 						       sect + rdev->data_offset),
 					       bdevname(rdev->bdev, b));
+					atomic_add(s, &rdev->corrected_errors);
 				}
 
 				rdev_dec_pending(rdev, mddev);
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/3] md/raid10: move rdev->corrected_errors counting
  2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
@ 2011-07-14  5:54   ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2011-07-14  5:54 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-raid

On Wed, 29 Jun 2011 01:47:26 +0900 Namhyung Kim <namhyung@gmail.com> wrote:

> Read errors are considered to corrected if write-back and re-read
> cycle is finished without further problems. Thus moving the rdev->
> corrected_errors counting after the re-reading looks more reasonable
> IMHO.
> 
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
> For end_sync_read(), I'm not sure what should I do. But I think, at
> least, the counting should be moved to end_sync_write() and that
> might require additional field(s) in r10_bio. So I leave it as is
> for simplicity now. How do you think?

I don't think it really matters.
If the read error doesn't get corrected, then the device will be failed
and the 'corrected_errors' count won't mean anything any more...

Maybe we should just change the field to 'read_errors' and adjust it whenever
we get a read error.  If the device is still in the array, then the error
must have been corrected, if not, it count will be ignored.

There is a corner case where we don't evict the last mirror from an array due
to a read error - we would need to handle that some how.

But otherwise I don't think this is an issue.

I have applied all three patches because they do make the code a bit cleaner
and clear, but I don't think we really need to worry further about RAID10.

Thanks,

NeilBrown



> 
>  drivers/md/raid10.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 8628a62a02f0..0660bc9597d8 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1524,7 +1524,6 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
>  			    test_bit(In_sync, &rdev->flags)) {
>  				atomic_inc(&rdev->nr_pending);
>  				rcu_read_unlock();
> -				atomic_add(s, &rdev->corrected_errors);
>  				if (sync_page_io(rdev,
>  						 r10_bio->devs[sl].addr +
>  						 sect,
> @@ -1589,6 +1588,7 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
>  					       (unsigned long long)(
>  						       sect + rdev->data_offset),
>  					       bdevname(rdev->bdev, b));
> +					atomic_add(s, &rdev->corrected_errors);
>  				}
>  
>  				rdev_dec_pending(rdev, mddev);


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-07-14  5:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-28 16:47 [PATCH 0/3] count rdev->corrected_errors after final re-read Namhyung Kim
2011-06-28 16:47 ` [PATCH 1/3] md/raid1: move rdev->corrected_errors counting Namhyung Kim
2011-06-28 16:47 ` [PATCH 2/3] md/raid5: " Namhyung Kim
2011-06-28 16:47 ` [PATCH 3/3] md/raid10: " Namhyung Kim
2011-07-14  5:54   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).