Re: raid1 repair does not repair errors?

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: raid1 repair does not repair errors?
@ 2014-02-02 12:24 Michael Tokarev
  2014-02-02 21:51 ` Peter Grandi
  2014-02-03  1:04 ` NeilBrown
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Tokarev @ 2014-02-02 12:24 UTC (permalink / raw)
  To: linux-raid, Neil Brown

Hello.

This is a followup for a somewhat old thread, --
 http://thread.gmane.org/gmane.linux.raid/44503
with the required details.

Initial problem was that with a raid1 array on a
few drives and one of them having a bad sector,
runnig `repair' action does not actually fix the
error, it looks like the raid1 code does not see
the error.

This is a production host, so it is very difficult to
experiment.  When I initially hit this issue there, I
tried various ways to fix the issue, one of them was
to removing the bad drive from the array and adding
it back.  This forced all sectors to be re-written,
and the problem went away.

Now, the same issue happened again - another drive
developed a bad sector, and again, md `repair' action
does not fix it.

So I added some debugging as requested in the original
thread, and re-run `repair' action.

Here's the changes I added to 3.10 raid1.c file:

--- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
+++ drivers/md/raid1.c	2014-02-02 16:07:37.913808336 +0400
@@ -1636,6 +1636,8 @@ static void end_sync_read(struct bio *bi
 	 */
 	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
 		set_bit(R1BIO_Uptodate, &r1_bio->state);
+else
+printk("end_sync_read: !BIO_UPTODATE\n");

 	if (atomic_dec_and_test(&r1_bio->remaining))
 		reschedule_retry(r1_bio);
@@ -1749,6 +1751,7 @@ static int fix_sync_read_error(struct r1
 				 * active, and resync is currently active
 				 */
 				rdev = conf->mirrors[d].rdev;
+printk("fix_sync_read_error: calling sync_page_io(%Li, %Li<<9)\n", (uint64_t)sect, (uint64_t)s);
 				if (sync_page_io(rdev, sect, s<<9,
 						 bio->bi_io_vec[idx].bv_page,
 						 READ, false)) {
@@ -1931,10 +1934,12 @@ static void sync_request_write(struct md

 	bio = r1_bio->bios[r1_bio->read_disk];

-	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
+	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
 		/* ouch - failed to read all of that. */
+printk("sync_request_write: !R1BIO_Uptodate\n");
 		if (!fix_sync_read_error(r1_bio))
 			return;
+	}

 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 		if (process_checks(r1_bio) < 0)


And here is the whole dmesg result from the repair run:

[   74.288227] md: requested-resync of RAID array md1
[   74.288719] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   74.289329] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[   74.290404] md: using 128k window, over a total of 2096064k.
[  179.213754] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[  179.214500] ata6.00: irq_stat 0x40000008
[  179.214909] ata6.00: failed command: READ FPDMA QUEUED
[  179.215452] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
[  179.215452]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[  179.217087] ata6.00: status: { DRDY ERR }
[  179.217500] ata6.00: error: { UNC }
[  179.220185] ata6.00: configured for UDMA/133
[  179.220656] sd 5:0:0:0: [sdd] Unhandled sense code
[  179.221149] sd 5:0:0:0: [sdd]
[  179.221476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  179.222062] sd 5:0:0:0: [sdd]
[  179.222398] Sense Key : Medium Error [current] [descriptor]
[  179.223034] Descriptor sense data with sense descriptors (in hex):
[  179.223704]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[  179.224726]         00 3e 3e 23
[  179.225169] sd 5:0:0:0: [sdd]
[  179.225494] Add. Sense: Unrecovered read error - auto reallocate failed
[  179.226215] sd 5:0:0:0: [sdd] CDB:
[  179.226577] Read(10): 28 00 00 3e 3e 00 00 00 80 00
[  179.227344] end_request: I/O error, dev sdd, sector 4079139
[  179.227926] end_sync_read: !BIO_UPTODATE
[  179.228359] ata6: EH complete
[  181.926457] md: md1: requested-resync done.


So the only printk I've added is seen: "end_sync_read: !BIO_UPTODATE", and
it looks like rewriting code is never called.


This is root array of a production machine, so I can reboot it only at
late evenings or at weekends.  But this time I finally want to fix the
bug for real, so I will not try to fix the erroneous drive until we will
be able to fix the code.  Just one thing: the fixing process might be
a bit slow.

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-02 12:24 raid1 repair does not repair errors? Michael Tokarev
@ 2014-02-02 21:51 ` Peter Grandi
  2014-02-03  1:04 ` NeilBrown
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Grandi @ 2014-02-02 21:51 UTC (permalink / raw)
  To: Linux RAID

> [ ... ] I tried various ways to fix the issue, one of them was
> to removing the bad drive from the array and adding it back.
> This forced all sectors to be re-written, and the problem went
> away.

Lucky,

> [ ... ] Now, the same issue happened again - another drive
> developed a bad sector, and again, md `repair' action does not
> fix it. [ ... ]

Thanks for letting us know things work as expected and MD RAID
has not developed psychic powers yet :-).

Similar reports happen now and then. Some people seem to develop
inflated expectations of what a RAID system can do, and how a
storage stack works. Perhaps some people imagine that "repair"
means "repair disk errors" rather than "repair RAID status".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-02 12:24 raid1 repair does not repair errors? Michael Tokarev
  2014-02-02 21:51 ` Peter Grandi
@ 2014-02-03  1:04 ` NeilBrown
  2014-02-03  4:36   ` NeilBrown
  2014-02-06 14:21   ` Mikael Abrahamsson
  1 sibling, 2 replies; 13+ messages in thread
From: NeilBrown @ 2014-02-03  1:04 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 6165 bytes --]

On Sun, 02 Feb 2014 16:24:48 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:

> Hello.
> 
> This is a followup for a somewhat old thread, --
>  http://thread.gmane.org/gmane.linux.raid/44503
> with the required details.
> 
> Initial problem was that with a raid1 array on a
> few drives and one of them having a bad sector,
> runnig `repair' action does not actually fix the
> error, it looks like the raid1 code does not see
> the error.
> 
> This is a production host, so it is very difficult to
> experiment.  When I initially hit this issue there, I
> tried various ways to fix the issue, one of them was
> to removing the bad drive from the array and adding
> it back.  This forced all sectors to be re-written,
> and the problem went away.
> 
> Now, the same issue happened again - another drive
> developed a bad sector, and again, md `repair' action
> does not fix it.
> 
> So I added some debugging as requested in the original
> thread, and re-run `repair' action.
> 
> Here's the changes I added to 3.10 raid1.c file:
> 
> --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> +++ drivers/md/raid1.c	2014-02-02 16:07:37.913808336 +0400
> @@ -1636,6 +1636,8 @@ static void end_sync_read(struct bio *bi
>  	 */
>  	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +else
> +printk("end_sync_read: !BIO_UPTODATE\n");
> 
>  	if (atomic_dec_and_test(&r1_bio->remaining))
>  		reschedule_retry(r1_bio);
> @@ -1749,6 +1751,7 @@ static int fix_sync_read_error(struct r1
>  				 * active, and resync is currently active
>  				 */
>  				rdev = conf->mirrors[d].rdev;
> +printk("fix_sync_read_error: calling sync_page_io(%Li, %Li<<9)\n", (uint64_t)sect, (uint64_t)s);
>  				if (sync_page_io(rdev, sect, s<<9,
>  						 bio->bi_io_vec[idx].bv_page,
>  						 READ, false)) {
> @@ -1931,10 +1934,12 @@ static void sync_request_write(struct md
> 
>  	bio = r1_bio->bios[r1_bio->read_disk];
> 
> -	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> +	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
>  		/* ouch - failed to read all of that. */
> +printk("sync_request_write: !R1BIO_Uptodate\n");
>  		if (!fix_sync_read_error(r1_bio))
>  			return;
> +	}
> 
>  	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>  		if (process_checks(r1_bio) < 0)
> 
> 
> And here is the whole dmesg result from the repair run:
> 
> [   74.288227] md: requested-resync of RAID array md1
> [   74.288719] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [   74.289329] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> [   74.290404] md: using 128k window, over a total of 2096064k.
> [  179.213754] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [  179.214500] ata6.00: irq_stat 0x40000008
> [  179.214909] ata6.00: failed command: READ FPDMA QUEUED
> [  179.215452] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
> [  179.215452]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> [  179.217087] ata6.00: status: { DRDY ERR }
> [  179.217500] ata6.00: error: { UNC }
> [  179.220185] ata6.00: configured for UDMA/133
> [  179.220656] sd 5:0:0:0: [sdd] Unhandled sense code
> [  179.221149] sd 5:0:0:0: [sdd]
> [  179.221476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  179.222062] sd 5:0:0:0: [sdd]
> [  179.222398] Sense Key : Medium Error [current] [descriptor]
> [  179.223034] Descriptor sense data with sense descriptors (in hex):
> [  179.223704]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [  179.224726]         00 3e 3e 23
> [  179.225169] sd 5:0:0:0: [sdd]
> [  179.225494] Add. Sense: Unrecovered read error - auto reallocate failed
> [  179.226215] sd 5:0:0:0: [sdd] CDB:
> [  179.226577] Read(10): 28 00 00 3e 3e 00 00 00 80 00
> [  179.227344] end_request: I/O error, dev sdd, sector 4079139
> [  179.227926] end_sync_read: !BIO_UPTODATE
> [  179.228359] ata6: EH complete
> [  181.926457] md: md1: requested-resync done.
> 
> 
> So the only printk I've added is seen: "end_sync_read: !BIO_UPTODATE", and
> it looks like rewriting code is never called.
> 
> 
> This is root array of a production machine, so I can reboot it only at
> late evenings or at weekends.  But this time I finally want to fix the
> bug for real, so I will not try to fix the erroneous drive until we will
> be able to fix the code.  Just one thing: the fixing process might be
> a bit slow.
> 
> Thanks,
> 
> /mjt


Hi,
 thanks for the testing and report.  I see what the problem is now.

We only try to fix a read error when all reads failed rather than when any
read fails.
Most of the time there is only one read so this makes no difference.
However for 'check' and 'repair' we read all devices so the current code will
only try to repair a read error if *every*  device failed, which of course
would be pointless.

This patch (against v3.10) should fix it.  It only leaves R1BIO_Uptodate set
if no failure is seen.

NeilBrown


diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6e17f8181c4b..ba38ef6c612b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1633,8 +1633,8 @@ static void end_sync_read(struct bio *bio, int error)
 	 * or re-read if the read failed.
 	 * We don't do much here, just schedule handling by raid1d
 	 */
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-		set_bit(R1BIO_Uptodate, &r1_bio->state);
+	if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+		clear_bit(R1BIO_Uptodate, &r1_bio->state);
 
 	if (atomic_dec_and_test(&r1_bio->remaining))
 		reschedule_retry(r1_bio);
@@ -2609,6 +2609,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
 	/* For a user-requested sync, we read all readable devices and do a
 	 * compare
 	 */
+	set_bit(R1BIO_UPTODATE, &r1_bio->state);
 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
 		atomic_set(&r1_bio->remaining, read_targets);
 		for (i = 0; i < conf->raid_disks * 2 && read_targets; i++) {

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-03  1:04 ` NeilBrown
@ 2014-02-03  4:36   ` NeilBrown
  2014-02-03  7:30     ` Michael Tokarev
  2014-02-06 14:21   ` Mikael Abrahamsson
  1 sibling, 1 reply; 13+ messages in thread
From: NeilBrown @ 2014-02-03  4:36 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7345 bytes --]

On Mon, 3 Feb 2014 12:04:31 +1100 NeilBrown <neilb@suse.de> wrote:

> On Sun, 02 Feb 2014 16:24:48 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:
> 
> > Hello.
> > 
> > This is a followup for a somewhat old thread, --
> >  http://thread.gmane.org/gmane.linux.raid/44503
> > with the required details.
> > 
> > Initial problem was that with a raid1 array on a
> > few drives and one of them having a bad sector,
> > runnig `repair' action does not actually fix the
> > error, it looks like the raid1 code does not see
> > the error.
> > 
> > This is a production host, so it is very difficult to
> > experiment.  When I initially hit this issue there, I
> > tried various ways to fix the issue, one of them was
> > to removing the bad drive from the array and adding
> > it back.  This forced all sectors to be re-written,
> > and the problem went away.
> > 
> > Now, the same issue happened again - another drive
> > developed a bad sector, and again, md `repair' action
> > does not fix it.
> > 
> > So I added some debugging as requested in the original
> > thread, and re-run `repair' action.
> > 
> > Here's the changes I added to 3.10 raid1.c file:
> > 
> > --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> > +++ drivers/md/raid1.c	2014-02-02 16:07:37.913808336 +0400
> > @@ -1636,6 +1636,8 @@ static void end_sync_read(struct bio *bi
> >  	 */
> >  	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> >  		set_bit(R1BIO_Uptodate, &r1_bio->state);
> > +else
> > +printk("end_sync_read: !BIO_UPTODATE\n");
> > 
> >  	if (atomic_dec_and_test(&r1_bio->remaining))
> >  		reschedule_retry(r1_bio);
> > @@ -1749,6 +1751,7 @@ static int fix_sync_read_error(struct r1
> >  				 * active, and resync is currently active
> >  				 */
> >  				rdev = conf->mirrors[d].rdev;
> > +printk("fix_sync_read_error: calling sync_page_io(%Li, %Li<<9)\n", (uint64_t)sect, (uint64_t)s);
> >  				if (sync_page_io(rdev, sect, s<<9,
> >  						 bio->bi_io_vec[idx].bv_page,
> >  						 READ, false)) {
> > @@ -1931,10 +1934,12 @@ static void sync_request_write(struct md
> > 
> >  	bio = r1_bio->bios[r1_bio->read_disk];
> > 
> > -	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> > +	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
> >  		/* ouch - failed to read all of that. */
> > +printk("sync_request_write: !R1BIO_Uptodate\n");
> >  		if (!fix_sync_read_error(r1_bio))
> >  			return;
> > +	}
> > 
> >  	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> >  		if (process_checks(r1_bio) < 0)
> > 
> > 
> > And here is the whole dmesg result from the repair run:
> > 
> > [   74.288227] md: requested-resync of RAID array md1
> > [   74.288719] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > [   74.289329] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> > [   74.290404] md: using 128k window, over a total of 2096064k.
> > [  179.213754] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> > [  179.214500] ata6.00: irq_stat 0x40000008
> > [  179.214909] ata6.00: failed command: READ FPDMA QUEUED
> > [  179.215452] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
> > [  179.215452]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> > [  179.217087] ata6.00: status: { DRDY ERR }
> > [  179.217500] ata6.00: error: { UNC }
> > [  179.220185] ata6.00: configured for UDMA/133
> > [  179.220656] sd 5:0:0:0: [sdd] Unhandled sense code
> > [  179.221149] sd 5:0:0:0: [sdd]
> > [  179.221476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [  179.222062] sd 5:0:0:0: [sdd]
> > [  179.222398] Sense Key : Medium Error [current] [descriptor]
> > [  179.223034] Descriptor sense data with sense descriptors (in hex):
> > [  179.223704]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> > [  179.224726]         00 3e 3e 23
> > [  179.225169] sd 5:0:0:0: [sdd]
> > [  179.225494] Add. Sense: Unrecovered read error - auto reallocate failed
> > [  179.226215] sd 5:0:0:0: [sdd] CDB:
> > [  179.226577] Read(10): 28 00 00 3e 3e 00 00 00 80 00
> > [  179.227344] end_request: I/O error, dev sdd, sector 4079139
> > [  179.227926] end_sync_read: !BIO_UPTODATE
> > [  179.228359] ata6: EH complete
> > [  181.926457] md: md1: requested-resync done.
> > 
> > 
> > So the only printk I've added is seen: "end_sync_read: !BIO_UPTODATE", and
> > it looks like rewriting code is never called.
> > 
> > 
> > This is root array of a production machine, so I can reboot it only at
> > late evenings or at weekends.  But this time I finally want to fix the
> > bug for real, so I will not try to fix the erroneous drive until we will
> > be able to fix the code.  Just one thing: the fixing process might be
> > a bit slow.
> > 
> > Thanks,
> > 
> > /mjt
> 
> 
> Hi,
>  thanks for the testing and report.  I see what the problem is now.
> 
> We only try to fix a read error when all reads failed rather than when any
> read fails.
> Most of the time there is only one read so this makes no difference.
> However for 'check' and 'repair' we read all devices so the current code will
> only try to repair a read error if *every*  device failed, which of course
> would be pointless.
> 
> This patch (against v3.10) should fix it.  It only leaves R1BIO_Uptodate set
> if no failure is seen.
> 
> NeilBrown
> 
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 6e17f8181c4b..ba38ef6c612b 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1633,8 +1633,8 @@ static void end_sync_read(struct bio *bio, int error)
>  	 * or re-read if the read failed.
>  	 * We don't do much here, just schedule handling by raid1d
>  	 */
> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> -		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +	if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> +		clear_bit(R1BIO_Uptodate, &r1_bio->state);
>  
>  	if (atomic_dec_and_test(&r1_bio->remaining))
>  		reschedule_retry(r1_bio);
> @@ -2609,6 +2609,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
>  	/* For a user-requested sync, we read all readable devices and do a
>  	 * compare
>  	 */
> +	set_bit(R1BIO_UPTODATE, &r1_bio->state);
>  	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
>  		atomic_set(&r1_bio->remaining, read_targets);
>  		for (i = 0; i < conf->raid_disks * 2 && read_targets; i++) {


Actually I've changed my mind.  That patch won't fix anything.
fix_sync_read_error() is focussed on fixing a read error on ->read_disk.
So we only set uptodate if ->read_disk succeeded.

So this patch should do it.

NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index fd3a2a14b587..0fe5fd469e74 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1733,7 +1733,8 @@ static void end_sync_read(struct bio *bio, int error)
 	 * or re-read if the read failed.
 	 * We don't do much here, just schedule handling by raid1d
 	 */
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+	if (bio == r1_bio->bios[r1_bio->read_disk] &&
+	    test_bit(BIO_UPTODATE, &bio->bi_flags))
 		set_bit(R1BIO_Uptodate, &r1_bio->state);
 
 	if (atomic_dec_and_test(&r1_bio->remaining))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-03  4:36   ` NeilBrown
@ 2014-02-03  7:30     ` Michael Tokarev
  2014-02-03 17:46       ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2014-02-03  7:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

03.02.2014 08:36, NeilBrown wrote:
[]
> Actually I've changed my mind.  That patch won't fix anything.
> fix_sync_read_error() is focussed on fixing a read error on ->read_disk.
> So we only set uptodate if ->read_disk succeeded.
> 
> So this patch should do it.
> 
> NeilBrown
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index fd3a2a14b587..0fe5fd469e74 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1733,7 +1733,8 @@ static void end_sync_read(struct bio *bio, int error)
>  	 * or re-read if the read failed.
>  	 * We don't do much here, just schedule handling by raid1d
>  	 */
> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> +	if (bio == r1_bio->bios[r1_bio->read_disk] &&
> +	    test_bit(BIO_UPTODATE, &bio->bi_flags))
>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
>  
>  	if (atomic_dec_and_test(&r1_bio->remaining))
> 

I changed it like this for now:

--- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
+++ drivers/md/raid1.c	2014-02-03 11:26:59.062845829 +0400
@@ -1634,8 +1634,12 @@ static void end_sync_read(struct bio *bi
 	 * or re-read if the read failed.
 	 * We don't do much here, just schedule handling by raid1d
 	 */
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-		set_bit(R1BIO_Uptodate, &r1_bio->state);
+	if (bio == r1_bio->bios[r1_bio->read_disk]) {
+	    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+ 		set_bit(R1BIO_Uptodate, &r1_bio->state);
+	    else
+		printk("end_sync_read: our bio, but !BIO_UPTODATE\n");
+	}

 	if (atomic_dec_and_test(&r1_bio->remaining))
 		reschedule_retry(r1_bio);

and will test it later today (in about 10 hours from now) -- as I
mentioned, this is a prod box and testing isn't possible anytime.

Thank you for looking into this.  Hopefully it will work better now :)

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-03  7:30     ` Michael Tokarev
@ 2014-02-03 17:46       ` Michael Tokarev
  2014-02-04  4:30         ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2014-02-03 17:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

03.02.2014 11:30, Michael Tokarev пишет:
> 03.02.2014 08:36, NeilBrown wrote:
> []
>> Actually I've changed my mind.  That patch won't fix anything.
>> fix_sync_read_error() is focussed on fixing a read error on ->read_disk.
>> So we only set uptodate if ->read_disk succeeded.
>>
>> So this patch should do it.
>>
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index fd3a2a14b587..0fe5fd469e74 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1733,7 +1733,8 @@ static void end_sync_read(struct bio *bio, int error)
>>  	 * or re-read if the read failed.
>>  	 * We don't do much here, just schedule handling by raid1d
>>  	 */
>> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>> +	if (bio == r1_bio->bios[r1_bio->read_disk] &&
>> +	    test_bit(BIO_UPTODATE, &bio->bi_flags))
>>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
>>  
>>  	if (atomic_dec_and_test(&r1_bio->remaining))
>>
> 
> I changed it like this for now:
> 
> --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> +++ drivers/md/raid1.c	2014-02-03 11:26:59.062845829 +0400
> @@ -1634,8 +1634,12 @@ static void end_sync_read(struct bio *bi
>  	 * or re-read if the read failed.
>  	 * We don't do much here, just schedule handling by raid1d
>  	 */
> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> -		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +	if (bio == r1_bio->bios[r1_bio->read_disk]) {
> +	    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> + 		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +	    else
> +		printk("end_sync_read: our bio, but !BIO_UPTODATE\n");
> +	}
> 
>  	if (atomic_dec_and_test(&r1_bio->remaining))
>  		reschedule_retry(r1_bio);
> 
> and will test it later today (in about 10 hours from now) -- as I
> mentioned, this is a prod box and testing isn't possible anytime.

Hm. With this change, there's nothing interesting in dmesg at all anymore:

[  100.571933] md: requested-resync of RAID array md1
[  100.572430] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  100.573041] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[  100.574135] md: using 128k window, over a total of 2096064k.
[  202.616277] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[  202.617035] ata6.00: irq_stat 0x40000008
[  202.617439] ata6.00: failed command: READ FPDMA QUEUED
[  202.617980] ata6.00: cmd 60/80:c0:00:3e:3e/00:00:00:00:00/40 tag 24 ncq 65536 in
[  202.617980]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[  202.619631] ata6.00: status: { DRDY ERR }
[  202.620077] ata6.00: error: { UNC }
[  202.622692] ata6.00: configured for UDMA/133
[  202.623182] sd 5:0:0:0: [sdd] Unhandled sense code
[  202.623693] sd 5:0:0:0: [sdd]
[  202.624011] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  202.624600] sd 5:0:0:0: [sdd]
[  202.624929] Sense Key : Medium Error [current] [descriptor]
[  202.625576] Descriptor sense data with sense descriptors (in hex):
[  202.626249]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[  202.627276]         00 3e 3e 23
[  202.627722] sd 5:0:0:0: [sdd]
[  202.628049] Add. Sense: Unrecovered read error - auto reallocate failed
[  202.628773] sd 5:0:0:0: [sdd] CDB:
[  202.629135] Read(10): 28 00 00 3e 3e 00 00 00 80 00
[  202.629907] end_request: I/O error, dev sdd, sector 4079139
[  202.630513] ata6: EH complete
[  205.219181] md: md1: requested-resync done.

So my printk is not called, and it still does not try to recover
the error.

So the first condition in that new if -- bio == r1_bio->bios[r1_bio->read_disk] --
is not true.

So it is - apparently - a wrong guess... ;)

BTW, just in case, -- this is a raid1 with 4 drives (yes, 4 copies of
the same data), not a "usual" 2 disk.

I can test something in next 2..3 hours, next test will be possible
only tomorrow.

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-03 17:46       ` Michael Tokarev
@ 2014-02-04  4:30         ` NeilBrown
  2014-02-04 19:34           ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2014-02-04  4:30 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7202 bytes --]

On Mon, 03 Feb 2014 21:46:48 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:

> 03.02.2014 11:30, Michael Tokarev пишет:
> > 03.02.2014 08:36, NeilBrown wrote:
> > []
> >> Actually I've changed my mind.  That patch won't fix anything.
> >> fix_sync_read_error() is focussed on fixing a read error on ->read_disk.
> >> So we only set uptodate if ->read_disk succeeded.
> >>
> >> So this patch should do it.
> >>
> >> NeilBrown
> >>
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index fd3a2a14b587..0fe5fd469e74 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -1733,7 +1733,8 @@ static void end_sync_read(struct bio *bio, int error)
> >>  	 * or re-read if the read failed.
> >>  	 * We don't do much here, just schedule handling by raid1d
> >>  	 */
> >> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> >> +	if (bio == r1_bio->bios[r1_bio->read_disk] &&
> >> +	    test_bit(BIO_UPTODATE, &bio->bi_flags))
> >>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
> >>  
> >>  	if (atomic_dec_and_test(&r1_bio->remaining))
> >>
> > 
> > I changed it like this for now:
> > 
> > --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> > +++ drivers/md/raid1.c	2014-02-03 11:26:59.062845829 +0400
> > @@ -1634,8 +1634,12 @@ static void end_sync_read(struct bio *bi
> >  	 * or re-read if the read failed.
> >  	 * We don't do much here, just schedule handling by raid1d
> >  	 */
> > -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> > -		set_bit(R1BIO_Uptodate, &r1_bio->state);
> > +	if (bio == r1_bio->bios[r1_bio->read_disk]) {
> > +	    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> > + 		set_bit(R1BIO_Uptodate, &r1_bio->state);
> > +	    else
> > +		printk("end_sync_read: our bio, but !BIO_UPTODATE\n");
> > +	}
> > 
> >  	if (atomic_dec_and_test(&r1_bio->remaining))
> >  		reschedule_retry(r1_bio);
> > 
> > and will test it later today (in about 10 hours from now) -- as I
> > mentioned, this is a prod box and testing isn't possible anytime.
> 
> Hm. With this change, there's nothing interesting in dmesg at all anymore:
> 
> [  100.571933] md: requested-resync of RAID array md1
> [  100.572430] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  100.573041] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> [  100.574135] md: using 128k window, over a total of 2096064k.
> [  202.616277] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [  202.617035] ata6.00: irq_stat 0x40000008
> [  202.617439] ata6.00: failed command: READ FPDMA QUEUED
> [  202.617980] ata6.00: cmd 60/80:c0:00:3e:3e/00:00:00:00:00/40 tag 24 ncq 65536 in
> [  202.617980]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> [  202.619631] ata6.00: status: { DRDY ERR }
> [  202.620077] ata6.00: error: { UNC }
> [  202.622692] ata6.00: configured for UDMA/133
> [  202.623182] sd 5:0:0:0: [sdd] Unhandled sense code
> [  202.623693] sd 5:0:0:0: [sdd]
> [  202.624011] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  202.624600] sd 5:0:0:0: [sdd]
> [  202.624929] Sense Key : Medium Error [current] [descriptor]
> [  202.625576] Descriptor sense data with sense descriptors (in hex):
> [  202.626249]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [  202.627276]         00 3e 3e 23
> [  202.627722] sd 5:0:0:0: [sdd]
> [  202.628049] Add. Sense: Unrecovered read error - auto reallocate failed
> [  202.628773] sd 5:0:0:0: [sdd] CDB:
> [  202.629135] Read(10): 28 00 00 3e 3e 00 00 00 80 00
> [  202.629907] end_request: I/O error, dev sdd, sector 4079139
> [  202.630513] ata6: EH complete
> [  205.219181] md: md1: requested-resync done.
> 
> So my printk is not called, and it still does not try to recover
> the error.
> 
> So the first condition in that new if -- bio == r1_bio->bios[r1_bio->read_disk] --
> is not true.
> 
> So it is - apparently - a wrong guess... ;)
> 
> BTW, just in case, -- this is a raid1 with 4 drives (yes, 4 copies of
> the same data), not a "usual" 2 disk.
> 
> I can test something in next 2..3 hours, next test will be possible
> only tomorrow.
> 
> Thanks,
> 
> /mjt

I'm really on a roll here, aren't I.

I looked again and that code I've been trying to fix as actually perfectly
fine.  I'm not sure whether to be happy to sad about that.

But... I've found the bug.  I know this time because I actually tested it.
I tested and current mainline and it didn't work.  So I hunted and found a
bug.
But that buggy code isn't in 3.10.
So I tested 3.10 and it crashed.
Ah-ha I though.  So I looked at 3.10.27, and  it has different code.  It has
the buggy code.  So I tested that and  it didn't work.
Then I applied  the patch below, and now it does.

The bug was introduced by

commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
Author: NeilBrown <neilb@suse.de>
Date:   Wed Jul 17 15:19:29 2013 +1000

    md/raid1: fix bio handling problems in process_checks()

which moved the clearing for bi_flags up in a function to before it was
tested.  That wasn't really the right thing to do.

When that was backported to 3.10 it fixed the crash, but introduce this new
bug.

Anyway enough of my rambling - here is the patch.  As I don't much feel like
trusting my own results just a the moment I look forward to your
confirmation, one way or the other.

Thanks,
NeilBrown


diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index afaa5d425e9a..2328a8c557d4 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1854,11 +1854,15 @@ static int process_checks(struct r1bio *r1_bio)
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		int j;
 		int size;
+		int uptodate;
 		struct bio *b = r1_bio->bios[i];
 		if (b->bi_end_io != end_sync_read)
 			continue;
-		/* fixup the bio for reuse */
+		/* fixup the bio for reuse, but preserve BIO_UPTODATE */
+		uptodate = test_bit(BIO_UPTODATE, &b->bi_flags);
 		bio_reset(b);
+		if (!uptodate)
+			clear_bit(BIO_UPTODATE, &b->bi_flags);
 		b->bi_vcnt = vcnt;
 		b->bi_size = r1_bio->sectors << 9;
 		b->bi_sector = r1_bio->sector +
@@ -1891,11 +1895,14 @@ static int process_checks(struct r1bio *r1_bio)
 		int j;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
+		int uptodate = test_bit(BIO_UPTODATE, &sbio->bi_flags);
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
+		/*Now we can 'fixup' the BIO_UPTODATE flags */
+		set_bit(BIO_UPTODATE, &sbio->bi_flags);
 
-		if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+		if (uptodate) {
 			for (j = vcnt; j-- ; ) {
 				struct page *p, *s;
 				p = pbio->bi_io_vec[j].bv_page;
@@ -1910,7 +1917,7 @@ static int process_checks(struct r1bio *r1_bio)
 		if (j >= 0)
 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
 		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-			      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
+			      && uptodate)) {
 			/* No need to write to this device. */
 			sbio->bi_end_io = NULL;
 			rdev_dec_pending(conf->mirrors[i].rdev, mddev);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-04  4:30         ` NeilBrown
@ 2014-02-04 19:34           ` Michael Tokarev
  2014-02-04 22:51             ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2014-02-04 19:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

04.02.2014 08:30, NeilBrown wrote:
[]
> I'm really on a roll here, aren't I.

Well, we both are, unless I don't understand what "on a roll" means :)

> I looked again and that code I've been trying to fix as actually perfectly
> fine.  I'm not sure whether to be happy to sad about that.
> 
> But... I've found the bug.  I know this time because I actually tested it.
> I tested and current mainline and it didn't work.  So I hunted and found a
> bug.
> But that buggy code isn't in 3.10.
> So I tested 3.10 and it crashed.
> Ah-ha I though.  So I looked at 3.10.27, and  it has different code.  It has
> the buggy code.  So I tested that and  it didn't work.
> Then I applied  the patch below, and now it does.
> 
> The bug was introduced by
> 
> commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
> Author: NeilBrown <neilb@suse.de>
> Date:   Wed Jul 17 15:19:29 2013 +1000
> 
>     md/raid1: fix bio handling problems in process_checks()
> 
> which moved the clearing for bi_flags up in a function to before it was
> tested.  That wasn't really the right thing to do.
> 
> When that was backported to 3.10 it fixed the crash, but introduce this new
> bug.
> 
> Anyway enough of my rambling - here is the patch.  As I don't much feel like
> trusting my own results just a the moment I look forward to your
> confirmation, one way or the other.

Wow.  I see.
Indeed, I'm running latest 3.10 now, 3.10.28.  I never really thought
about testing other versions, because, well, this didn't look like some
new issue to me, I thought it is some old stuff which hasn't changed
much in 3.13 and up.  Well, if either of us knew it is specific to 3.10.y,
we'd both behave differently from the beginning, aren't we? :)

So I tried your patch (on top of my initial just-the-debugging changes), had to
fix a few MIME =damages on the go, but that is not really interesing.  And
this version actually appears to work, but does it silently.

After a repair run with your last patch applied, I see this:

[  767.456457] md: requested-resync of RAID array md1
[  767.486818] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  767.517404] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[  767.548977] md: using 128k window, over a total of 2096064k.
[  808.174908] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[  808.206395] ata6.00: irq_stat 0x40000008
[  808.237186] ata6.00: failed command: READ FPDMA QUEUED
[  808.267635] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
[  808.267635]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[  808.329226] ata6.00: status: { DRDY ERR }
[  808.359915] ata6.00: error: { UNC }
[  808.392438] ata6.00: configured for UDMA/133
[  808.421989] sd 5:0:0:0: [sdd] Unhandled sense code
[  808.451361] sd 5:0:0:0: [sdd]
[  808.480329] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  808.509679] sd 5:0:0:0: [sdd]
[  808.538719] Sense Key : Medium Error [current] [descriptor]
[  808.568061] Descriptor sense data with sense descriptors (in hex):
[  808.597257]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[  808.626981]         00 3e 3e 23
[  808.656380] sd 5:0:0:0: [sdd]
[  808.685550] Add. Sense: Unrecovered read error - auto reallocate failed
[  808.715375] sd 5:0:0:0: [sdd] CDB:
[  808.744933] Read(10): 28 00 00 3e 3e 00 00 00 80 00
[  808.774678] end_request: I/O error, dev sdd, sector 4079139
[  808.804412] end_sync_read: !BIO_UPTODATE
[  808.834040] ata6: EH complete
[  809.486124] md: md1: requested-resync done.

and now, all pending sectors are gone from the drive, and subsequent reads
of this place does not produce any errors.

However, mismatch_cnt right after this repair run shows 128 (and never goes
larger than 0 on subsequent repair runs).  I'm not sure what this 128 really
means, shouldn't it be just one for a single unreadable 512 bytes?

At the same time, mdadm --monitor reports:

Feb  4 23:19:24 mother mdadm[4793]: RebuildFinished event detected on md device /dev/md1
Feb  4 23:21:13 mother mdadm[4793]: RebuildFinished event detected on md device /dev/md1, component device  mismatches found: 128 (on raid level 1)

So, your patch appears to work now, the only issue is that it is too silent:
I'd expect to see at least some mention of "repairing this or that block", or
something like that.

Meanwhile I found an interesting option of hdparm -- it is --make-bad-sector.
So, despite all the warnings around it, I tried it on this very same prod.
server, and marked the same sector as bad again, and re-run the whole thing
(verifying that read of that sector actually produces an error).  And it all
repeated exactly: repair run silently fixed the error and reported 128 found
mismatches, and after repair run, this place is readable again.

(What I'd love to see now, which is not related to mdadm in any way - is an
ability to remap this place on the drive once and for all, making the first
Reallocate_Event_Count to actually happen, to not bother with it ever again.
As was possible with old good scsi drives, for many years..  Anyone know if
it still possible today with sata drives?  To remap this place and be done
with it, instead of repeating the same - rewrite, it is good now, but with
time it becomes unreadable, so rewrite it again, ad infinitum...)

> Thanks,

Thank you!

Should I try 3.13 kernel too (now when I know how to make a bad sector),
just to verify it works fine without additional patches?

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-04 19:34           ` Michael Tokarev
@ 2014-02-04 22:51             ` NeilBrown
  0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2014-02-04 22:51 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 6684 bytes --]

On Tue, 04 Feb 2014 23:34:43 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:

> 04.02.2014 08:30, NeilBrown wrote:
> []
> > I'm really on a roll here, aren't I.
> 
> Well, we both are, unless I don't understand what "on a roll" means :)

"on a roll" usually means "enjoying a series of successes" though it can be
used ironically to mean "suffering a series of failures".  I intended the
second meaning...


> 
> > I looked again and that code I've been trying to fix as actually perfectly
> > fine.  I'm not sure whether to be happy to sad about that.
> > 
> > But... I've found the bug.  I know this time because I actually tested it.
> > I tested and current mainline and it didn't work.  So I hunted and found a
> > bug.
> > But that buggy code isn't in 3.10.
> > So I tested 3.10 and it crashed.
> > Ah-ha I though.  So I looked at 3.10.27, and  it has different code.  It has
> > the buggy code.  So I tested that and  it didn't work.
> > Then I applied  the patch below, and now it does.
> > 
> > The bug was introduced by
> > 
> > commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
> > Author: NeilBrown <neilb@suse.de>
> > Date:   Wed Jul 17 15:19:29 2013 +1000
> > 
> >     md/raid1: fix bio handling problems in process_checks()
> > 
> > which moved the clearing for bi_flags up in a function to before it was
> > tested.  That wasn't really the right thing to do.
> > 
> > When that was backported to 3.10 it fixed the crash, but introduce this new
> > bug.
> > 
> > Anyway enough of my rambling - here is the patch.  As I don't much feel like
> > trusting my own results just a the moment I look forward to your
> > confirmation, one way or the other.
> 
> Wow.  I see.
> Indeed, I'm running latest 3.10 now, 3.10.28.  I never really thought
> about testing other versions, because, well, this didn't look like some
> new issue to me, I thought it is some old stuff which hasn't changed
> much in 3.13 and up.  Well, if either of us knew it is specific to 3.10.y,
> we'd both behave differently from the beginning, aren't we? :)
> 
> So I tried your patch (on top of my initial just-the-debugging changes), had to
> fix a few MIME =damages on the go, but that is not really interesing.  And
> this version actually appears to work, but does it silently.

I probably should get md to be a little more verbose when it tries to fix IO
errors.  I people like to know....

> 
> After a repair run with your last patch applied, I see this:
> 
> [  767.456457] md: requested-resync of RAID array md1
> [  767.486818] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  767.517404] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> [  767.548977] md: using 128k window, over a total of 2096064k.
> [  808.174908] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [  808.206395] ata6.00: irq_stat 0x40000008
> [  808.237186] ata6.00: failed command: READ FPDMA QUEUED
> [  808.267635] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
> [  808.267635]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> [  808.329226] ata6.00: status: { DRDY ERR }
> [  808.359915] ata6.00: error: { UNC }
> [  808.392438] ata6.00: configured for UDMA/133
> [  808.421989] sd 5:0:0:0: [sdd] Unhandled sense code
> [  808.451361] sd 5:0:0:0: [sdd]
> [  808.480329] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  808.509679] sd 5:0:0:0: [sdd]
> [  808.538719] Sense Key : Medium Error [current] [descriptor]
> [  808.568061] Descriptor sense data with sense descriptors (in hex):
> [  808.597257]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [  808.626981]         00 3e 3e 23
> [  808.656380] sd 5:0:0:0: [sdd]
> [  808.685550] Add. Sense: Unrecovered read error - auto reallocate failed
> [  808.715375] sd 5:0:0:0: [sdd] CDB:
> [  808.744933] Read(10): 28 00 00 3e 3e 00 00 00 80 00
> [  808.774678] end_request: I/O error, dev sdd, sector 4079139
> [  808.804412] end_sync_read: !BIO_UPTODATE
> [  808.834040] ata6: EH complete
> [  809.486124] md: md1: requested-resync done.
> 
> and now, all pending sectors are gone from the drive, and subsequent reads
> of this place does not produce any errors.

Excellent!

> 
> However, mismatch_cnt right after this repair run shows 128 (and never goes
> larger than 0 on subsequent repair runs).  I'm not sure what this 128 really
> means, shouldn't it be just one for a single unreadable 512 bytes?

md/raid1 doesn't read individual sectors - it reads 64K at a time and if it
sees a problem it reports that as 128 sectors.  I agree this isn't ideal, but
refining the error down to just one sector is a lot of work for fairly little
gain.


> 
> At the same time, mdadm --monitor reports:
> 
> Feb  4 23:19:24 mother mdadm[4793]: RebuildFinished event detected on md device /dev/md1
> Feb  4 23:21:13 mother mdadm[4793]: RebuildFinished event detected on md device /dev/md1, component device  mismatches found: 128 (on raid level 1)
> 
> So, your patch appears to work now, the only issue is that it is too silent:
> I'd expect to see at least some mention of "repairing this or that block", or
> something like that.
> 
> Meanwhile I found an interesting option of hdparm -- it is --make-bad-sector.
> So, despite all the warnings around it, I tried it on this very same prod.
> server, and marked the same sector as bad again, and re-run the whole thing
> (verifying that read of that sector actually produces an error).  And it all
> repeated exactly: repair run silently fixed the error and reported 128 found
> mismatches, and after repair run, this place is readable again.
> 
> 
> (What I'd love to see now, which is not related to mdadm in any way - is an
> ability to remap this place on the drive once and for all, making the first
> Reallocate_Event_Count to actually happen, to not bother with it ever again.
> As was possible with old good scsi drives, for many years..  Anyone know if
> it still possible today with sata drives?  To remap this place and be done
> with it, instead of repeating the same - rewrite, it is good now, but with
> time it becomes unreadable, so rewrite it again, ad infinitum...)
> 
> > Thanks,
> 
> Thank you!
> 
> Should I try 3.13 kernel too (now when I know how to make a bad sector),
> just to verify it works fine without additional patches?

No, the same bug is present in every kernel since 3.10.something.
I'll send a patch upstream soon now that I have definite confirmation from
you that it works.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2014-02-03  1:04 ` NeilBrown
  2014-02-03  4:36   ` NeilBrown
@ 2014-02-06 14:21   ` Mikael Abrahamsson
  1 sibling, 0 replies; 13+ messages in thread
From: Mikael Abrahamsson @ 2014-02-06 14:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Mon, 3 Feb 2014, NeilBrown wrote:

> We only try to fix a read error when all reads failed rather than when any
> read fails.
> Most of the time there is only one read so this makes no difference.
> However for 'check' and 'repair' we read all devices so the current code will
> only try to repair a read error if *every*  device failed, which of course
> would be pointless.

Is this the same problem I hit and documented in my thread a few weeks 
back? ("read errors not corrected when doing check on RAID6")

Is the same problem is valid for RAID6 as well and it fixes that as well?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 13+ messages in thread

* raid1 repair does not repair errors?
@ 2013-10-21 15:01 Michael Tokarev
  2013-10-22  1:11 ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2013-10-21 15:01 UTC (permalink / raw)
  To: linux-raid

Hello.

I've a raid1 array (composed of 4 drives, so it is a 4-fold
copy of data), and one of the drives has an unreadable (bad)
sector in the partition belonging to this array.

When I run md 'repair' action, it hits the error place, the
kernel clearly returns an error, but md does not do anything
with it.  For example:

Oct 21 18:43:55 mother kernel: [190018.073098] md: requested-resync of RAID array md1
Oct 21 18:43:55 mother kernel: [190018.093910] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Oct 21 18:43:55 mother kernel: [190018.114765] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
Oct 21 18:43:55 mother kernel: [190018.136459] md: using 128k window, over a total of 2096064k.
Oct 21 18:45:11 mother kernel: [190094.091974] ata6.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
Oct 21 18:45:11 mother kernel: [190094.114093] ata6.00: irq_stat 0x40000008
Oct 21 18:45:11 mother kernel: [190094.135906] ata6.00: failed command: READ FPDMA QUEUED
Oct 21 18:45:11 mother kernel: [190094.157710] ata6.00: cmd 60/00:00:00:3b:3e/04:00:00:00:00/40 tag 0 ncq 524288 in
Oct 21 18:45:11 mother kernel: [190094.157710]          res 41/40:00:29:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Oct 21 18:45:11 mother kernel: [190094.202315] ata6.00: status: { DRDY ERR }
Oct 21 18:45:11 mother kernel: [190094.224517] ata6.00: error: { UNC }
Oct 21 18:45:11 mother kernel: [190094.248920] ata6.00: configured for UDMA/133
Oct 21 18:45:11 mother kernel: [190094.271003] sd 5:0:0:0: [sdc] Unhandled sense code
Oct 21 18:45:11 mother kernel: [190094.293044] sd 5:0:0:0: [sdc]
Oct 21 18:45:11 mother kernel: [190094.314654] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 21 18:45:11 mother kernel: [190094.336483] sd 5:0:0:0: [sdc]
Oct 21 18:45:11 mother kernel: [190094.357966] Sense Key : Medium Error [current] [descriptor]
Oct 21 18:45:11 mother kernel: [190094.379808] Descriptor sense data with sense descriptors (in hex):
Oct 21 18:45:11 mother kernel: [190094.402024]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 21 18:45:11 mother kernel: [190094.424502]         00 3e 3e 29
Oct 21 18:45:11 mother kernel: [190094.446338] sd 5:0:0:0: [sdc]
Oct 21 18:45:11 mother kernel: [190094.467995] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 21 18:45:11 mother kernel: [190094.490075] sd 5:0:0:0: [sdc] CDB:
Oct 21 18:45:11 mother kernel: [190094.511870] Read(10): 28 00 00 3e 3b 00 00 04 00 00
Oct 21 18:45:11 mother kernel: [190094.533829] end_request: I/O error, dev sdc, sector 4079145
Oct 21 18:45:11 mother kernel: [190094.555800] ata6: EH complete
Oct 21 18:45:22 mother kernel: [190105.602687] md: md1: requested-resync done.

There's no indication that raid code tried to re-write the bad spot,
and the bad block remains bad in the drive, so next read (direct from
the drive) return the same I/O error with the same kernel messages.

Shouldn't `repair' action re-write the problem place?

This is kernel 3.10.15.

Thank you!

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2013-10-21 15:01 Michael Tokarev
@ 2013-10-22  1:11 ` NeilBrown
  2013-10-24  8:58   ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-10-22  1:11 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4424 bytes --]

On Mon, 21 Oct 2013 19:01:33 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:

> Hello.
> 
> I've a raid1 array (composed of 4 drives, so it is a 4-fold
> copy of data), and one of the drives has an unreadable (bad)
> sector in the partition belonging to this array.
> 
> When I run md 'repair' action, it hits the error place, the
> kernel clearly returns an error, but md does not do anything
> with it.  For example:
> 
> Oct 21 18:43:55 mother kernel: [190018.073098] md: requested-resync of RAID array md1
> Oct 21 18:43:55 mother kernel: [190018.093910] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Oct 21 18:43:55 mother kernel: [190018.114765] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> Oct 21 18:43:55 mother kernel: [190018.136459] md: using 128k window, over a total of 2096064k.
> Oct 21 18:45:11 mother kernel: [190094.091974] ata6.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
> Oct 21 18:45:11 mother kernel: [190094.114093] ata6.00: irq_stat 0x40000008
> Oct 21 18:45:11 mother kernel: [190094.135906] ata6.00: failed command: READ FPDMA QUEUED
> Oct 21 18:45:11 mother kernel: [190094.157710] ata6.00: cmd 60/00:00:00:3b:3e/04:00:00:00:00/40 tag 0 ncq 524288 in
> Oct 21 18:45:11 mother kernel: [190094.157710]          res 41/40:00:29:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> Oct 21 18:45:11 mother kernel: [190094.202315] ata6.00: status: { DRDY ERR }
> Oct 21 18:45:11 mother kernel: [190094.224517] ata6.00: error: { UNC }
> Oct 21 18:45:11 mother kernel: [190094.248920] ata6.00: configured for UDMA/133
> Oct 21 18:45:11 mother kernel: [190094.271003] sd 5:0:0:0: [sdc] Unhandled sense code
> Oct 21 18:45:11 mother kernel: [190094.293044] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.314654] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Oct 21 18:45:11 mother kernel: [190094.336483] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.357966] Sense Key : Medium Error [current] [descriptor]
> Oct 21 18:45:11 mother kernel: [190094.379808] Descriptor sense data with sense descriptors (in hex):
> Oct 21 18:45:11 mother kernel: [190094.402024]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> Oct 21 18:45:11 mother kernel: [190094.424502]         00 3e 3e 29
> Oct 21 18:45:11 mother kernel: [190094.446338] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.467995] Add. Sense: Unrecovered read error - auto reallocate failed
> Oct 21 18:45:11 mother kernel: [190094.490075] sd 5:0:0:0: [sdc] CDB:
> Oct 21 18:45:11 mother kernel: [190094.511870] Read(10): 28 00 00 3e 3b 00 00 04 00 00
> Oct 21 18:45:11 mother kernel: [190094.533829] end_request: I/O error, dev sdc, sector 4079145
> Oct 21 18:45:11 mother kernel: [190094.555800] ata6: EH complete
> Oct 21 18:45:22 mother kernel: [190105.602687] md: md1: requested-resync done.
> 
> There's no indication that raid code tried to re-write the bad spot,
> and the bad block remains bad in the drive, so next read (direct from
> the drive) return the same I/O error with the same kernel messages.
> 
> Shouldn't `repair' action re-write the problem place?

Yes it should.
When end_sync_read() notices that BIO_UPTODATE isn't set it refuses to set
R1BIO_Uptodate.
When sync_request_write() notices that isn't set it calls
fix_sync_read_error().

fix_sync_read_error then calls sync_page_io() for each page in the region and
if that fails (as you would expect, it goes on to the next disk and the next
until a working one is found.  Then that block is written back to all those
that failed.
fix_sync_read_error doesn't report any success, but as it re-read the failing
device you should see the SCSI read error reported a second time at least.

Are you able to add some tracing and recompile the kernel and see if you can
find out what is happening?
e.g.
  if end_sync_read doesn't see BIO_UPTODATE, print something.
  if sync_request_write doesn't see R1BIO_Uptodate, print something
  when fix_sync_read_error calls sync_page_io, print something.

??

Thanks,
NeilBrown



> 
> This is kernel 3.10.15.
> 
> Thank you!
> 
> /mjt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: raid1 repair does not repair errors?
  2013-10-22  1:11 ` NeilBrown
@ 2013-10-24  8:58   ` Michael Tokarev
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Tokarev @ 2013-10-24  8:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

22.10.2013 05:11, NeilBrown wrote:
> On Mon, 21 Oct 2013 19:01:33 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:
>
>> Hello.
>>
>> I've a raid1 array (composed of 4 drives, so it is a 4-fold
>> copy of data), and one of the drives has an unreadable (bad)
>> sector in the partition belonging to this array.
>>
>> When I run md 'repair' action, it hits the error place, the
>> kernel clearly returns an error, but md does not do anything
>> with it.  For example:
>>
>> Oct 21 18:43:55 mother kernel: [190018.073098] md: requested-resync of RAID array md1
>> Oct 21 18:43:55 mother kernel: [190018.093910] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> Oct 21 18:43:55 mother kernel: [190018.114765] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
>> Oct 21 18:43:55 mother kernel: [190018.136459] md: using 128k window, over a total of 2096064k.
>> Oct 21 18:45:11 mother kernel: [190094.091974] ata6.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
>> Oct 21 18:45:11 mother kernel: [190094.114093] ata6.00: irq_stat 0x40000008
>> Oct 21 18:45:11 mother kernel: [190094.135906] ata6.00: failed command: READ FPDMA QUEUED
>> Oct 21 18:45:11 mother kernel: [190094.157710] ata6.00: cmd 60/00:00:00:3b:3e/04:00:00:00:00/40 tag 0 ncq 524288 in
>> Oct 21 18:45:11 mother kernel: [190094.157710]          res 41/40:00:29:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
>> Oct 21 18:45:11 mother kernel: [190094.202315] ata6.00: status: { DRDY ERR }
>> Oct 21 18:45:11 mother kernel: [190094.224517] ata6.00: error: { UNC }
>> Oct 21 18:45:11 mother kernel: [190094.248920] ata6.00: configured for UDMA/133
>> Oct 21 18:45:11 mother kernel: [190094.271003] sd 5:0:0:0: [sdc] Unhandled sense code
>> Oct 21 18:45:11 mother kernel: [190094.293044] sd 5:0:0:0: [sdc]
>> Oct 21 18:45:11 mother kernel: [190094.314654] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Oct 21 18:45:11 mother kernel: [190094.336483] sd 5:0:0:0: [sdc]
>> Oct 21 18:45:11 mother kernel: [190094.357966] Sense Key : Medium Error [current] [descriptor]
>> Oct 21 18:45:11 mother kernel: [190094.379808] Descriptor sense data with sense descriptors (in hex):
>> Oct 21 18:45:11 mother kernel: [190094.402024]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>> Oct 21 18:45:11 mother kernel: [190094.424502]         00 3e 3e 29
>> Oct 21 18:45:11 mother kernel: [190094.446338] sd 5:0:0:0: [sdc]
>> Oct 21 18:45:11 mother kernel: [190094.467995] Add. Sense: Unrecovered read error - auto reallocate failed
>> Oct 21 18:45:11 mother kernel: [190094.490075] sd 5:0:0:0: [sdc] CDB:
>> Oct 21 18:45:11 mother kernel: [190094.511870] Read(10): 28 00 00 3e 3b 00 00 04 00 00
>> Oct 21 18:45:11 mother kernel: [190094.533829] end_request: I/O error, dev sdc, sector 4079145
>> Oct 21 18:45:11 mother kernel: [190094.555800] ata6: EH complete
>> Oct 21 18:45:22 mother kernel: [190105.602687] md: md1: requested-resync done.
>>
>> There's no indication that raid code tried to re-write the bad spot,
>> and the bad block remains bad in the drive, so next read (direct from
>> the drive) return the same I/O error with the same kernel messages.
>>
>> Shouldn't `repair' action re-write the problem place?
>
> Yes it should.
> When end_sync_read() notices that BIO_UPTODATE isn't set it refuses to set
> R1BIO_Uptodate.
> When sync_request_write() notices that isn't set it calls
> fix_sync_read_error().
>
> fix_sync_read_error then calls sync_page_io() for each page in the region and
> if that fails (as you would expect, it goes on to the next disk and the next
> until a working one is found.  Then that block is written back to all those
> that failed.
> fix_sync_read_error doesn't report any success, but as it re-read the failing
> device you should see the SCSI read error reported a second time at least.

I see.  I thought it should too... ;)

> Are you able to add some tracing and recompile the kernel and see if you can
> find out what is happening?

Yes that's okay, except that before your reply, I did an experiment - I removed
the "bad" drive from the array, zeroed the superblock (to force full resync) and
added it back, so whole thing were re-written.  And this time, all bad block
were successfully reallocated.

So I don't have the testcase anymore.  Which is a very bad thing actually, because
something is definitely not right there and I'd really like to find and fix it...
Oh well.

I'll watch it, and sure will try to find out what's going on, but I still hope
there will be no new bad sectors... ;)

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-02-06 14:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-02 12:24 raid1 repair does not repair errors? Michael Tokarev
2014-02-02 21:51 ` Peter Grandi
2014-02-03  1:04 ` NeilBrown
2014-02-03  4:36   ` NeilBrown
2014-02-03  7:30     ` Michael Tokarev
2014-02-03 17:46       ` Michael Tokarev
2014-02-04  4:30         ` NeilBrown
2014-02-04 19:34           ` Michael Tokarev
2014-02-04 22:51             ` NeilBrown
2014-02-06 14:21   ` Mikael Abrahamsson
  -- strict thread matches above, loose matches on Subject: below --
2013-10-21 15:01 Michael Tokarev
2013-10-22  1:11 ` NeilBrown
2013-10-24  8:58   ` Michael Tokarev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.