All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Tokarev <mjt@tls.msk.ru>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: raid1 repair does not repair errors?
Date: Mon, 03 Feb 2014 21:46:48 +0400	[thread overview]
Message-ID: <52EFD608.6020106@msgid.tls.msk.ru> (raw)
In-Reply-To: <52EF45A0.3010401@msgid.tls.msk.ru>

03.02.2014 11:30, Michael Tokarev пишет:
> 03.02.2014 08:36, NeilBrown wrote:
> []
>> Actually I've changed my mind.  That patch won't fix anything.
>> fix_sync_read_error() is focussed on fixing a read error on ->read_disk.
>> So we only set uptodate if ->read_disk succeeded.
>>
>> So this patch should do it.
>>
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index fd3a2a14b587..0fe5fd469e74 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1733,7 +1733,8 @@ static void end_sync_read(struct bio *bio, int error)
>>  	 * or re-read if the read failed.
>>  	 * We don't do much here, just schedule handling by raid1d
>>  	 */
>> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>> +	if (bio == r1_bio->bios[r1_bio->read_disk] &&
>> +	    test_bit(BIO_UPTODATE, &bio->bi_flags))
>>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
>>  
>>  	if (atomic_dec_and_test(&r1_bio->remaining))
>>
> 
> I changed it like this for now:
> 
> --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> +++ drivers/md/raid1.c	2014-02-03 11:26:59.062845829 +0400
> @@ -1634,8 +1634,12 @@ static void end_sync_read(struct bio *bi
>  	 * or re-read if the read failed.
>  	 * We don't do much here, just schedule handling by raid1d
>  	 */
> -	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> -		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +	if (bio == r1_bio->bios[r1_bio->read_disk]) {
> +	    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> + 		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +	    else
> +		printk("end_sync_read: our bio, but !BIO_UPTODATE\n");
> +	}
> 
>  	if (atomic_dec_and_test(&r1_bio->remaining))
>  		reschedule_retry(r1_bio);
> 
> and will test it later today (in about 10 hours from now) -- as I
> mentioned, this is a prod box and testing isn't possible anytime.

Hm. With this change, there's nothing interesting in dmesg at all anymore:

[  100.571933] md: requested-resync of RAID array md1
[  100.572430] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  100.573041] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[  100.574135] md: using 128k window, over a total of 2096064k.
[  202.616277] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[  202.617035] ata6.00: irq_stat 0x40000008
[  202.617439] ata6.00: failed command: READ FPDMA QUEUED
[  202.617980] ata6.00: cmd 60/80:c0:00:3e:3e/00:00:00:00:00/40 tag 24 ncq 65536 in
[  202.617980]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[  202.619631] ata6.00: status: { DRDY ERR }
[  202.620077] ata6.00: error: { UNC }
[  202.622692] ata6.00: configured for UDMA/133
[  202.623182] sd 5:0:0:0: [sdd] Unhandled sense code
[  202.623693] sd 5:0:0:0: [sdd]
[  202.624011] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  202.624600] sd 5:0:0:0: [sdd]
[  202.624929] Sense Key : Medium Error [current] [descriptor]
[  202.625576] Descriptor sense data with sense descriptors (in hex):
[  202.626249]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[  202.627276]         00 3e 3e 23
[  202.627722] sd 5:0:0:0: [sdd]
[  202.628049] Add. Sense: Unrecovered read error - auto reallocate failed
[  202.628773] sd 5:0:0:0: [sdd] CDB:
[  202.629135] Read(10): 28 00 00 3e 3e 00 00 00 80 00
[  202.629907] end_request: I/O error, dev sdd, sector 4079139
[  202.630513] ata6: EH complete
[  205.219181] md: md1: requested-resync done.

So my printk is not called, and it still does not try to recover
the error.

So the first condition in that new if -- bio == r1_bio->bios[r1_bio->read_disk] --
is not true.

So it is - apparently - a wrong guess... ;)

BTW, just in case, -- this is a raid1 with 4 drives (yes, 4 copies of
the same data), not a "usual" 2 disk.

I can test something in next 2..3 hours, next test will be possible
only tomorrow.

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-02-03 17:46 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-02 12:24 raid1 repair does not repair errors? Michael Tokarev
2014-02-02 21:51 ` Peter Grandi
2014-02-03  1:04 ` NeilBrown
2014-02-03  4:36   ` NeilBrown
2014-02-03  7:30     ` Michael Tokarev
2014-02-03 17:46       ` Michael Tokarev [this message]
2014-02-04  4:30         ` NeilBrown
2014-02-04 19:34           ` Michael Tokarev
2014-02-04 22:51             ` NeilBrown
2014-02-06 14:21   ` Mikael Abrahamsson
  -- strict thread matches above, loose matches on Subject: below --
2013-10-21 15:01 Michael Tokarev
2013-10-22  1:11 ` NeilBrown
2013-10-24  8:58   ` Michael Tokarev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52EFD608.6020106@msgid.tls.msk.ru \
    --to=mjt@tls.msk.ru \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.