* Scrubbing "check" not working for RAID10 in 3.10-rc1+
@ 2013-06-25 6:19 Jonathan Brassow
2013-06-25 6:32 ` NeilBrown
2013-07-15 15:40 ` Jonathan Brassow
0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Brassow @ 2013-06-25 6:19 UTC (permalink / raw)
To: neilb; +Cc: linux-raid
Neil,
I've noticed that the "check" operation no longer works for RAID10. It
works just fine for the other RAIDs. The ("data-check") sync_thread
kicks off just fine, sync_request_write() is called, but it never gets
past:
	if (i == conf->copies)
		goto done;
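As a rough illustration of why that test can end the function early: in the 3.10-era raid10.c, the check appears to come right after a scan for the first copy whose bio has BIO_UPTODATE set. The user-space model below is only a sketch under that assumption. `struct copy_model` and `first_uptodate()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one copy's bio; only the flag matters here. */
struct copy_model {
	bool uptodate;	/* models BIO_UPTODATE being set on that bio */
};

/*
 * Return the index of the first copy marked up to date, or `copies`
 * when no copy is marked. A caller that sees `copies` has nothing to
 * compare against, which corresponds to the "goto done" path.
 */
static int first_uptodate(const struct copy_model *devs, int copies)
{
	int i;

	assert(devs != NULL);
	for (i = 0; i < copies; i++)
		if (devs[i].uptodate)
			break;
	return i;
}
```

With two copies and no flag set, first_uptodate() returns 2, so the model takes the early exit. With the flag set only on the second copy, it returns 1 and the comparison could go ahead.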
The test I am performing creates a RAID array, waits for it to sync,
shuts it down, writes random data to one of the devices, assembles the
array, and then runs a "check" - there should be discrepancies. The
discrepancies are found and recorded in resync_mismatches for all RAIDs
<= 3.9 and only for non-RAID10 3.10-rc1+.
I'm sorry I haven't tracked it down yet and I'm going to be on vacation
starting tomorrow with only intermittent access to e-mail. Sorry to
leave you hanging.
Thanks,
brassow
P.S. This also reminded me of a patch I have concerning tracking the
last sync action for the purpose of making mismatch_count more useful.
I'll post that before leaving.
* Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+
2013-06-25 6:19 Scrubbing "check" not working for RAID10 in 3.10-rc1+ Jonathan Brassow
@ 2013-06-25 6:32 ` NeilBrown
2013-07-15 15:35 ` Brassow Jonathan
2013-07-15 15:40 ` Jonathan Brassow
1 sibling, 1 reply; 6+ messages in thread
From: NeilBrown @ 2013-06-25 6:32 UTC (permalink / raw)
To: Jonathan Brassow; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]
On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:
> Neil,
>
> I've noticed that the "check" operation no longer works for RAID10. It
> works just fine for the other RAIDs. The ("data-check") sync_thread
> kicks off just fine, sync_request_write() is called, but it never gets
> past:
> 	if (i == conf->copies)
> 		goto done;
> The test I am performing creates a RAID array, waits for it to sync,
> shuts it down, writes random data to one of the devices, assembles the
> array, and then runs a "check" - there should be discrepancies. The
> discrepancies are found and recorded in resync_mismatches for all RAIDs
> <= 3.9 and only for non-RAID10 3.10-rc1+.
I just tried on 3.10-rc5+ and it works as expected.
If you can provide a test script that fails, I'll look into it.
>
> I'm sorry I haven't tracked it down yet and I'm going to be on vacation
> starting tomorrow with only intermittent access to e-mail. Sorry to
> leave you hanging.
Go enjoy your vacation and don't worry about me hanging :-)
>
> Thanks,
> brassow
>
> P.S. This also reminded me of a patch I have concerning tracking the
> last sync action for the purpose of making mismatch_count more useful.
> I'll post that before leaving.
>
Thanks.
NeilBrown
* Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+
2013-06-25 6:32 ` NeilBrown
@ 2013-07-15 15:35 ` Brassow Jonathan
2013-07-16 7:01 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Brassow Jonathan @ 2013-07-15 15:35 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Jun 25, 2013, at 1:32 AM, NeilBrown wrote:
> On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@redhat.com>
> wrote:
>
>> Neil,
>>
>> I've noticed that the "check" operation no longer works for RAID10. It
>> works just fine for the other RAIDs. The ("data-check") sync_thread
>> kicks off just fine, sync_request_write() is called, but it never gets
>> past:
>> 	if (i == conf->copies)
>> 		goto done;
>> The test I am performing creates a RAID array, waits for it to sync,
>> shuts it down, writes random data to one of the devices, assembles the
>> array, and then runs a "check" - there should be discrepancies. The
>> discrepancies are found and recorded in resync_mismatches for all RAIDs
>> <= 3.9 and only for non-RAID10 3.10-rc1+.
>
> I just tried on 3.10-rc5+ and it works as expected.
> If you can provide a test script that fails, I'll look into it.
Just tried 3.10 - it fails for me there too. I'll send you the script I use shortly.
thanks,
brassow
(vacation ends soon.)
* Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+
2013-06-25 6:19 Scrubbing "check" not working for RAID10 in 3.10-rc1+ Jonathan Brassow
2013-06-25 6:32 ` NeilBrown
@ 2013-07-15 15:40 ` Jonathan Brassow
1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Brassow @ 2013-07-15 15:40 UTC (permalink / raw)
To: neilb; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2786 bytes --]
Neil,
You will need to change 'devices' to suit your needs. I can run this
test with RAID 1/4/5/6 and it works, but it fails with RAID10 since
3.10-rc1.
thanks,
brassow
example output:
[~]# ./md.sh 1
mdadm: /dev/sda1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Sat Jul 13 14:52:49 2013
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Sat Jul 13 14:52:49 2013
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Sat Jul 13 14:52:49 2013
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Sat Jul 13 14:52:49 2013
mdadm: largest drive (/dev/sda1) exceeds size (102400K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Waiting for resync to complete
Waiting for resync to complete
Waiting for resync to complete
Waiting for resync to complete
RAID1 mismatch count after creation : 0
mdadm: stopped /dev/md0
Writing garbage to one of the MD devices...
mdadm: /dev/md0 has been started with 4 drives.
RAID1 mismatch count after reactivation : 0
Waiting for check to complete
Waiting for check to complete
Waiting for check to complete
RAID1 mismatch count after data-check : 61440
mdadm: stopped /dev/md0
[~]# ./md.sh 10
mdadm: /dev/sda1 appears to be part of a raid array:
level=raid1 devices=4 ctime=Mon Jul 15 10:30:44 2013
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid1 devices=4 ctime=Mon Jul 15 10:30:44 2013
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid1 devices=4 ctime=Mon Jul 15 10:30:44 2013
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid1 devices=4 ctime=Mon Jul 15 10:30:44 2013
mdadm: largest drive (/dev/sda1) exceeds size (102400K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Waiting for resync to complete
Waiting for resync to complete
Waiting for resync to complete
RAID10 mismatch count after creation : 0
mdadm: stopped /dev/md0
Writing garbage to one of the MD devices...
mdadm: /dev/md0 has been started with 4 drives.
RAID10 mismatch count after reactivation : 0
Waiting for check to complete
Waiting for check to complete
Waiting for check to complete
RAID10 mismatch count after data-check : 0
mdadm: stopped /dev/md0
***** mismatch_cnt should not be zero !!!!!!
[-- Attachment #2: md.sh --]
[-- Type: application/x-shellscript, Size: 1742 bytes --]
* Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+
2013-07-15 15:35 ` Brassow Jonathan
@ 2013-07-16 7:01 ` NeilBrown
2013-07-17 18:24 ` Brassow Jonathan
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2013-07-16 7:01 UTC (permalink / raw)
To: Brassow Jonathan; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]
On Mon, 15 Jul 2013 10:35:07 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:
>
> On Jun 25, 2013, at 1:32 AM, NeilBrown wrote:
>
> > On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@redhat.com>
> > wrote:
> >
> >> Neil,
> >>
> >> I've noticed that the "check" operation no longer works for RAID10. It
> >> works just fine for the other RAIDs. The ("data-check") sync_thread
> >> kicks off just fine, sync_request_write() is called, but it never gets
> >> past:
> >> 	if (i == conf->copies)
> >> 		goto done;
> >> The test I am performing creates a RAID array, waits for it to sync,
> >> shuts it down, writes random data to one of the devices, assembles the
> >> array, and then runs a "check" - there should be discrepancies. The
> >> discrepancies are found and recorded in resync_mismatches for all RAIDs
> >> <= 3.9 and only for non-RAID10 3.10-rc1+.
> >
> > I just tried on 3.10-rc5+ and it works as expected.
> > If you can provide a test script that fails, I'll look into it.
>
> Just tried 3.10 - it fails for me there too. I'll send you the script I use shortly.
>
> thanks,
> brassow
>
> (vacation ends soon.)
:-)
Thanks. This patch seems to fix it.
NeilBrown
From b0b0ac3ecf1e54dd6a429294082c47f1e52db41d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 16 Jul 2013 16:50:47 +1000
Subject: [PATCH] md/raid10: fix two problems with RAID10 resync.
1/ When a difference between blocks is found, data is copied from
   one bio to the other.  However bv_len is used as the length to
   copy and this could be zero.  So use r10_bio->sectors to calculate
   the length instead.
   Using bv_len was probably always a bit dubious, but the introduction
   of bio_advance made it much more likely to be a problem.
2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
   except on bios that we schedule for a read.  This ensures that
   missing/failed devices don't confuse the loop at the top of
   sync_request_write().
   Commit 8be185f2c9d54d6 "raid10: Use bio_reset()"
   removed a loop which set BIO_UPTODATE on all appropriate bios.
   So we need to re-add that flag.
Reported-by: Brassow Jonathan <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cd066b6..957a719 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2097,11 +2097,17 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 			 * both 'first' and 'i', so we just compare them.
 			 * All vec entries are PAGE_SIZE;
 			 */
-			for (j = 0; j < vcnt; j++)
+			int sectors = r10_bio->sectors;
+			for (j = 0; j < vcnt; j++) {
+				int len = PAGE_SIZE;
+				if (sectors < (len / 512))
+					len = sectors * 512;
 				if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
 					   page_address(tbio->bi_io_vec[j].bv_page),
-					   fbio->bi_io_vec[j].bv_len))
+					   len))
 					break;
+				sectors -= len/512;
+			}
 			if (j == vcnt)
 				continue;
 			atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
@@ -3407,6 +3413,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (bio->bi_end_io == end_sync_read) {
 			md_sync_acct(bio->bi_bdev, nr_sectors);
+			set_bit(BIO_UPTODATE, &bio->bi_flags);
 			generic_make_request(bio);
 		}
 	}
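The length calculation in the first hunk can be tried outside the kernel. The sketch below is a user-space model only: it assumes a 4096-byte page and one contiguous buffer per copy instead of a bio_vec array. `MODEL_PAGE_SIZE`, `MODEL_SECTOR_SIZE` and `first_mismatch()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE   4096	/* assumed page size for this sketch */
#define MODEL_SECTOR_SIZE 512

/*
 * Compare two copies page by page. Each copy is `vcnt` pages long but
 * only the first `sectors` sectors hold data, so the last page may be
 * compared only in part. Returns the index of the first page that
 * differs, or `vcnt` when all covered data matches.
 */
static int first_mismatch(const unsigned char *from, const unsigned char *to,
			  int vcnt, int sectors)
{
	int j;

	assert(from != NULL && to != NULL && sectors >= 0);
	for (j = 0; j < vcnt; j++) {
		int len = MODEL_PAGE_SIZE;

		/* Derive the length from the remaining sector count. */
		if (sectors < len / MODEL_SECTOR_SIZE)
			len = sectors * MODEL_SECTOR_SIZE;
		if (memcmp(from + (size_t)j * MODEL_PAGE_SIZE,
			   to + (size_t)j * MODEL_PAGE_SIZE, (size_t)len))
			break;
		sectors -= len / MODEL_SECTOR_SIZE;
	}
	return j;
}
```

Because the length comes from the sector count rather than from a per-vector field, a zeroed bv_len can no longer turn the memcmp() into a zero-length compare that always reports a match.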
* Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+
2013-07-16 7:01 ` NeilBrown
@ 2013-07-17 18:24 ` Brassow Jonathan
0 siblings, 0 replies; 6+ messages in thread
From: Brassow Jonathan @ 2013-07-17 18:24 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Jul 16, 2013, at 2:01 AM, NeilBrown wrote:
> On Mon, 15 Jul 2013 10:35:07 -0500 Brassow Jonathan <jbrassow@redhat.com>
> wrote:
>
>>
>> On Jun 25, 2013, at 1:32 AM, NeilBrown wrote:
>>
>>> On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@redhat.com>
>>> wrote:
>>>
>>>> Neil,
>>>>
>>>> I've noticed that the "check" operation no longer works for RAID10. It
>>>> works just fine for the other RAIDs. The ("data-check") sync_thread
>>>> kicks off just fine, sync_request_write() is called, but it never gets
>>>> past:
>>>> 	if (i == conf->copies)
>>>> 		goto done;
>>>> The test I am performing creates a RAID array, waits for it to sync,
>>>> shuts it down, writes random data to one of the devices, assembles the
>>>> array, and then runs a "check" - there should be discrepancies. The
>>>> discrepancies are found and recorded in resync_mismatches for all RAIDs
>>>> <= 3.9 and only for non-RAID10 3.10-rc1+.
>>>
>>> I just tried on 3.10-rc5+ and it works as expected.
>>> If you can provide a test script that fails, I'll look into it.
>>
>> Just tried 3.10 - it fails for me there too. I'll send you the script I use shortly.
>>
>> thanks,
>> brassow
>>
>> (vacation ends soon.)
> :-)
>
> Thanks. This patch seems to fix it.
Yes, it does. Thanks!
brassow