* mismatch_cnt > 0 during initial sync?
From: Stephane Thiell @ 2017-04-24 22:01 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
Hi all,
Is it normal to get any mismatch_cnt > 0 during the initial sync? I just created a few new raid6 volumes, and one of them is showing an increasing mismatch_cnt during its initial resync. Nothing is visible in the console, nor after a SMART short test (I also started a long test just in case).
$ mdadm --detail /dev/md7
/dev/md7:
Version : 1.2
Creation Time : Mon Apr 24 10:07:18 2017
Raid Level : raid6
Array Size : 62511163904 (59615.29 GiB 64011.43 GB)
Used Dev Size : 7813895488 (7451.91 GiB 8001.43 GB)
Raid Devices : 10
Total Devices : 10
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Apr 24 14:52:12 2017
State : clean, resyncing
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Resync Status : 24% complete
Name : oak-io1-s2:7 (local to host oak-io1-s2)
UUID : 6ece9d71:a97f9497:4612ed73:44dfdc0a
Events : 3308
Number Major Minor RaidDevice State
0 253 62 0 active sync /dev/dm-62
1 253 63 1 active sync /dev/dm-63
2 253 72 2 active sync /dev/dm-72
3 253 75 3 active sync /dev/dm-75
4 253 89 4 active sync /dev/dm-89
5 253 87 5 active sync /dev/dm-87
6 253 119 6 active sync /dev/dm-119
7 253 97 7 active sync /dev/dm-97
8 253 103 8 active sync /dev/dm-103
9 253 107 9 active sync /dev/dm-107
$ cat /sys/block/md7/md/sync_completed
3795284824 / 15627790976
$ cat /sys/block/md7/md/mismatch_cnt
3424
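As an aside, sync_completed reports "sectors done / total sectors", so the two numbers above can be turned into a percentage with a one-liner (values copied from the output above):

```shell
# Compute resync progress from the sync_completed numbers shown above.
done_sectors=3795284824
total_sectors=15627790976
pct=$(awk -v d="$done_sectors" -v t="$total_sectors" 'BEGIN { printf "%.1f", 100 * d / t }')
echo "resync ${pct}% complete"   # agrees with the 24% shown by mdadm --detail
```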
versions:
mdadm-3.4-14.el7_3.1.x86_64
CentOS7 (3.10.0-514.10.2.el7_lustre.x86_64)
Thanks!
Stephan
* Re: mismatch_cnt > 0 during initial sync?
From: Stephane Thiell @ 2017-04-25 5:14 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
I just did some tests and this seems normal if one of the raid disks previously had some non-zero data. Perhaps there should be a note about that in md(4) or elsewhere? (sorry if I missed it)
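For anyone wanting to convince themselves of this without real disks, here is a toy illustration with plain files (not md members; all names are made up): stale bytes on the "parity" member cannot match the parity computed from the data members, so the initial sync finds differences to fix.

```shell
# Toy demo: two zero-filled "data" members and one "parity" member holding
# stale (random) bytes. XOR parity of all-zero data is all zero, so a correct
# parity member would be identical to data0; count the bytes that disagree.
tmp=$(mktemp -d)
dd if=/dev/zero    of="$tmp/data0"  bs=4096 count=4 2>/dev/null
dd if=/dev/zero    of="$tmp/data1"  bs=4096 count=4 2>/dev/null
dd if=/dev/urandom of="$tmp/parity" bs=4096 count=4 2>/dev/null  # stale content
diff_bytes=$(cmp -l "$tmp/data0" "$tmp/parity" | wc -l)
echo "bytes an initial sync would have to fix: $diff_bytes"
rm -rf "$tmp"
```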
Thanks,
Stephan
> On Apr 24, 2017, at 3:01 PM, Stephane Thiell <sthiell@stanford.edu> wrote:
>
> Is it normal to get any mismatch_cnt > 0 during the initial sync? [...]
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-16 17:28 UTC (permalink / raw)
To: 'Stephane Thiell', linux-raid
> -----Original Message-----
> From: Stephane Thiell [mailto:sthiell@stanford.edu]
> Subject: Re: mismatch_cnt > 0 during initial sync?
>
> I just did some tests and this seems normal if one of the raid disks
> previously had some non-zero data. Perhaps there should be a note about
> that in md(4) or elsewhere? (sorry if I missed it)
Hi Stephane,
I also thought mismatch_cnt should be 0 after a sync. I am wondering if you
ran the repair command?
echo repair >> /sys/block/mdX/md/sync_action
Thanks,
Pete
* Re: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-18 21:34 UTC (permalink / raw)
To: Stephane Thiell, linux-raid@vger.kernel.org
On Tue, Apr 25 2017, Stephane Thiell wrote:
> I just did some tests and this seems normal if one of the raid disks previously had some non-zero data. Perhaps there should be a note about that in md(4) or elsewhere? (sorry if I missed it)
From the perspective of md, the initial sync is no different from any
other sync. It will count the number of mismatches that it finds and
fixes.
Feel free to post a patch for md.4 or elsewhere to clarify this.
Thanks,
NeilBrown
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-19 21:54 UTC (permalink / raw)
To: 'NeilBrown', 'Stephane Thiell', linux-raid
> From: NeilBrown [mailto:neilb@suse.com]
> Sent: Sunday, June 18, 2017 2:35 PM
> Subject: Re: mismatch_cnt > 0 during initial sync?
>
>
> From the perspective of md, the initial sync is no different from any
> other sync. It will count the number of mismatches that it finds and
> fixes.
>
Should a sync always fix a mismatch it encounters? I have a RAID1 with 3
disks. Sometimes I need to replace one disk and after adding a replacement
disk syslog indicates "RebuildFinished event detected on md device
/dev/md/2, component device mismatches found: 256 (on raid level 1)" but
says nothing about fixing it.
cat /sys/block/md2/md/last_sync_action
recovery
mdadm -V
mdadm - v3.3 - 3rd September 2013
Thank you,
Pete
* RE: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-20 4:14 UTC (permalink / raw)
To: Peter Sangas, 'Stephane Thiell', linux-raid
On Mon, Jun 19 2017, Peter Sangas wrote:
>> From: NeilBrown [mailto:neilb@suse.com]
>> Sent: Sunday, June 18, 2017 2:35 PM
>> Subject: Re: mismatch_cnt > 0 during initial sync?
>>
>>
>> From the perspective of md, the initial sync is no different from any
>> other sync. It will count the number of mismatches that it finds and
>> fixes.
>>
>
> Should a sync always fix a mismatch it encounters? I have a RAID1 with 3
> disks. Sometimes I need to replace one disk and after adding a replacement
> disk syslog indicates "RebuildFinished event detected on md device
> /dev/md/2, component device mismatches found: 256 (on raid level 1)" but
> says nothing about fixing it.
No, it wouldn't say anything about fixing things. That is assumed.
>
> cat /sys/block/md2/md/last_sync_action
> recovery
This is a recovery, not a resync. They are different.
Recovery is when you add a device to an array, and the data that should
be there is recovered from elsewhere.
Resync is when the redundancy in the array might be compromised, so it
is repaired, possibly by read-check-maybe_write. Possibly by
read-write.
For raid1, recovery and resync are very similar. For raid5 they are
very different.
The mismatch count is only reset when a resync starts, not when a
recovery (or reshape) starts. So mdadm shouldn't really report the
mismatches when the recovery finishes. The number is left over from the
most recent resync.
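This accounting can be sketched as a toy model in plain shell (a model of the behaviour described above, not kernel code): the counter is reset when a resync/check/repair starts, and left untouched across recovery or reshape, so a count reported after a recovery is leftover from the previous scrub.

```shell
# Toy model of mismatch_cnt lifecycle: reset only at the start of a
# resync-family action, never at the start of recovery or reshape.
mismatch_cnt=3424                          # leftover from an earlier check
start_sync_action() {
  case "$1" in
    resync|check|repair) mismatch_cnt=0 ;;  # counter reset at start
    recovery|reshape)    : ;;               # counter untouched
  esac
  last_sync_action=$1
}
start_sync_action recovery
echo "after recovery starts: $mismatch_cnt"   # stale value survives
start_sync_action check
echo "after check starts: $mismatch_cnt"      # reset to zero
```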
raid1 only counts when a resync is requested, either by writing "check"
or "repair" to the sync_action file in sysfs. An automatic resync after
an unclean shutdown (or when the array is started) just copies blocks
without checking, so it has nothing to count.
"repair" repairs any inconsistencies found, "check" doesn't. Both count
inconsistencies.
raid5/raid6 counts for resync, check, and repair (but not for recover or
reshape).
By default, when the array is created, raid5 performs a recovery,
nominating one of the devices to be the "spare" with the others assumed
to have "correct" data. This is faster than assuming they are all
"correct", and performing a resync.
RAID6 (the array used in the original question) does resync, rather than
recovery, on initial creation - because 2-drive recovery is/was thought
to have performance issues.
So: mdadm should be modified to not report "mismatches found" if
"last_sync_action" is recovery or reshape.
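For reference, the scrub knobs discussed above can be exercised as in the sketch below (guarded so it is a no-op on a machine without an array at the assumed path; /dev/md2 is just an example, adjust MD to your device):

```shell
# Sketch: trigger a read-only scrub and read back the counters.
MD=${MD:-/sys/block/md2/md}
if [ -w "$MD/sync_action" ]; then
  echo check > "$MD/sync_action"      # read-only scrub: counts, doesn't fix
  out="action=$(cat "$MD/sync_action") mismatch_cnt=$(cat "$MD/mismatch_cnt")"
  # echo repair > "$MD/sync_action"   # would also rewrite inconsistent copies
else
  out="no writable md array at $MD; nothing to do"
fi
echo "$out"
```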
Thanks,
NeilBrown
>
> mdadm -V
> mdadm - v3.3 - 3rd September 2013
>
>
> Thank you,
> Pete
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-20 20:23 UTC (permalink / raw)
To: 'NeilBrown', 'Stephane Thiell', linux-raid
> From: NeilBrown [mailto:neilb@suse.com]
> Sent: Monday, June 19, 2017 9:15 PM
> Subject: RE: mismatch_cnt > 0 during initial sync?
>
> This is a recovery, not a resync. They are different.
Thank you for pointing this out and the nice explanation.
From reading this list and other sources, issuing a repair command as
follows:

"echo repair > /sys/block/md2/md/sync_action"

might be ill-advised, since it's luck of the draw whether the operation
keeps the right data instead of the bad data.

Is this correct?
For reference I have a RAID1 with 3 disks. mdadm - v3.3 - 3rd September
2013
Thank you,
Pete
* RE: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-20 21:26 UTC (permalink / raw)
To: Peter Sangas, 'Stephane Thiell', linux-raid
On Tue, Jun 20 2017, Peter Sangas wrote:
>> From: NeilBrown [mailto:neilb@suse.com]
>> Sent: Monday, June 19, 2017 9:15 PM
>> Subject: RE: mismatch_cnt > 0 during initial sync?
>>
>> This is a recovery, not a resync. They are different.
>
> Thank you for pointing this out and the nice explanation.
>
> From reading this list and other sources issuing a repair command as
> follows:
>
> "echo repair > /sys/block/md2/md/sync_action"
>
> might be ill-advised since it's luck of the draw whether or not the
> operation gets the right data instead of the bad data.
>
> Is this correct?
How do you define "right data" and "bad data"?
What is your threat-model which explains how the two blocks are
different?
In most likely cases, neither version of the data is more correct than
the other. In some others, your hardware is broken and needs to be
replaced.
If you have no understanding of why there might be a difference, then
issuing a repair is not likely to be worse than not issuing a repair.
If you do have that understanding, then use that to make your decision.
NeilBrown
>
> For reference I have a RAID1 with 3 disks. mdadm - v3.3 - 3rd September
> 2013
>
> Thank you,
> Pete