* mismatch_cnt > 0 during initial sync?
From: Stephane Thiell @ 2017-04-24 22:01 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
Hi all,
Is it normal to get any mismatch_cnt > 0 during the initial sync? I just created a few new raid6 volumes, and one of them is showing an increasing mismatch_cnt during its initial resync. Nothing is visible in the console, nor after a SMART short test (I also started a long test just in case).
$ mdadm --detail /dev/md7
/dev/md7:
Version : 1.2
Creation Time : Mon Apr 24 10:07:18 2017
Raid Level : raid6
Array Size : 62511163904 (59615.29 GiB 64011.43 GB)
Used Dev Size : 7813895488 (7451.91 GiB 8001.43 GB)
Raid Devices : 10
Total Devices : 10
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Apr 24 14:52:12 2017
State : clean, resyncing
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Resync Status : 24% complete
Name : oak-io1-s2:7 (local to host oak-io1-s2)
UUID : 6ece9d71:a97f9497:4612ed73:44dfdc0a
Events : 3308
Number Major Minor RaidDevice State
0 253 62 0 active sync /dev/dm-62
1 253 63 1 active sync /dev/dm-63
2 253 72 2 active sync /dev/dm-72
3 253 75 3 active sync /dev/dm-75
4 253 89 4 active sync /dev/dm-89
5 253 87 5 active sync /dev/dm-87
6 253 119 6 active sync /dev/dm-119
7 253 97 7 active sync /dev/dm-97
8 253 103 8 active sync /dev/dm-103
9 253 107 9 active sync /dev/dm-107
$ cat /sys/block/md7/md/sync_completed
3795284824 / 15627790976
$ cat /sys/block/md7/md/mismatch_cnt
3424
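As an aside, sync_completed reports "sectors done / total sectors", so the two numbers above can be turned into a percentage with a one-liner (values copied from the output above):

```shell
# Compute resync progress from the sync_completed numbers shown above.
done_sectors=3795284824
total_sectors=15627790976
pct=$(awk -v d="$done_sectors" -v t="$total_sectors" 'BEGIN { printf "%.1f", 100 * d / t }')
echo "resync ${pct}% complete"   # agrees with the 24% shown by mdadm --detail
```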
versions:
mdadm-3.4-14.el7_3.1.x86_64
CentOS7 (3.10.0-514.10.2.el7_lustre.x86_64)
Thanks!
Stephan
* Re: mismatch_cnt > 0 during initial sync?
From: Stephane Thiell @ 2017-04-25 5:14 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
I just did some tests and this seems normal if one of the raid disks previously had some non-zero data. Perhaps there should be a note about that in md(4) or elsewhere? (sorry if I missed it)
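For anyone wanting to convince themselves of this without real disks, here is a toy illustration with plain files (not md members; all names are made up): stale bytes on the "parity" member cannot match the parity computed from the data members, so the initial sync finds differences to fix.

```shell
# Toy demo: two zero-filled "data" members and one "parity" member holding
# stale (random) bytes. XOR parity of all-zero data is all zero, so a correct
# parity member would be identical to data0; count the bytes that disagree.
tmp=$(mktemp -d)
dd if=/dev/zero    of="$tmp/data0"  bs=4096 count=4 2>/dev/null
dd if=/dev/zero    of="$tmp/data1"  bs=4096 count=4 2>/dev/null
dd if=/dev/urandom of="$tmp/parity" bs=4096 count=4 2>/dev/null  # stale content
diff_bytes=$(cmp -l "$tmp/data0" "$tmp/parity" | wc -l)
echo "bytes an initial sync would have to fix: $diff_bytes"
rm -rf "$tmp"
```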
Thanks,
Stephan
> On Apr 24, 2017, at 3:01 PM, Stephane Thiell <sthiell@stanford.edu> wrote:
>
> Is it normal to get any mismatch_cnt > 0 during the initial sync? [...]
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-16 17:28 UTC (permalink / raw)
To: 'Stephane Thiell', linux-raid
> -----Original Message-----
> From: Stephane Thiell [mailto:sthiell@stanford.edu]
> Subject: Re: mismatch_cnt > 0 during initial sync?
>
> I just did some tests and this seems normal if one of the raid disks
> previously had some non-zero data. Perhaps there should be a note about
> that in md(4) or elsewhere? (sorry if I missed it)
Hi Stephane,
I also thought mismatch_cnt should be 0 after a sync. I am wondering if you
ran the repair command?
echo repair >> /sys/block/mdX/md/sync_action
Thanks,
Pete
* Re: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-18 21:34 UTC (permalink / raw)
To: Stephane Thiell, linux-raid@vger.kernel.org
On Tue, Apr 25 2017, Stephane Thiell wrote:
> I just did some tests and this seems normal if one of the raid disks previously had some non-zero data. Perhaps there should be a note about that in md(4) or elsewhere? (sorry if I missed it)
From the perspective of md, the initial sync is no different from any
other sync. It will count the number of mismatches that it finds and
fixes.
Feel free to post a patch for md.4 or elsewhere to clarify this.
Thanks,
NeilBrown
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-19 21:54 UTC (permalink / raw)
To: 'NeilBrown', 'Stephane Thiell', linux-raid
> From: NeilBrown [mailto:neilb@suse.com]
> Sent: Sunday, June 18, 2017 2:35 PM
> Subject: Re: mismatch_cnt > 0 during initial sync?
>
>
> From the perspective of md, the initial sync is no different from any
> other sync. It will count the number of mismatches that it finds and
> fixes.
>
Should a sync always fix a mismatch it encounters? I have a RAID1 with 3
disks. Sometimes I need to replace one disk and after adding a replacement
disk syslog indicates "RebuildFinished event detected on md device
/dev/md/2, component device mismatches found: 256 (on raid level 1)" but
says nothing about fixing it.
cat /sys/block/md2/md/last_sync_action
recovery
mdadm -V
mdadm - v3.3 - 3rd September 2013
Thank you,
Pete
* RE: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-20 4:14 UTC (permalink / raw)
To: Peter Sangas, 'Stephane Thiell', linux-raid
On Mon, Jun 19 2017, Peter Sangas wrote:
>> From: NeilBrown [mailto:neilb@suse.com]
>> Sent: Sunday, June 18, 2017 2:35 PM
>> Subject: Re: mismatch_cnt > 0 during initial sync?
>>
>>
>> From the perspective of md, the initial sync is no different from any
>> other sync. It will count the number of mismatches that it finds and
>> fixes.
>>
>
> Should a sync always fix a mismatch it encounters? I have a RAID1 with 3
> disks. Sometimes I need to replace one disk and after adding a replacement
> disk syslog indicates "RebuildFinished event detected on md device
> /dev/md/2, component device mismatches found: 256 (on raid level 1)" but
> says nothing about fixing it.
No, it wouldn't say anything about fixing things. That is assumed.
>
> cat /sys/block/md2/md/last_sync_action
> recovery
This is a recovery, not a resync. They are different.
Recovery is when you add a device to an array, and the data that should
be there is recovered from elsewhere.
Resync is when the redundancy in the array might be compromised, so it
is repaired, possibly by read-check-maybe_write. Possibly by
read-write.
For raid1, recovery and resync are very similar. For raid5 they are
very different.
The mismatch count is only reset when a resync starts, not when a
recovery (or reshape) starts. So mdadm shouldn't really report the
mismatches when the recovery finishes. The number is left over from the
most recent resync.
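This accounting can be sketched as a toy model in plain shell (a model of the behaviour described above, not kernel code): the counter is reset when a resync/check/repair starts, and left untouched across recovery or reshape, so a count reported after a recovery is leftover from the previous scrub.

```shell
# Toy model of mismatch_cnt lifecycle: reset only at the start of a
# resync-family action, never at the start of recovery or reshape.
mismatch_cnt=3424                          # leftover from an earlier check
start_sync_action() {
  case "$1" in
    resync|check|repair) mismatch_cnt=0 ;;  # counter reset at start
    recovery|reshape)    : ;;               # counter untouched
  esac
  last_sync_action=$1
}
start_sync_action recovery
echo "after recovery starts: $mismatch_cnt"   # stale value survives
start_sync_action check
echo "after check starts: $mismatch_cnt"      # reset to zero
```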
raid1 only counts when a resync is requested, either by writing "check"
or "repair" to the sync_action file in sysfs. An automatic resync after
an unclean shutdown (or when the array is started) just copies blocks
without checking, so it has nothing to count.
"repair" repairs any inconsistencies found, "check" doesn't. Both count
inconsistencies.
raid5/raid6 counts for resync, check, and repair (but not for recover or
reshape).
By default, when the array is created, raid5 performs a recovery,
nominating one of the devices to be the "spare" with the others assumed
to have "correct" data. This is faster than assuming they are all
"correct", and performing a resync.
RAID6 (the array used in the original question) does resync, rather than
recovery, on initial creation - because 2-drive recovery is/was thought
to have performance issues.
So: mdadm should be modified to not report "mismatches found" if
"last_sync_action" is recovery or reshape.
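For reference, the scrub knobs discussed above can be exercised as in the sketch below (guarded so it is a no-op on a machine without an array at the assumed path; /dev/md2 is just an example, adjust MD to your device):

```shell
# Sketch: trigger a read-only scrub and read back the counters.
MD=${MD:-/sys/block/md2/md}
if [ -w "$MD/sync_action" ]; then
  echo check > "$MD/sync_action"      # read-only scrub: counts, doesn't fix
  out="action=$(cat "$MD/sync_action") mismatch_cnt=$(cat "$MD/mismatch_cnt")"
  # echo repair > "$MD/sync_action"   # would also rewrite inconsistent copies
else
  out="no writable md array at $MD; nothing to do"
fi
echo "$out"
```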
Thanks,
NeilBrown
>
> mdadm -V
> mdadm - v3.3 - 3rd September 2013
>
>
> Thank you,
> Pete
* RE: mismatch_cnt > 0 during initial sync?
From: Peter Sangas @ 2017-06-20 20:23 UTC (permalink / raw)
To: 'NeilBrown', 'Stephane Thiell', linux-raid
> From: NeilBrown [mailto:neilb@suse.com]
> Sent: Monday, June 19, 2017 9:15 PM
> Subject: RE: mismatch_cnt > 0 during initial sync?
>
> This is a recovery, not a resync. They are different.
Thank you for pointing this out and the nice explanation.
From reading this list and other sources, issuing a repair command as
follows:

"echo repair > /sys/block/md2/md/sync_action"

might be ill-advised, since it's luck of the draw whether the operation
keeps the right data instead of the bad data.

Is this correct?
For reference I have a RAID1 with 3 disks. mdadm - v3.3 - 3rd September
2013
Thank you,
Pete
* RE: mismatch_cnt > 0 during initial sync?
From: NeilBrown @ 2017-06-20 21:26 UTC (permalink / raw)
To: Peter Sangas, 'Stephane Thiell', linux-raid
On Tue, Jun 20 2017, Peter Sangas wrote:
>> From: NeilBrown [mailto:neilb@suse.com]
>> Sent: Monday, June 19, 2017 9:15 PM
>> Subject: RE: mismatch_cnt > 0 during initial sync?
>>
>> This is a recovery, not a resync. They are different.
>
> Thank you for pointing this out and the nice explanation.
>
> From reading this list and other sources issuing a repair command as
> follows:
>
> "echo repair > /sys/block/md2/md/sync_action"
>
> might be ill-advised since it's luck of the draw whether or not the
> operation gets the right data instead of the bad data.
>
> Is this correct?
How do you define "right data" and "bad data"?
What is your threat-model which explains how the two blocks are
different?
In most likely cases, neither version of the data is more correct than
the other. In some others, your hardware is broken and needs to be
replaced.
If you have no understanding of why there might be a difference, then
issuing a repair is not likely to be worse than not issuing a repair.
If you do have that understanding, then use that to make your decision.
NeilBrown
>
> For reference I have a RAID1 with 3 disks. mdadm - v3.3 - 3rd September
> 2013
>
> Thank you,
> Pete