* mismatch_cnt and Raid6
@ 2011-04-21 13:00 Andrew Falgout
2011-04-21 13:14 ` NeilBrown
2011-04-21 13:45 ` Roman Mamedov
0 siblings, 2 replies; 5+ messages in thread
From: Andrew Falgout @ 2011-04-21 13:00 UTC (permalink / raw)
To: linux-raid
I got an error last week from a new raid6 array about a mismatch_cnt. I
did some reading online, performed a repair action on the array,
performed a check action, and checked for the mismatch_cnt again. The
number was greatly reduced, but it was still there. According to mdadm,
everything appears to be working fine. All the drives are passing short
tests on smartctl.
What is mismatch_cnt really? Should I even be concerned about this?
The array is giving me 25-30MB/sec performance on an sshfs mount over
the network. With a local copy I can see speeds of 50 to 60MB/sec.
Thanks,
Andrew Falgout
-----> inserting boring details here <-----------
==> checked for mismatch_cnt
cat /sys/block/md1/md/mismatch_cnt
7752
==> performed the repair this way:
echo "repair" >/sys/block/md1/md/sync_action
==> performed the check this way:
echo "check" >/sys/block/md1/md/sync_action
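The cycle described above (start a scrub, wait for it to finish, read the counter) can be sketched as a pair of shell helpers. This is only a sketch, not what was actually run here: the function names `md_wait_idle` and `md_scrub` and the poll interval are assumptions made for illustration. The `sync_action` and `mismatch_cnt` files are the sysfs entries used above. It assumes `sync_action` reads back as "idle" once the scrub has finished, and writing to it needs root.

```shell
#!/bin/sh
# Hypothetical helpers; names and the default poll interval are assumptions.

# Block until the given md sysfs directory reports sync_action == "idle".
md_wait_idle() {
    md_dir=$1
    interval=${2:-60}
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
}

# Start a scrub ("check" or "repair"), wait for it, then print mismatch_cnt.
md_scrub() {
    md_dir=$1
    action=$2
    echo "$action" > "$md_dir/sync_action"
    md_wait_idle "$md_dir" "${3:-60}"
    cat "$md_dir/mismatch_cnt"
}
```

For the array in this thread the call would be `md_scrub /sys/block/md1/md check`.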
==> Array Details
Array Details as follows:
/dev/md1:
Version : 1.2
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Apr 21 07:35:08 2011
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
UUID : df03c833:90129ffd:123abdca:6b30c319
Events : 75072
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 81 1 active sync /dev/sdf1
2 8 97 2 active sync /dev/sdg1
3 8 113 3 active sync /dev/sdh1
4 8 129 4 active sync /dev/sdi1
5 8 145 5 active sync /dev/sdj1
Disk Details:
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 6735ec05:80890fa1:86e03b80:1f5de847
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : 99fdeb3f - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e18ce851:efc426b3:cadca5f9:df898fa7
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : d122a72e - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4cff4ab7:8e7c4312:1445828a:f0be7e89
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : 86d6e94 - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 05a949c4:ab94a3f4:ae25fe55:8034dbfb
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : 35a48696 - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ff0881e7:7e22804f:d0b57eb7:60872ff0
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : 98d5766 - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : df03c833:90129ffd:123abdca:6b30c319
Name : nas.falgout.lan:1 (local to host nas.falgout.lan)
Creation Time : Fri Mar 25 20:16:33 2011
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 15628095488 (7452.06 GiB 8001.58 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9a200b64:483f52d0:a3d97d13:39ff95be
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Apr 21 07:45:17 2011
Checksum : 314f2777 - correct
Events : 75072
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing)
* Re: mismatch_cnt and Raid6
From: NeilBrown @ 2011-04-21 13:14 UTC (permalink / raw)
To: Andrew Falgout; +Cc: linux-raid
On Thu, 21 Apr 2011 08:00:39 -0500 Andrew Falgout <andrew.falgout@gmail.com>
wrote:
> I got an error last week from a new raid6 array about a mismatch_cnt. I
> did some reading online, performed a repair action on the array,
> performed a check action, and checked for the mismatch_cnt again. The
> number was greatly reduced, but it was still there. According to mdadm,
> everything appears to be working fine. All the drives are passing short
> tests on smartctl.
>
> What is mismatch_cnt really? Should I even be concerned about this?
Yes, you should be concerned.
mismatch_cnt is a count of sectors where the parity blocks don't match the
data blocks.
The code doesn't check every sector individually. For raid5/6 it checks 4K
at a time, so divide by 8, and that many 4K blocks are in doubt.
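As a worked example of the divide-by-8 rule, here is a minimal sketch that converts a mismatch_cnt value into a count of suspect 4K blocks. The function name `mismatch_blocks` is made up for illustration. It assumes the counter is in 512-byte sectors, which is what makes the divisor 8.

```shell
#!/bin/sh
# Hypothetical helper: convert mismatch_cnt (512-byte sectors) to 4K blocks.
mismatch_blocks() {
    sectors=$1
    echo $(( $sectors / 8 ))   # 4096 / 512 = 8 sectors per 4K block
}

mismatch_blocks 7752   # the value reported earlier in this thread
```

For the reported value of 7752 this gives 969 blocks, or roughly 3.8 MiB of stripe data in doubt.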
So something is going wrong somewhere.
I would run 'check' a few times and see if the number changes.
If it goes down at all, then it looks like you occasionally get bad reads
from a device.
If it only ever increases, then you are presumably getting bad writes
sometimes.
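The rule of thumb above can be written down as a small classifier over the mismatch_cnt readings from successive 'check' runs. This is a sketch only: the function name `mismatch_trend` and its output labels are assumptions, and it only makes sense for consecutive 'check' runs with no 'repair' in between, since a repair lowers the count by design.

```shell
#!/bin/sh
# Hypothetical helper: classify mismatch_cnt readings, oldest first,
# one reading per completed 'check' run.
mismatch_trend() {
    prev=$1
    shift
    dropped=0
    rose=0
    for cur in "$@"; do
        if [ "$cur" -lt "$prev" ]; then dropped=1; fi
        if [ "$cur" -gt "$prev" ]; then rose=1; fi
        prev=$cur
    done
    if [ "$dropped" -eq 1 ]; then
        echo "suspect-reads"     # the count went down at least once
    elif [ "$rose" -eq 1 ]; then
        echo "suspect-writes"    # the count only ever increased
    else
        echo "unchanged"
    fi
}
```

For example, `mismatch_trend 7752 7000 7100` reports `suspect-reads` because the second reading is lower than the first.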
You could:
- stop the array
- run sha1sum on each member disk, several times.
- if any one disk has an unstable result - check cabling, or replace the disk
- if more than one disk has an unstable result, replace the controller maybe.
- if all results are stable it must be a write-only problem - much harder
to work with.
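The sha1sum procedure above could be scripted roughly as follows. This is a hedged sketch: the function name `hash_stability` and the default of three runs are assumptions. Note also that repeated reads may be served partly from the page cache rather than from the disk, which would hide an intermittent read fault.

```shell
#!/bin/sh
# Hypothetical helper: hash a device (or any file) several times and
# report whether the digest stays the same across runs.
hash_stability() {
    dev=$1
    runs=${2:-3}
    first=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        sum=$(sha1sum "$dev" | cut -d' ' -f1)
        if [ -z "$first" ]; then
            first=$sum                 # remember the first digest
        elif [ "$sum" != "$first" ]; then
            echo "unstable $dev"       # a later run disagreed
            return 1
        fi
        i=$((i + 1))
    done
    echo "stable $dev"
}
```

With the array stopped (`mdadm --stop /dev/md1`), it would be run once per member, for example `hash_stability /dev/sda1 3`, and likewise for sdf1 through sdj1.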
NeilBrown
> The array is giving me 25-30MB/sec performance on an sshfs mount over
> the network. With a local copy I can see speeds of 50 to 60MB/sec.
>
> Thanks,
> Andrew Falgout
* Re: mismatch_cnt and Raid6
From: Andrew Falgout @ 2011-04-21 13:20 UTC (permalink / raw)
Cc: linux-raid
It takes about 23 hours per check if I don't do anything to the array,
so it will be a while before I can get back to you with results. But
thanks for the quick response.
./Andrew
* Re: mismatch_cnt and Raid6
From: John Robinson @ 2011-04-21 13:38 UTC (permalink / raw)
To: Andrew Falgout; +Cc: linux-raid
On 21/04/2011 14:20, Andrew Falgout wrote:
> It takes about 23 hours per check if I don't do anything to the array,
> so it will be a while before I can get back to you with results. But
> thanks for the quick response.
It may also be instructive to look at the full output of smartctl from
all your drives - a short test may pass and the overall status be
"PASSED" even on a very sick drive. You should also look in your system
logs for ata errors to see which (if any) drive has given any read errors.
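For the full SMART report, `smartctl -a /dev/sdX` prints attributes and error logs rather than only the overall status. For the log search, the sketch below counts kernel log lines that look like libata errors, grouped by ATA port. The function name `ata_error_summary` and the match pattern are assumptions: the exact message wording varies between kernel versions, and the log location (for example a saved `dmesg` dump) depends on the system.

```shell
#!/bin/sh
# Hypothetical helper: print "<port> <count>" for each ATA port that has
# error-like lines in the given kernel log file.
ata_error_summary() {
    log=$1
    grep -E 'ata[0-9]+(\.[0-9]+)?: .*(error|exception|failed command)' "$log" |
        sed -E 's/.*(ata[0-9]+)(\.[0-9]+)?: .*/\1/' |
        sort | uniq -c |
        awk '{print $2 " " $1}'
}
```

A typical use would be `dmesg > /tmp/kern.log; ata_error_summary /tmp/kern.log`. The port number then has to be mapped back to a drive by hand, from the boot messages.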
Cheers,
John.
* Re: mismatch_cnt and Raid6
From: Roman Mamedov @ 2011-04-21 13:45 UTC (permalink / raw)
To: Andrew Falgout; +Cc: linux-raid
On Thu, 21 Apr 2011 08:00:39 -0500
Andrew Falgout <andrew.falgout@gmail.com> wrote:
> I got an error last week from a new raid6 array about a mismatch_cnt. I
> did some reading online, performed a repair action on the array,
> performed a check action, and checked for the mismatch_cnt again. The
> number was greatly reduced, but it was still there. According to mdadm,
> everything appears to be working fine. All the drives are passing short
> tests on smartctl.
>
> What is mismatch_cnt really? Should I even be concerned about this?
> The array is giving me 25-30MB/sec performance on an sshfs mount over
> the network. With a local copy I can see speeds of 50 to 60MB/sec.
Hello,
What kind of SATA cards/controllers do you use?
--
With respect,
Roman