linux-raid.vger.kernel.org archive mirror
* mismatch_cnt worries
@ 2007-04-02 14:45 Gavin McCullagh
  2007-04-03  0:00 ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Gavin McCullagh @ 2007-04-02 14:45 UTC
  To: Linux RAID Mailing List

Hi,

I've relatively recently started using md, having had some bad experiences
with hardware RAID controllers.  I've had some really good experiences
(stepwise upgrading an 800GB raid5 array to a 1.5TB one by exchanging disks
and using mdadm --grow), but am in the middle of a more worrying one.  I have
read recent threads about mismatch_cnt and am still a little unclear on
how to interpret it.  I'm seeing this issue on a couple of machines, but I'll
just talk about one for now.

I ran a check on the three RAID1 arrays in a machine I'm managing.  The check
finished without error.  I then had a look at the mismatch_cnt and one of them
is non-zero (128), specifically the one which holds the root filesystem.
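
For reference, a minimal Python sketch of reading the counter that a 'check'
leaves behind.  It assumes the usual md sysfs layout
(/sys/block/<array>/md/mismatch_cnt) and that the count is in 512-byte
sectors, as the kernel md documentation describes; the function names and the
sysfs_root parameter are illustrative and not part of any tool.

```python
from pathlib import Path

SECTOR_BYTES = 512  # assumption: md reports mismatches in 512-byte sectors


def read_mismatch_cnt(array: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch count md recorded for the named array."""
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def mismatch_bytes(sectors: int) -> int:
    """Convert a sector count into bytes."""
    return sectors * SECTOR_BYTES
```

Under those assumptions, read_mismatch_cnt("md0") returns the raw count for
md0, and a count of 128 works out to mismatch_bytes(128) = 65536 bytes, i.e.
64 KiB of the array.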

The Gentoo Wiki on the subject seems to be more or less saying I need to
format the partition to be sure of anything.  Needless to say, that's not
desirable.

Stupidly, I have not been running SMART until now, but I have now installed
and configured it and run long and short tests manually.  The most interesting
part of the smartctl output on the disks is below, but only ECC fast errors are
shown.

All of the event logs look like this, so I guess there's only partial support
for SMART:

  Error event 19:
    :Sense Key  06h Unit Attention
    :Add Sense Code 29h
    :Add Sense Code Qualif  02h
    :Hardware Status  00h
    :CCHSS Valid
    :CC  ffffh
    :H No.  00h
    :SS No. 00

Neil's post here suggests either this is all normal or I'm seriously up the
creek.
	http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07349.html

My questions:

1. Should I be worried or is this normal?  If so can you explain why the
   number is non-zero?
2. Should I repair, fsck, replace a disk, something else?
3. Can someone explain how this quote can be true:
       "Though it is less likely, a regular filesystem could still (I think)
        genuinely write different data to difference devices in a raid1/10."
   when I thought the point of RAID1 was that the data should be the same on
   both disks.

Many thanks for any help/comfort,

Gavin

SDA:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    8878773        0         0   8878773          0        437.620           0
write:         0        0         0         0          0        277.228           0

SDB:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    5077782        0         0   5077782          0        455.871           0
write:         0        0         0         0          0        263.680           0



* Re: mismatch_cnt worries
  2007-04-02 14:45 mismatch_cnt worries Gavin McCullagh
@ 2007-04-03  0:00 ` Neil Brown
  2007-04-03  8:16   ` Gavin McCullagh
  2007-04-04 22:46   ` Bill Davidsen
  0 siblings, 2 replies; 4+ messages in thread
From: Neil Brown @ 2007-04-03  0:00 UTC
  To: Gavin McCullagh; +Cc: Linux RAID Mailing List

On Monday April 2, gmccullagh@gmail.com wrote:
> 
> Neil's post here suggests either this is all normal or I'm seriously up the
> creek.
> 	http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07349.html
> 
> My questions:
> 
> 1. Should I be worried or is this normal?  If so can you explain why the
>    number is non-zero?

Probably not too worried.
Is it normal?  I'm not really sure what 'normal' is.  I'm beginning to
think that it is 'normal' to get strange errors from disk drives, but
maybe I have a jaded perspective.
If you have a swap-partition or a swap-file on the device then you
should consider it normal.  If not, then it is much less likely but
still possible.

> 2. Should I repair, fsck, replace a disk, something else?

'repair' is probably a good idea.
'fsck' certainly wouldn't hurt and might show something, though I
suspect it will find the filesystem to be structurally sound.
I wouldn't replace the disk on the basis of a single difference report
from mismatch_cnt.  I don't know what the SMART message means so I
don't know if that suggests that the drive needs to be replaced.

> 3. Can someone explain how this quote can be true:
>        "Though it is less likely, a regular filesystem could still (I think)
>         genuinely write different data to difference devices in a raid1/10."
>    when I thought the point of RAID1 was that the data should be the same on
>    both disks.

Suppose I memory-map a file and often modify the mapped memory.
The system will at some point decide to write that block of the file
to the device.  It will send a request to raid1, which will send one
request each to two different devices.  They will each DMA the data
out of that memory to the controller at different times so they could
quite possibly get different data (if I changed the mapped memory
between those two DMA requests).  So the data on the two drives in a
mirror can easily be different.  If a 'check' happens at exactly this
time it will notice.
Normally that block will be written out again (as it is still 'dirty')
and again and again if necessary as long as I keep writing to the
memory.  Once I stop writing to the memory (e.g. close the file,
unmount the filesystem) a final write will be made with the same data
going to both devices.  During this time we will never read that block
from the filesystem, so the filesystem will never be able to see any
difference between the two devices in a raid1.

So: if you are actively writing to a file while 'check' is running on
a raid1, it could show up as a difference in mismatch_cnt.  But you
have to get the timing just right (or wrong).
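
The race described above can be sketched as a toy simulation in Python.
Nothing here touches md or real DMA: the two bytes() snapshots merely stand in
for the two devices reading the same page at different moments, and all names
(write_to_mirror, count_mismatched_sectors, scribble) are made up for the
illustration.

```python
from typing import Callable, Tuple


def write_to_mirror(page: bytearray,
                    mutate_between_copies: Callable[[bytearray], None]
                    ) -> Tuple[bytes, bytes]:
    """Copy one in-memory page to two simulated mirror halves.

    The copies are taken one after the other, like two independent DMA
    transfers; the callback stands in for the application modifying the
    mapped memory between them.
    """
    disk_a = bytes(page)            # first device's snapshot
    mutate_between_copies(page)     # application keeps writing
    disk_b = bytes(page)            # second device's snapshot
    return disk_a, disk_b


def count_mismatched_sectors(a: bytes, b: bytes, sector: int = 512) -> int:
    """Count the sector-sized chunks that differ between the two copies."""
    return sum(a[i:i + sector] != b[i:i + sector]
               for i in range(0, len(a), sector))


def scribble(p: bytearray) -> None:
    p[0] = 0xFF                     # one byte changes between the two copies


page = bytearray(4096)
a, b = write_to_mirror(page, scribble)
print(count_mismatched_sectors(a, b))   # 1: the halves differ in one sector

# The page is still dirty, so it is written again; with no further
# modification both halves receive the same data.
a, b = write_to_mirror(page, lambda p: None)
print(count_mismatched_sectors(a, b))   # 0
```

A 'check' that compared the halves between the first and second write would
see the difference; one that ran after the final write would not.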

I think it is possible in the above scenario to truncate the file
while a write is underway but with new data in memory.  If you do
this, the system might not write out that last 'new' data, so the last
write to the particular block on storage may have written different
data to the two different drives, and this difference will not be
corrected by the filesystem, e.g. on unmount.  Note that the inconsistent
data will never be read by the filesystem (the file has been
truncated, remember) so there is no risk of data corruption.
In this case the difference could remain for some time until later
when a 'check' or 'repair' notices it.

Does that help explain the above quote?

It is still the case that:
  filesystem corruption won't happen in normal operation
  a small mismatch_cnt does not necessarily imply a problem.

NeilBrown


* Re: mismatch_cnt worries
  2007-04-03  0:00 ` Neil Brown
@ 2007-04-03  8:16   ` Gavin McCullagh
  2007-04-04 22:46   ` Bill Davidsen
  1 sibling, 0 replies; 4+ messages in thread
From: Gavin McCullagh @ 2007-04-03  8:16 UTC
  To: Linux RAID Mailing List

Hi,

thanks for the reply.

On Tue, 03 Apr 2007, Neil Brown wrote:

> If you have a swap-partition or a swap-file on the device then you
> should consider it normal.  If not, then it is much less likely but
> still possible.

I see it on two machines' ext3 root filesystems.

> > 2. Should I repair, fsck, replace a disk, something else?
> 
> 'repair' is probably a good idea.

I ran 'repair', then 'check', and the count is still 128.  However, I'm
running 2.6.17 on Ubuntu Edgy (from October), so I'm guessing 'repair' is
still equivalent to 'check', as you said here:

http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07269.html
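
A rough Python sketch of the 'repair', then 'check' sequence, assuming the
standard md sysfs files sync_action and mismatch_cnt under
/sys/block/<array>/md/ and root privileges.  The helper name
run_sync_action, its parameters and the polling interval are illustrative
choices, not an existing API.

```python
import time
from pathlib import Path
from typing import Callable


def run_sync_action(array: str, action: str = "check",
                    sysfs_root: str = "/sys/block",
                    poll_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Start a 'check' or 'repair', wait for it to end, return mismatch_cnt."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action!r}")
    md = Path(sysfs_root) / array / "md"
    (md / "sync_action").write_text(action + "\n")   # kick off the pass
    # md reports 'idle' in sync_action once the pass has finished.
    while (md / "sync_action").read_text().strip() != "idle":
        sleep(poll_seconds)
    return int((md / "mismatch_cnt").read_text().strip())
```

On a kernel where 'repair' really rewrites mismatched blocks, one would expect
run_sync_action("md0", "repair") followed by run_sync_action("md0", "check")
to end with a count of zero unless new mismatches appear in between; a count
that stays at 128 is consistent with 'repair' behaving like 'check'.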

> 'fsck' certainly wouldn't hurt and might show something, though I
> suspect it will find the filesystem to be structurally sound.

You're probably right.

> Suppose I memory-map a file and often modify the mapped memory.
> The system will at some point decide to write that block of the file
> to the device.  It will send a request to raid1, which will send one
> request each to two different devices.  They will each DMA the data
> out of that memory to the controller at different times so they could
> quite possibly get different data (if I changed the mapped memory
> between those two DMA requests).  So the data on the two drives in a
> mirror can easily be different.  If a 'check' happens at exactly this
> time it will notice.
>
> So: if you are actively writing to a file while 'check' is running on
> a raid1, it could show up as a difference in mismatch_cnt.  But you
> have to get the timing just right (or wrong).

I presume, then, that all writes are flushed when you run 'repair'?  I ask
because in RAID1, where two blocks differ, one block gets chosen
arbitrarily as the correct one and the other gets overwritten.  Or should
'repair' ideally be run with the filesystem read-only?

> I think it is possible in the above scenario to truncate the file
> while a write is underway but with new data in memory.  .....

> Does that help explain the above quote?

Yes, thanks.

> It is still the case that:
>   filesystem corruption won't happen in normal operation
>   a small mismatch_cnt does not necessarily imply a problem.

Many thanks again for a very informative reply,

Gavin



* Re: mismatch_cnt worries
  2007-04-03  0:00 ` Neil Brown
  2007-04-03  8:16   ` Gavin McCullagh
@ 2007-04-04 22:46   ` Bill Davidsen
  1 sibling, 0 replies; 4+ messages in thread
From: Bill Davidsen @ 2007-04-04 22:46 UTC
  To: Neil Brown; +Cc: Gavin McCullagh, Linux RAID Mailing List

Neil Brown wrote:
> On Monday April 2, gmccullagh@gmail.com wrote:
>   
>> Neil's post here suggests either this is all normal or I'm seriously up the
>> creek.
>> 	http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07349.html
>>
>> My questions:
>>
>> 1. Should I be worried or is this normal?  If so can you explain why the
>>    number is non-zero?
>>     
>
> Probably not too worried.
> Is it normal?  I'm not really sure what 'normal' is.  I'm beginning to
> think that it is 'normal' to get strange errors from disk drives, but
> maybe I have a jaded perspective.
> If you have a swap-partition or a swap-file on the device then you
> should consider it normal.  If not, then it is much less likely but
> still possible.
>
>   
>> 2. Should I repair, fsck, replace a disk, something else?
>>     
>
> 'repair' is probably a good idea.
> 'fsck' certainly wouldn't hurt and might show something, though I
> suspect it will find the filesystem to be structurally sound.
> I wouldn't replace the disk on the basis of a single difference report
> from mismatch_cnt.  I don't know what the SMART message means so I
> don't know if that suggests that the drive needs to be replaced.
>
>   
>> 3. Can someone explain how this quote can be true:
>>        "Though it is less likely, a regular filesystem could still (I think)
>>         genuinely write different data to difference devices in a raid1/10."
>>    when I thought the point of RAID1 was that the data should be the same on
>>    both disks.
>>     
>
> Suppose I memory-map a file and often modify the mapped memory.
> The system will at some point decide to write that block of the file
> to the device.  It will send a request to raid1, which will send one
> request each to two different devices.  They will each DMA the data
> out of that memory to the controller at different times so they could
> quite possibly get different data (if I changed the mapped memory
> between those two DMA requests).  So the data on the two drives in a
> mirror can easily be different.  If a 'check' happens at exactly this
> time it will notice.
> Normally that block will be written out again (as it is still 'dirty')
> and again and again if necessary as long as I keep writing to the
> memory.  Once I stop writing to the memory (e.g. close the file,
> unmount the filesystem) a final write will be made with the same data
> going to both devices.  During this time we will never read that block
> from the filesystem, so the filesystem will never be able to see any
> difference between the two devices in a raid1.
>
> So: if you are actively writing to a file while 'check' is running on
> a raid1, it could show up as a difference in mismatch_cnt.  But you
> have to get the timing just right (or wrong).
>
> I think it is possible in the above scenario to truncate the file
> while a write is underway but with new data in memory.  If you do
> this, the system might not write out that last 'new' data, so the last
> write to the particular block on storage may have written different
> data to the two different drives, and this difference will not be
> corrected by the filesystem, e.g. on unmount.  Note that the inconsistent
> data will never be read by the filesystem (the file has been
> truncated, remember) so there is no risk of data corruption.
> In this case the difference could remain for some time until later
> when a 'check' or 'repair' notices it.
>   

Some time ago I suggested that marking a block in memory copy on write 
(COW) would allow preserving a coherent block to write. You noted that 
it was harder than it sounds, and I never thought it sounded easy, due 
to issues with multiple processes or threads modifying the data.

But I do have another thought, which might be more useful, if not easier 
to implement. In the case of a repair, you really don't want to guess 
wrong which copy is the most recent. When a mismatch is detected, would 
it be feasible to either scan for a dirty block which is waiting to be 
written to that location, or just sync and check again? The performance 
hit might be considerable, but (a) running check on a busy system is 
already a serious hit, and (b) it would only happen when a problem was 
detected.
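
The "sync and check again" idea can be sketched in Python as follows.  This is
only a toy model of the proposal, not how md behaves: ToyMirror,
confirmed_mismatch and the flush callback are hypothetical names, and a real
implementation would have to deal with the concurrency issues mentioned above.

```python
from typing import Callable, Optional


def confirmed_mismatch(read_a: Callable[[], bytes],
                       read_b: Callable[[], bytes],
                       flush: Callable[[], None]) -> bool:
    """Report a mismatch only if it survives a flush and a second read."""
    if read_a() == read_b():
        return False            # copies agree, nothing to do
    flush()                     # push any pending dirty block to both halves
    return read_a() != read_b()


class ToyMirror:
    """Two copies of one block plus an optional write still waiting in memory."""

    def __init__(self, a: bytes, b: bytes, pending: Optional[bytes] = None):
        self.a, self.b, self.pending = a, b, pending

    def flush(self) -> None:
        if self.pending is not None:
            self.a = self.b = self.pending
            self.pending = None


busy = ToyMirror(b"old", b"new", pending=b"newer")
print(confirmed_mismatch(lambda: busy.a, lambda: busy.b, busy.flush))     # False

stale = ToyMirror(b"old", b"new")
print(confirmed_mismatch(lambda: stale.a, lambda: stale.b, stale.flush))  # True
```

In this model a difference caused by a block that is still dirty disappears
after the flush, while a difference with no pending write behind it is
reported, so the extra cost is only paid when a mismatch is first seen.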

Does any of that sound useful?

> Does that help explain the above quote?
>
> It is still the case that:
>   filesystem corruption won't happen in normal operation
>   a small mismatch_cnt does not necessarily imply a problem.
>   

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

