From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: Gavin McCullagh <gmccullagh@gmail.com>,
Linux RAID Mailing List <linux-raid@vger.kernel.org>
Subject: Re: mismatch_cnt worries
Date: Wed, 04 Apr 2007 18:46:00 -0400
Message-ID: <46142AA8.2020104@tmr.com>
In-Reply-To: <17937.39220.736583.474597@notabene.brown>
Neil Brown wrote:
> On Monday April 2, gmccullagh@gmail.com wrote:
>
>> Neil's post here suggests either this is all normal or I'm seriously up the
>> creek.
>> http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07349.html
>>
>> My questions:
>>
>> 1. Should I be worried or is this normal? If so can you explain why the
>> number is non-zero?
>>
>
> Probably not too worried.
> Is it normal? I'm not really sure what 'normal' is. I'm beginning to
> think that it is 'normal' to get strange errors from disk drives, but
> maybe I have a jaded perspective.
> If you have a swap-partition or a swap-file on the device then you
> should consider it normal. If not, then it is much less likely but
> still possible.
>
>
>> 2. Should I repair, fsck, replace a disk, something else?
>>
>
> 'repair' is probably a good idea.
> 'fsck' certainly wouldn't hurt and might show something, though I
> suspect it will find the filesystem to be structurally sound.
> I wouldn't replace the disk on the basis of a single difference report
> from mismatch_cnt. I don't know what the SMART message means so I
> don't know if that suggests that the drive needs to be replaced.
>
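For anyone wanting to script the 'repair' and the follow-up look at mismatch_cnt, here is a minimal sketch. It assumes the md sysfs files sync_action and mismatch_cnt under /sys/block/<array>/md/; the array name "md0", the sysfs_root parameter, and the helper names are my own illustrative choices, and writing to sync_action needs root.

```python
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys"):
    """Return the md control directory for an array (name is an assumption)."""
    return Path(sysfs_root) / "block" / array / "md"


def request_sync_action(action, array="md0", sysfs_root="/sys"):
    """Ask md to start a 'check' or 'repair', or to go 'idle'."""
    if action not in ("check", "repair", "idle"):
        raise ValueError("unsupported action: " + action)
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


def read_mismatch_cnt(array="md0", sysfs_root="/sys"):
    """Read the current mismatch counter as an integer."""
    text = (md_dir(array, sysfs_root) / "mismatch_cnt").read_text()
    return int(text.strip())


# Example (as root, on a real array):
#   request_sync_action("repair")
#   ... wait for sync_action to read "idle" again ...
#   print(read_mismatch_cnt())
```

The sysfs_root parameter is only there so the helpers can be tried against a scratch directory before being pointed at a live array.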
>
>> 3. Can someone explain how this quote can be true:
>> "Though it is less likely, a regular filesystem could still (I think)
>> genuinely write different data to different devices in a raid1/10."
>> when I thought the point of RAID1 was that the data should be the same on
>> both disks.
>>
>
> Suppose I memory-map a file and often modify the mapped memory.
> The system will at some point decide to write that block of the file
> to the device. It will send a request to raid1, which will send one
> request each to two different devices. They will each DMA the data
> out of that memory to the controller at different times so they could
> quite possibly get different data (if I changed the mapped memory
> between those two DMA requests). So the data on the two drives in a
> mirror can easily be different. If a 'check' happens at exactly this
> time it will notice.
> Normally that block will be written out again (as it is still 'dirty')
> and again and again if necessary as long as I keep writing to the
> memory. Once I stop writing to the memory (e.g. close the file,
> unmount the filesystem) a final write will be made with the same data
> going to both devices. During this time we will never read that block
> from the filesystem, so the filesystem will never be able to see any
> difference between the two devices in a raid1.
>
> So: if you are actively writing to a file while 'check' is running on
> a raid1, it could show up as a difference in mismatch_cnt. But you
> have to get the timing just right (or wrong).
>
> I think it is possible in the above scenario to truncate the file
> while a write is underway but with new data in memory. If you do
> this, the system might not write out that last 'new' data, so the last
> write to the particular block on storage may have written different
> data to the two different drives, and this difference will not be
> corrected by the filesystem, e.g. on unmount. Note that the inconsistent
> data will never be read by the filesystem (the file has been
> truncated, remember) so there is no risk of data corruption.
> In this case the difference could remain for some time until later
> when a 'check' or 'repair' notices it.
>
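The memory-mapped write scenario quoted above can be mimicked with a toy model. This is a deterministic simulation and not kernel code: the two "devices" are just byte copies taken at different moments, and every name in it is made up for illustration.

```python
# Toy simulation of the raid1 write race described above; not kernel code.

def mirrored_write(page, modify_between=None):
    """Simulate raid1 issuing one write per device for the same page.

    Each device copies ("DMAs") the page at its own moment. If the page
    changes between the two copies, the mirrors end up with different data.
    """
    disk_a = bytes(page)          # first device reads the page now
    if modify_between is not None:
        modify_between(page)      # application modifies the mapped memory
    disk_b = bytes(page)          # second device reads the page a bit later
    return disk_a, disk_b


def scribble(page):
    """Stand-in for the application writing to its mapped file."""
    page[0:3] = b"new"


page = bytearray(b"old data")

# The write races with a modification: the two copies differ.
a, b = mirrored_write(page, modify_between=scribble)
mismatch_during_write = a != b

# The page is still dirty, so it is written out again. With no further
# modification, both devices receive identical data.
a2, b2 = mirrored_write(page)
mismatch_after_final_write = a2 != b2

print(mismatch_during_write, mismatch_after_final_write)  # True False
```

A 'check' that runs between the first and second write is what would see the difference. The truncate case corresponds to the second write never happening.
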
Some time ago I suggested that marking a block in memory copy-on-write
(COW) would allow a coherent block to be preserved for writing. You noted
that it was harder than it sounds, and I never thought it sounded easy,
given the issues with multiple processes or threads modifying the data.

But I do have another thought, which might be more useful, if not easier
to implement. In the case of a repair, you really don't want to guess
wrong about which copy is the most recent. When a mismatch is detected,
would it be feasible either to scan for a dirty block that is waiting to
be written to that location, or just to sync and check again? The
performance hit might be considerable, but (a) running check on a busy
system is already a serious hit, and (b) it would only happen when a
problem was detected.

Does any of that sound useful?
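
To make the "sync and check again" idea concrete, here is a rough sketch of the control flow only. The callbacks read_copies and flush_dirty are hypothetical stand-ins, not md or kernel interfaces, and the in-memory "mirrors" are a toy model.

```python
# Sketch of "on mismatch, flush pending writes and compare again".
# read_copies and flush_dirty are hypothetical callbacks, not md APIs.

def classify_mismatch(block, read_copies, flush_dirty):
    """Re-check one block after any pending write to it has been flushed.

    Returns "clean" if the copies already agree, "transient" if they
    agree after the flush, and "persistent" if they still differ.
    """
    first, second = read_copies(block)
    if first == second:
        return "clean"
    flush_dirty(block)            # only reached when a mismatch was seen
    first, second = read_copies(block)
    return "transient" if first == second else "persistent"


# Toy model: two mirrors, with block 7 still dirty in memory.
mirrors = [{7: b"old", 9: b"aaa"}, {7: b"new", 9: b"bbb"}]
dirty = {7: b"new"}


def read_copies(block):
    return mirrors[0][block], mirrors[1][block]


def flush_dirty(block):
    data = dirty.pop(block, None)
    if data is not None:
        for mirror in mirrors:
            mirror[block] = data


results = {b: classify_mismatch(b, read_copies, flush_dirty) for b in (7, 9)}
print(results)  # {7: 'transient', 9: 'persistent'}
```

Only the "persistent" blocks would then need a decision about which copy to keep, and the extra flush costs nothing unless a mismatch was found in the first place.
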
> Does that help explain the above quote?
>
> It is still the case that:
> filesystem corruption won't happen in normal operation
> a small mismatch_cnt does not necessarily imply a problem.
>
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979