From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: Steven Haigh <netwiz@crc.id.au>,
Bryan Mesich <bryan.mesich@ndsu.edu>,
Jon@eHardcastle.com, linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?
Date: Wed, 24 Feb 2010 09:46:23 -0500
Message-ID: <4B853BBF.7000607@tmr.com>
In-Reply-To: <20100218100547.7aecdc34@notabene.brown>

Neil Brown wrote:
> On Wed, 17 Feb 2010 08:38:11 +1100
> Steven Haigh <netwiz@crc.id.au> wrote:
>
>
>> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@tmr.com>
>> wrote:
>>
>>> Bryan Mesich wrote:
>>>
>>>> On Thu, Feb 11, 2010 at 04:14:44PM +1100, Neil Brown wrote:
>>>>
>>>>
>>>>>> This whole discussion simply shows that for RAID-1 software RAID is
>>>>>> less reliable than hardware RAID (no, I don't mean fake-RAID), because
>>>>>> it doesn't pin the data buffer until all copies are written.
>>>>>>
>>>>>>
>>>>> That doesn't make it less reliable. It just makes it more confusing.
>>>>>
>>>>>
>>>> I agree that linux software RAID is no less reliable than
>>>> hardware RAID with regards to the above conversation. It's
>>>> however confusing to have a counter that indicates there are
>>>> problems with a RAID 1 array when in fact there is not.
>>>>
>>>>
>>> Sorry, but real hardware raid is more reliable than software raid, and
>>> Neil's justification for not doing smart recovery mentions it. Note this
>>> refers to real hardware raid, not fakeraid which is just some firmware
>>> in a BIOS to use the existing hardware.
>>>
>>> The issue lies with data changing between write to multiple drives. In
>>> hardware raid the data traverses the memory bus once, only once, and
>>> goes into cache in the controller, from which it is written to all
>>> mirrored drives. With software raid an individual write is done to each
>>> drive, and if the data in the buffer changes between writes to one drive
>>> or the other you get different values. Neil may be convinced that the OS
>>> somehow "knows" which of the mirror copies is correct, ie. most recent,
>>> and never uses the stale data, but if that information was really
>>> available reads would always return the latest value and it wouldn't be
>>> possible to read the same file multiple times and get different MD5sums.
>>>
>>> It would also be possible to do a stable smart recovery by propagating
>>> the most recent copy to the other mirror drives.
>>>
>>> I hoped that mounting data=journal would lead to consistency, that seems
>>> not to be true either.
>>>
>> I agree Bill, there is an issue with the software RAID1 when it comes down
>> to some hardware. I have one machine where the ONLY way to stop the root
>> filesystem going readonly due to journal issues is to remove RAID. Having
>> RAID1 enabled gives silent corruption of both data and the journal at
>> seemingly random times.
>>
>> I can see the data corruption from running a verify between RPM and data
>> on the drive. Reinstalling these packages fixes things - until some
>> random thing gets corrupted the next time.
>>
>
> Sounds very much like dodgy drives.
>
>
>> The myth that data corruption in RAID1 ONLY happens to swap and/or unused
>> space on a drive is absolute rubbish.
>>
>>
>
> Absolute rubbish does seem to be a suitable phrase here.
> There is no question of data corruption.
> When memory changes between being written to one device and to another, this
> does not cause corruption, only inconsistency. Either the block will be
> written again consistently soon, or it will never be read.
>
Just what is it that rewrites the data block? The user program doesn't
know it's needed, the filesystem, if any, doesn't know it's needed, and
as far as I can tell md doesn't checksum the buffer before issuing the
first write and again after the last write is done, nor does it make a
copy and write from that. So what sees that the data has changed and
rewrites it?
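
To make the scenario concrete, here is a toy sketch of what I mean. It is
a simulation only, not md code: the bytearrays stand in for the in-memory
buffer and the two mirror members, and every name in it (mirror_write,
scribble, the snapshot flag) is made up for illustration. It shows the two
mirrors ending up different when the buffer is modified between the
per-drive writes, and staying identical when the writes are issued from a
private copy, which is roughly what a controller cache gives you.

```python
# Toy model only: bytearrays stand in for the in-memory buffer and the two
# mirror members. None of these names correspond to real md internals.

def mirror_write(buf, mirrors, mutate_between=None, snapshot=False):
    """Write buf to every mirror in turn.

    mutate_between, if given, is called between per-mirror writes to model
    the owner of the buffer changing it while the I/O is still in flight.
    snapshot=True takes a private copy first and writes from that copy.
    """
    source = bytes(buf) if snapshot else buf
    for i, mirror in enumerate(mirrors):
        mirror[:] = source  # one individual write per drive
        if mutate_between is not None and i < len(mirrors) - 1:
            mutate_between(buf)


def scribble(buf):
    """Hypothetical in-flight modification: flip the bits of the first byte."""
    buf[0] ^= 0xFF


def run(snapshot):
    """Return True if both mirrors hold identical data after the write."""
    buf = bytearray(b"data block")
    mirrors = [bytearray(len(buf)), bytearray(len(buf))]
    mirror_write(buf, mirrors, mutate_between=scribble, snapshot=snapshot)
    return mirrors[0] == mirrors[1]


if __name__ == "__main__":
    print("writes from live buffer, mirrors match:", run(snapshot=False))
    print("writes from private copy, mirrors match:", run(snapshot=True))
```

Nothing in the first case notices the difference afterwards, which is the
point of my question.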
> If the host crashes before the blocks are made consistent, then the
> inconsistency will not be visible as the resync will fix it.
>
> If you are getting any corruption, then it is NOT due to this facet of the
> RAID1 implementation - it is due to something else.
> My guess is bad hardware - anywhere from memory to hard drive.
>
Having switched an array from three way raid-1 to raid-6, using the same
kernel, utilities, and hardware, I can speak to that. When I first
started to run checks, I took the array offline to do repair, and
usually saw ~12k mismatches by the end of a week. After changing the
array to raid-6 I never had a mismatch again. Therefore, while hardware
clearly can be a factor, it is unlikely to be the cause of all mismatch
events.
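
For anyone who wants to track the same number, a minimal sketch follows.
It assumes the usual md sysfs attributes (sync_action and mismatch_cnt
under /sys/block/<array>/md/); the function names and the sysfs_root
parameter are my own, the latter only so the code can be pointed at a
scratch directory instead of a live array. Writing to sync_action needs
root, and the count is only meaningful once the check has finished.

```python
from pathlib import Path


def md_attr(md_name, attr, sysfs_root="/sys/block"):
    """Build the path of an md sysfs attribute, e.g. md0 + mismatch_cnt."""
    return Path(sysfs_root) / md_name / "md" / attr


def request_check(md_name, sysfs_root="/sys/block"):
    """Ask md to start a read-only consistency check of the array."""
    md_attr(md_name, "sync_action", sysfs_root).write_text("check\n")


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the mismatch count md reported for its last check or repair."""
    return int(md_attr(md_name, "mismatch_cnt", sysfs_root).read_text().strip())
```

Typical use would be request_check("md0"), wait for sync_action to read
"idle" again, then read_mismatch_cnt("md0"); "md0" is just an example
name.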
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein