linux-raid.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: Steven Haigh <netwiz@crc.id.au>,
	Bryan Mesich <bryan.mesich@ndsu.edu>,
	Jon@eHardcastle.com, linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?
Date: Wed, 24 Feb 2010 09:46:23 -0500
Message-ID: <4B853BBF.7000607@tmr.com>
In-Reply-To: <20100218100547.7aecdc34@notabene.brown>

Neil Brown wrote:
> On Wed, 17 Feb 2010 08:38:11 +1100
> Steven Haigh <netwiz@crc.id.au> wrote:
>
>> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@tmr.com>
>> wrote:
>>
>>> Bryan Mesich wrote:
>>>
>>>> On Thu, Feb 11, 2010 at 04:14:44PM +1100, Neil Brown wrote:
>>>>
>>>>>> This whole discussion simply shows that for RAID-1 software RAID is
>>>>>> less reliable than hardware RAID (no, I don't mean fake-RAID),
>>>>>> because it doesn't pin the data buffer until all copies are written.
>>>>>>
>>>>> That doesn't make it less reliable.  It just makes it more confusing.
>>>>>
>>>> I agree that Linux software RAID is no less reliable than
>>>> hardware RAID with regard to the above conversation.  It is,
>>>> however, confusing to have a counter that indicates there are
>>>> problems with a RAID 1 array when in fact there are none.
>>>>
>>> Sorry, but real hardware RAID is more reliable than software RAID, and
>>> Neil's justification for not doing smart recovery mentions it. Note that
>>> this refers to real hardware RAID, not fakeraid, which is just some
>>> firmware in a BIOS that uses the existing hardware.
>>>
>>> The issue lies with data changing between the writes to multiple drives.
>>> In hardware RAID the data traverses the memory bus once, and only once,
>>> into the controller's cache, from which it is written to all mirrored
>>> drives. With software RAID an individual write is issued to each drive,
>>> and if the data in the buffer changes between the write to one drive and
>>> the write to another, the drives end up holding different values. Neil
>>> may be convinced that the OS somehow "knows" which of the mirror copies
>>> is correct, i.e. most recent, and never uses the stale data, but if that
>>> information were really available, reads would always return the latest
>>> value and it wouldn't be possible to read the same file multiple times
>>> and get different MD5sums. It would also be possible to do a stable
>>> smart recovery by propagating the most recent copy to the other mirror
>>> drives.
>>>
>>> I hoped that mounting with data=journal would lead to consistency; that
>>> seems not to be true either.
>>>
>> I agree, Bill, there is an issue with software RAID1 on some hardware. I
>> have one machine where the ONLY way to stop the root filesystem going
>> read-only due to journal issues is to remove RAID. Having RAID1 enabled
>> gives silent corruption of both data and the journal at seemingly random
>> times.
>>
>> I can see the data corruption by running a verify between RPM and the
>> data on the drive. Reinstalling these packages fixes things, until other
>> random things get corrupted the next time.
>>
>
> Sounds very much like dodgy drives.
>
>   
>> The myth that data corruption in RAID1 ONLY happens to swap and/or unused
>> space on a drive is absolute rubbish.
>>
>>     
>
> Absolute rubbish does seem to be a suitable phrase here.
> There is no question of data corruption.
> When memory changes between being written to one device and to another, this
> does not cause corruption, only inconsistency.   Either the block will be
> written again consistently soon, or it will never be read.
>   

Just what is it that rewrites the data block? The user program doesn't
know it's needed, the filesystem, if any, doesn't know it's needed, and,
as far as I can tell, md doesn't checksum the buffer before issuing the
first write and again after the last write completes, nor does it make a
private copy and write every mirror from that. So what notices that the
data has changed and rewrites it?
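
To make the window concrete, here is a minimal Python simulation. It is not md code, and every name in it is hypothetical: each "drive" is a bytearray, and a mutate callback stands in for the page cache or application changing the buffer after the first mirror has been written. It compares writing each mirror straight from the shared buffer, the before/after checksum-and-rewrite idea described above, and writing every mirror from a private copy.

```python
import hashlib
from typing import Callable, List

# Hypothetical model, not md internals: each "drive" is a bytearray and
# `mutate` stands in for whoever modifies the buffer while I/O is in flight.
Mutator = Callable[[bytearray], None]


def write_direct(buf: bytearray, drives: List[bytearray], mutate: Mutator) -> None:
    """Write every mirror straight from the shared buffer (buffer not pinned)."""
    for i, drive in enumerate(drives):
        drive[:] = buf              # mirror i gets the buffer as it is *now*
        if i == 0:
            mutate(buf)             # buffer changes after the first mirror is written


def write_with_checksum(buf: bytearray, drives: List[bytearray],
                        mutate: Mutator, max_passes: int = 8) -> int:
    """Checksum the buffer before the first write and after the last one;
    rewrite all mirrors until a whole pass leaves the buffer unchanged.
    Returns the number of passes needed."""
    for passes in range(1, max_passes + 1):
        before = hashlib.sha256(buf).digest()
        write_direct(buf, drives, mutate)
        if hashlib.sha256(buf).digest() == before:
            return passes
    raise RuntimeError("buffer kept changing; mirrors may still differ")


def write_from_copy(buf: bytearray, drives: List[bytearray], mutate: Mutator) -> None:
    """Snapshot the buffer once and write every mirror from the snapshot."""
    snapshot = bytes(buf)
    for i, drive in enumerate(drives):
        drive[:] = snapshot
        if i == 0:
            mutate(buf)             # caller's buffer changes; the snapshot does not


def one_shot_mutator() -> Mutator:
    """Flip the first byte of the buffer, but only on the first call."""
    fired = False

    def mutate(buf: bytearray) -> None:
        nonlocal fired
        if not fired:
            buf[0] ^= 0xFF
            fired = True
    return mutate


def consistent(drives: List[bytearray]) -> bool:
    """True if every mirror holds identical bytes."""
    return all(d == drives[0] for d in drives)


if __name__ == "__main__":
    for name, writer in [("direct", write_direct),
                         ("checksum", write_with_checksum),
                         ("copy", write_from_copy)]:
        drives = [bytearray(4) for _ in range(3)]
        writer(bytearray(b"data"), drives, one_shot_mutator())
        print(name, consistent(drives))
```

Run as a script, this prints "direct False", "checksum True" and "copy True": the direct write leaves mirror 0 with the old first byte while mirrors 1 and 2 get the new one, which is the kind of inconsistency a check would count. The copy costs an extra memcpy per write; the checksum loop costs two hashes plus possible rewrites, and in this model an unchanged checksum cannot rule out a change that was reverted within the same pass.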

> If the host crashes before the blocks are made consistent, then the 
> inconsistency will not be visible as the resync will fix it.
>
> If you are getting any corruption, then it is NOT due to this facet of the
> RAID1 implementation - it is due to something else.
> My guess is bad hardware - anywhere from memory to hard drive.
>   

Having switched an array from three-way RAID-1 to RAID-6, using the same
kernel, utilities, and hardware, I can speak to that. When I first
started to run checks, I took the array offline to do repairs, and
usually saw ~12k mismatches by the end of a week. After changing the
array to RAID-6 I never had a mismatch again. So while hardware clearly
can be a factor, it is unlikely to be the cause of all mismatch events.
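
For anyone wanting to reproduce counts like these, here is a minimal Python sketch that drives md's sysfs interface: writing "check" to /sys/block/<array>/md/sync_action starts a check, and mismatch_cnt reports how many sectors the last check or repair found inconsistent. It assumes a kernel exposing these files as described in the kernel's md documentation, an array name such as md0, and root privileges for the write; the helper names are made up for illustration, and the `root` parameter exists only so the functions can be pointed at a fake directory tree.

```python
import time
from pathlib import Path
from typing import Optional


def md_dir(array: str, root: str = "/sys/block") -> Path:
    """sysfs directory for an md array, e.g. /sys/block/md0/md."""
    return Path(root) / array / "md"


def sync_action(array: str, root: str = "/sys/block") -> str:
    """Current action as reported by md: 'idle', 'check', 'repair', 'resync', ..."""
    return (md_dir(array, root) / "sync_action").read_text().strip()


def start_check(array: str, root: str = "/sys/block") -> None:
    """Ask md to compare all copies and count, but not fix, mismatches."""
    (md_dir(array, root) / "sync_action").write_text("check\n")


def mismatch_cnt(array: str, root: str = "/sys/block") -> int:
    """Sectors found inconsistent by the last check/repair."""
    return int((md_dir(array, root) / "mismatch_cnt").read_text().strip())


def check_and_count(array: str, root: str = "/sys/block",
                    poll_s: float = 10.0,
                    timeout_s: Optional[float] = None) -> int:
    """Start a check, wait until the array is idle again, return the count."""
    start_check(array, root)
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        time.sleep(poll_s)          # give md a moment before the first poll
        if sync_action(array, root) == "idle":
            return mismatch_cnt(array, root)
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"{array}: check still running")
```

As root, check_and_count("md0") would start a check on md0 and return the count once it finishes; writing "repair" instead of "check" would also rewrite the mismatched blocks. A nonzero result on RAID-1 is exactly what this thread is arguing about: Neil's position is that it can reflect harmless inconsistency in blocks that will be rewritten or never read, while the RAID-1 versus RAID-6 comparison above suggests not every count can be blamed on hardware.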

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein

