From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Clements
Subject: Re: Bug report: mdadm -E oddity
Date: Fri, 20 May 2005 12:04:34 -0400
Message-ID: <428E0A92.3060108@steeleye.com>
References: <1115999051.3974.14.camel@compaq-rhel4.xsintricity.com> <1116004267.3974.35.camel@compaq-rhel4.xsintricity.com> <17029.12773.197506.463977@cse.unsw.edu.au> <1116077316.13780.52.camel@compaq-rhel4.xsintricity.com> <17037.35615.456231.737766@cse.unsw.edu.au> <1116592212.23785.75.camel@compaq-rhel4.xsintricity.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: 
In-Reply-To: <1116592212.23785.75.camel@compaq-rhel4.xsintricity.com>
Sender: linux-raid-owner@vger.kernel.org
To: Doug Ledford
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hi Doug,

Doug Ledford wrote:
> On Fri, 2005-05-20 at 17:00 +1000, Neil Brown wrote:
>> There is a converse to this. People should be made to take notice if
>> there is possible data corruption.
>>
>> i.e. if you have a system crash while running a degraded raid5, then
>> silent data corruption could ensue. mdadm will currently not start
>> any array in this state without an explicit '--force'. This is somewhat
>> akin to fsck sometimes requiring human interaction. Of course, if there
>> is good reason to believe the data is still safe, mdadm should -- and
>> I believe does -- assemble the array even if degraded.
>
> Well, as I explained in my email some time back on the issue of silent
> data corruption, this is where journaling saves your ass. Since the
> journal has to be written before the filesystem-proper updates are
> written, if the array goes down, the crash is either in the journal
> write, in which case you are throwing those blocks away anyway and so
> corruption is irrelevant, or it's in the filesystem-proper writes, and
> if they get corrupted you don't care because we are going to replay the
> journal and rewrite them.
I think you may be misunderstanding the nature of the data corruption
that ensues when a system with a degraded raid4, raid5, or raid6 array
crashes. Data that you aren't even actively writing can get corrupted.

For example, say we have a 3-disk raid5 and disk 3 is missing. This
means that for some stripes, we'll be writing parity and data:

	disk1	disk2	{disk3}
	D1	P	{D2}

So, say we're in the middle of updating this stripe, and we're writing
D1 and P to disk when the system crashes. We may have just corrupted D2,
which isn't even active right now. This is because we'll use D1 and P to
reconstruct D2 when disk3 (or its replacement) comes back. If we wrote
D1 but not P, then when we use D1 and P to reconstruct D2, we'll get the
wrong data. The same goes if we wrote P but not D1, or some partial
piece of either or both. As far as I know, there's no way for a
filesystem journal to protect us from D2 getting corrupted.

Note that if we lose the parity disk in a raid4, this type of data
corruption isn't possible. Also note that for some stripes in a raid5 or
raid6, this type of corruption can't happen (namely, when the parity for
that stripe is on the missing disk). Also, if you have a non-volatile
cache on the array, as most hardware RAIDs do, then this type of data
corruption doesn't occur.

-- 
Paul
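[Editor's note: the write-hole scenario described above can be made concrete with a small sketch. This is an illustrative toy, not md's actual code; the stripe layout (D1 on disk1, parity P on disk2, D2 on the missing disk3) and byte values are assumptions chosen to match the example in the mail. RAID5 parity is a bytewise XOR, so D2 = D1 XOR P.]

```python
# Toy model of a degraded 3-disk RAID5 stripe: disk1 holds D1, disk2
# holds parity P, and disk3 (which held D2) is missing. The parity
# invariant is P = D1 XOR D2, so D2 can be rebuilt as D1 XOR P.

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers (RAID5 parity math)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Stripe contents before the crash (arbitrary example values).
d1_old = b"\x11" * 4
d2     = b"\x22" * 4          # lives only on the missing disk3
p_old  = xor(d1_old, d2)      # parity as it sits on disk2

# While degraded, D2 exists only implicitly as D1 XOR P -- and that
# reconstruction is correct as long as D1 and P stay consistent.
assert xor(d1_old, p_old) == d2

# Now an update rewrites D1; the matching parity write would be
# P_new = D1_new XOR D2, but the system crashes after the data write
# lands and before the parity write does.
d1_new = b"\x33" * 4
on_disk_d1, on_disk_p = d1_new, p_old   # the crash window

# Later, disk3's replacement is rebuilt from what is actually on disk:
rebuilt_d2 = xor(on_disk_d1, on_disk_p)

# D2 comes back wrong even though nothing ever wrote to it.
print(rebuilt_d2 == d2)
```

Running this prints False: the rebuilt D2 differs from the real D2, exactly the silent corruption of a block that was never in flight. A journal can only replay the D1/P writes it knew about; it has no record that D2's implicit encoding was damaged.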