* --assume-clean on raid5/6
@ 2010-08-06  1:19 brian.foster
  2010-08-07 12:28 ` Stefan /*St0fF*/ Hübner

From: brian.foster @ 2010-08-06  1:19 UTC (permalink / raw)
To: linux-raid

Hi all,

I've read in the list archives that using --assume-clean on raid5
(raid6?) is not safe when the member drives are not in sync, but it's
not clear to me why. I can see the content of a written raid5 array
change if I fail a drive out of the array (created w/ --assume-clean),
but data that I write prior to failing a drive remains intact. Perhaps
I'm missing something. Could somebody elaborate on the danger/risk of
using --assume-clean? Thanks in advance.

Brian
* Re: --assume-clean on raid5/6
  2010-08-06  1:19 --assume-clean on raid5/6 brian.foster
@ 2010-08-07 12:28 ` Stefan /*St0fF*/ Hübner
  2010-08-08  8:56 ` Neil Brown

From: Stefan /*St0fF*/ Hübner @ 2010-08-07 12:28 UTC (permalink / raw)
To: brian.foster; +Cc: linux-raid

Hi Brian,

--assume-clean skips the initial resync, which is a time-saver if the
next thing you do is create a filesystem on the array. But keep in
mind: even if the disks are brand new and contain only zeros, the
parity will probably not be all zeros, so reading from such an array
would be a bad idea. If you go straight on to create an LVM volume,
filesystem, etc., however, then every block read from the array will
have been written to beforehand (and is therefore in sync).

Stefan

On 06.08.2010 03:19, brian.foster@emc.com wrote:
> Hi all,
>
> I've read in the list archives that use of --assume-clean on raid5
> (raid6?) is not safe assuming the member drives are not in sync, but
> it's not clear to me as to why. [...]
>
> Brian
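Stefan's point can be sketched with a toy XOR model of a single stripe.
This is illustrative Python only, not md's actual code; the 16-byte
blocks and two-data-disk layout are made-up simplifications:

```python
import os

# Toy model of one RAID5 stripe: two data blocks plus one XOR parity block.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 16
d0 = bytes(BLOCK)            # brand-new disk: all zeros
d1 = bytes(BLOCK)            # brand-new disk: all zeros
parity = os.urandom(BLOCK)   # --assume-clean: whatever was on disk stays

# Reading the live data blocks is fine -- they really are zeros.
assert d0 == bytes(BLOCK) and d1 == bytes(BLOCK)

# But reconstructing d1 after a failure uses the stale parity and
# returns garbage instead of zeros (with overwhelming probability):
assert xor(d0, parity) != d1

# Once a stripe has been fully written -- as creating a filesystem does
# for the blocks it touches -- the parity is computed from the data,
# and reconstruction works:
d0, d1 = os.urandom(BLOCK), os.urandom(BLOCK)
parity = xor(d0, d1)
assert xor(d0, parity) == d1
```

In other words: blocks that have been written are internally consistent,
but any degraded read through never-written stripes goes through bogus
parity.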
* Re: --assume-clean on raid5/6
  2010-08-07 12:28 ` Stefan /*St0fF*/ Hübner
@ 2010-08-08  8:56 ` Neil Brown
  2010-08-08 14:17 ` brian.foster

From: Neil Brown @ 2010-08-08  8:56 UTC (permalink / raw)
To: st0ff; +Cc: stefan.huebner, brian.foster, linux-raid

On Sat, 07 Aug 2010 14:28:55 +0200
Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de> wrote:

> --assume-clean skips over the initial resync. [...]
> But if the next thing you do is create LVM/filesystem etc., then all
> bits read from the array will have been written to before (and by
> that are in sync).

There is an important point that this misses.

When md updates a block on a RAID5 it will sometimes use a
read-modify-write cycle which reads the old block and old parity,
subtracts the old block from the parity block, and then adds the new
block to the parity block. Then it writes the new data block and the
new parity block.

If the old parity was correct for the old stripe, then the new parity
will be correct for the new stripe. But if the old was wrong, then the
new will be wrong.

So if you use --assume-clean then the parity may well be wrong and
could remain wrong even when you write new data. If you then lose a
device, the data for that device will be computed using wrong parity
and you will get wrong data - hence data corruption.

So you should only use --assume-clean if you know the array really is
'clean'.

RAID1/RAID10 cannot suffer from this, so --assume-clean is quite safe
with those array types. The current implementation of RAID6 never does
read-modify-write, so --assume-clean is currently safe with RAID6 too.
However, I do not promise that RAID6 will never change to use
read-modify-write cycles in some future implementation, so I would not
recommend using --assume-clean on RAID6 just to avoid the resync cost.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
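The read-modify-write hazard Neil describes can be made concrete with
the same kind of toy XOR model. This is a sketch under simplified
assumptions (one stripe, two data blocks; with XOR parity, "subtract"
and "add" are both XOR), not md's implementation:

```python
import os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 16
d0, d1 = os.urandom(BLOCK), os.urandom(BLOCK)
parity = os.urandom(BLOCK)         # stale from day one (--assume-clean)

# Read-modify-write of d0: subtract the old block from the parity,
# then add the new block.
new_d0 = os.urandom(BLOCK)
parity = xor(xor(parity, d0), new_d0)
d0 = new_d0

# The old parity error is carried forward unchanged: losing d1 now
# reconstructs garbage, not d1 -- data corruption.
reconstructed_d1 = xor(d0, parity)
assert reconstructed_d1 != d1

# Had the parity been correct beforehand (i.e. after a real resync),
# the identical RMW update would stay correct:
parity = xor(d0, d1)               # parity now matches the stripe
new_d0 = os.urandom(BLOCK)
parity = xor(xor(parity, d0), new_d0)
d0 = new_d0
assert xor(d0, parity) == d1       # reconstruction yields the right data
```

The algebra is the point: RMW propagates whatever error the parity
already holds, so writes never repair a stripe that started out wrong.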
* RE: --assume-clean on raid5/6
  2010-08-08  8:56 ` Neil Brown
@ 2010-08-08 14:17 ` brian.foster

From: brian.foster @ 2010-08-08 14:17 UTC (permalink / raw)
To: neilb, st0ff; +Cc: stefan.huebner, linux-raid

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Sunday, August 08, 2010 4:56 AM
>
> [...]
>
> So if you use --assume-clean then the parity may well be wrong and
> could remain wrong even when you write new data. If you then lose a
> device, the data for that device will be computed using wrong parity
> and you will get wrong data - hence data corruption.
>
> So you should only use --assume-clean if you know the array really is
> 'clean'.

Thanks for the information, guys. I was actually attempting to test
whether this could occur with a high-level sequence similar to the
following:

- dd /dev/urandom data to 4 small partitions (~10MB each).
- Create a raid5 with --assume-clean on said partitions.
- Write a small bit of data (32 bytes) to the beginning of the md,
  capture an image of the md to a file.
- Fail/remove a drive from the md, capture a second md file image.
- cmp the file images to see what changed, and read back the first 32
  bytes of data.

In this scenario I do observe differences in the file image, but my
data remains intact. I ran this sequence multiple times, each time
failing a different drive in the array, and also tried to stop/restart
the array (with a drop_caches in between) before the drive failure
step.

This leads to my question: is there a write test that can reproduce
data corruption under this scenario, or is the rmw cycle some kind of
optimization that is not so deterministic? Also, out of curiosity,
would --assume-clean be safe on a raid5 if the drives were explicitly
zeroed beforehand?

Thanks again.

Brian
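Whether a given write goes through RMW at all is an internal choice of
the driver, which would explain why the sequence above does not reliably
trip over the stale parity. A hypothetical side-by-side of the two
update strategies, again as a toy Python model (three data disks, made-up
names, not md's actual heuristics):

```python
import os

def xor(*blocks):
    # XOR an arbitrary number of equal-length blocks together.
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

BLOCK = 16
d0, d1, d2 = (os.urandom(BLOCK) for _ in range(3))
stale = os.urandom(BLOCK)          # wrong parity from --assume-clean

new_d0 = os.urandom(BLOCK)

# Strategy 1: read-modify-write -- apply a parity delta; the stale
# error is preserved.
rmw_parity = xor(stale, d0, new_d0)

# Strategy 2: reconstruct-write -- recompute parity from all the data
# blocks; this silently repairs the stale parity for this stripe.
rcw_parity = xor(new_d0, d1, d2)

# Now lose d1 and rebuild it from the surviving blocks:
assert xor(new_d0, d2, rcw_parity) == d1   # reconstruct-write: correct
assert xor(new_d0, d2, rmw_parity) != d1   # read-modify-write: garbage
```

In this model the reconstruct-write path happens to fix the stripe as a
side effect, while the RMW path carries the error forward, so from
userspace it is not deterministic which writes leave latent corruption
behind - consistent with what Brian observed.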
end of thread, other threads: [~2010-08-08 14:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-06  1:19 --assume-clean on raid5/6 brian.foster
2010-08-07 12:28 ` Stefan /*St0fF*/ Hübner
2010-08-08  8:56 ` Neil Brown
2010-08-08 14:17 ` brian.foster