* RAID5 write hole?
From: Shaochun Wang @ 2010-06-26 14:31 UTC
To: linux-raid

Hi:

Recently I heard of the so-called "write hole" problem of RAID5 in
Linux software RAID. I use an ext4 filesystem on my NAS, which assembles
its data disks using Linux software RAID. So I wonder how safe such a
system is!

If the "write hole" is inevitable, will it result in corruption of the
ext4 filesystem?

--
Shaochun Wang <scwang@ios.ac.cn>
Jabber: fungusw@jabber.org

* Re: RAID5 write hole?
From: Mikael Abrahamsson @ 2010-06-26 15:42 UTC
To: Shaochun Wang; +Cc: linux-raid

On Sat, 26 Jun 2010, Shaochun Wang wrote:

> Hi:
>
> Recently I heard of the so-called "write hole" problem of RAID5 in
> Linux software RAID. I use an ext4 filesystem on my NAS, which assembles
> its data disks using Linux software RAID. So I wonder how safe such a
> system is!

RAID is never a replacement for backups; corruption can happen at multiple
levels in your system for different reasons. Non-ECC memory can have bit
flips which corrupt your data, the write hole can cause data corruption,
etc. Generally, unless you have really, really high demands on data
integrity, this is not a major problem.

Ext4 has other potential software/fs interactions when it comes to data
integrity, in that it buffers writes for quite some time, so even if you
think your file is saved, it might take many seconds before it's actually
on disk if your software doesn't fsync() it. Most don't, because ext3 took
so long to do it.

So generally, don't worry too much, but make sure you have backups of your
important data.

--
Mikael Abrahamsson    email: swmike@swm.pp.se

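The fsync() remark above can be illustrated with a short Python sketch. The helper name `durable_write` and the demo file name are hypothetical; the sketch only shows the flush-then-fsync pattern an application would use, and it says nothing about what the RAID layer underneath does with the write.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the kernel to push them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move data from Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to flush its cached copy to disk


if __name__ == "__main__":
    # Hypothetical demo file in the system temp directory.
    demo_path = os.path.join(tempfile.gettempdir(), "fsync_demo.bin")
    durable_write(demo_path, b"important data")
```

Without the `os.fsync()` call the data may sit in the page cache for some seconds after `write()` returns, which is the window described above.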
* Re: RAID5 write hole?
From: Shaochun Wang @ 2010-06-26 21:28 UTC
To: linux-raid

Maybe Sun's ZFS is the ultimate choice!

--
Shaochun Wang <scwang@ios.ac.cn>
Jabber: fungusw@jabber.org

* Re: RAID5 write hole?
From: John Hendrikx @ 2010-06-27 10:33 UTC
To: Shaochun Wang; +Cc: linux-raid

Shaochun Wang wrote:
> Hi:
>
> Recently I heard of the so-called "write hole" problem of RAID5 in
> Linux software RAID. I use an ext4 filesystem on my NAS, which assembles
> its data disks using Linux software RAID. So I wonder how safe such a
> system is!
>
> If the "write hole" is inevitable, will it result in corruption of the
> ext4 filesystem?

The write hole occurs if your system crashes during a write operation,
where one stripe gets updated but the other corresponding stripe does
not. This could lead to parity information not matching the
corresponding data.

If the RAID5 system at least ensures that the data stripe is always
written before parity, then the monthly resync check that mdadm does
should be able to detect this and write new parity information.

At least this way the bad parity does not lurk around forever on your
RAID system, causing numerous problems when a disk finally fails.

The write hole is not inevitable, but avoiding it would require some
special measures at the RAID level which could affect performance. And as
with any corruption, it could definitely corrupt your filesystem.

--John

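The parity mismatch described above can be shown with a toy model. This is a minimal sketch, assuming a 3-disk RAID5 stripe with two 4-byte data chunks and simple XOR parity; the names `xor_parity` and `reconstruct` are made up for illustration and do not correspond to anything in md.

```python
from functools import reduce


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def reconstruct(surviving_chunks, parity):
    """Rebuild the one missing data chunk from the survivors plus parity."""
    return xor_parity(list(surviving_chunks) + [parity])


# A consistent stripe: two data chunks and one parity chunk.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_parity([d0, d1])

# Crash in the middle of a write: the new d0 reaches disk, the new parity does not.
d0 = b"CCCC"
stale_parity = parity

# The stripe is now inconsistent: parity no longer matches the data.
parity_matches = xor_parity([d0, d1]) == stale_parity

# If the disk holding d1 fails now, d1 is rebuilt from d0 and the stale parity.
rebuilt_d1 = reconstruct([d0], stale_parity)
```

In this model `parity_matches` is false and `rebuilt_d1` differs from the original `b"BBBB"`, even though d1 was never the target of the interrupted write. That is why a stale parity block only becomes visible as data corruption once a disk is lost.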
* Re: RAID5 write hole?
From: Neil Brown @ 2010-06-27 12:16 UTC
To: John Hendrikx; +Cc: Shaochun Wang, linux-raid

On Sun, 27 Jun 2010 12:33:49 +0200
John Hendrikx <hjohn@xs4all.nl> wrote:

> Shaochun Wang wrote:
> > Hi:
> >
> > Recently I heard of the so-called "write hole" problem of RAID5 in
> > Linux software RAID. I use an ext4 filesystem on my NAS, which assembles
> > its data disks using Linux software RAID. So I wonder how safe such a
> > system is!
> >
> > If the "write hole" is inevitable, will it result in corruption of the
> > ext4 filesystem?
> The write hole occurs if your system crashes during a write operation,
> where one stripe gets updated but the other corresponding stripe does
> not. This could lead to parity information not matching the
> corresponding data.

Correct.

>
> If the RAID5 system at least ensures that the data stripe is always
> written before parity, then the monthly resync check that mdadm does
> should be able to detect this and write new parity information.

This bit isn't so correct.
When the RAID5 is next assembled after the crash, if all devices are
present (i.e. the array is not degraded), then it will check and correct
all the parity blocks immediately. If you have a write-intent bitmap
configured, this will be quite quick. If not, it could take hours.
Once the resync has completed you are safe again; any risk from the
"write hole" will have disappeared.

If your array was degraded when the system crashed, or is degraded on
restart, or degrades before the resync completes, then you could suffer
from the "write hole" ... if a write was interrupted by the crash.

In the first two cases (which are effectively the same case), mdadm will
refuse to assemble the array because it knows it could be suffering from a
write-hole problem. You need to reassemble with "--force", which means you
acknowledge that there could be corruption due to the write hole.

If you lose a device during the resync you could still suffer from the
write hole, but md doesn't alert you to this. That could be seen as a
shortcoming, but I'm not sure how it might be fixed. I wouldn't want the
array to suddenly stop working because there is suddenly a risk of
write-hole-based corruption....

>
> At least this way the bad parity does not lurk around forever on your
> RAID system, causing numerous problems when a disk finally fails.

Yes, it certainly does not lurk forever - the resync fixes it.

>
> The write hole is not inevitable, but avoiding it would require some
> special measures at the RAID level which could affect performance. And as
> with any corruption, it could definitely corrupt your filesystem.

The write hole can be "fixed" in two ways that I am aware of.

1/ Log all writes (including parity updates) to some stable storage before
   writing them to the RAID5. This is typically done in "hardware RAID"
   cards using NVRAM for the stable storage.
   Once NVRAM is widely available on commodity server hardware I suspect
   md/raid5 will be enhanced to support this.
   I have thought about doing this using a RAID1 as the alternate stable
   storage, but the performance cost is unlikely to be acceptable.

2/ Use a filesystem which understands the layout of the RAID5 and which
   somehow "knows" which stripes were written "recently" so that it can
   invalidate them (if it cannot verify them) after a crash. This would
   almost certainly require a copy-on-write discipline in the filesystem.

NeilBrown

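The difference between a full resync and a bitmap-assisted one can be sketched with a toy model. This is an illustration only, assuming a non-degraded array of 1000 two-chunk stripes with XOR parity; `resync`, `make_array` and the `dirty` set are hypothetical names, and the real write-intent bitmap tracks regions of the array rather than individual stripes.

```python
from functools import reduce


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def resync(stripes, dirty=None):
    """Recompute parity for stripes and return how many were checked.

    stripes: list of dicts with "data" (list of bytes) and "parity" (bytes).
    dirty:   optional set of stripe indexes, standing in for a write-intent
             bitmap; when None, every stripe is checked.
    """
    indexes = range(len(stripes)) if dirty is None else sorted(dirty)
    for i in indexes:
        stripes[i]["parity"] = xor_parity(stripes[i]["data"])
    return len(indexes)


def make_array():
    """Build a toy array of 1000 stripes; stripe 7 has stale parity."""
    stripes = [{"data": [b"AAAA", b"BBBB"], "parity": b"\x03\x03\x03\x03"}
               for _ in range(1000)]
    stripes[7]["data"][0] = b"CCCC"  # interrupted write: data updated, parity not
    return stripes


full = make_array()
checked_full = resync(full)               # no bitmap: walk the whole array

quick = make_array()
checked_quick = resync(quick, dirty={7})  # bitmap says only stripe 7 was in flight
```

Both runs leave stripe 7 with matching parity, but the first touches all 1000 stripes and the second only one. Note that the sketch recomputes parity from the data chunks, so it only works while every data chunk is readable; on a degraded array there is nothing to recompute from, which is the case where the write hole bites.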
* Re: RAID5 write hole?
From: Shaochun Wang @ 2010-06-29 6:16 UTC
To: linux-raid

On Sun, Jun 27, 2010 at 10:16:13PM +1000, Neil Brown wrote:
> On Sun, 27 Jun 2010 12:33:49 +0200
> John Hendrikx <hjohn@xs4all.nl> wrote:
>
> parity blocks immediately. If you have a write-intent bitmap configured,
> this will be quite quick. If not, it could take hours.

How do I know whether my RAID5 has a write-intent bitmap enabled?

--
Shaochun Wang <scwang@ios.ac.cn>
Jabber: fungusw@jabber.org

* Re: RAID5 write hole?
From: Mikael Abrahamsson @ 2010-06-29 6:23 UTC
To: Shaochun Wang; +Cc: linux-raid

On Tue, 29 Jun 2010, Shaochun Wang wrote:

> How do I know whether my RAID5 has a write-intent bitmap enabled?

$ cat /proc/mdstat | grep bitmap
      bitmap: 0/8 pages [0KB], 131072KB chunk

If you don't get any bitmap information in there, it's not enabled.

--
Mikael Abrahamsson    email: swmike@swm.pp.se

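The same check can be done per array rather than with a bare grep. This is a minimal sketch, assuming the usual /proc/mdstat layout in which each array starts with a line such as `md0 : active raid5 ...` followed by indented detail lines; the function name `arrays_with_bitmap` and the sample text are hypothetical, apart from the bitmap line, which is the one shown above.

```python
def arrays_with_bitmap(mdstat_text):
    """Map each md array name in /proc/mdstat-style text to True/False."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md") and " : " in line:
            # Start of a new array entry, e.g. "md0 : active raid5 ..."
            current = line.split(" : ", 1)[0].strip()
            result[current] = False
        elif current is not None and line.strip().startswith("bitmap:"):
            # Indented "bitmap:" detail line belongs to the current array.
            result[current] = True
    return result


# Made-up sample input; only the "bitmap:" line is taken from the thread.
SAMPLE = """Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/8 pages [0KB], 131072KB chunk

md1 : active raid5 sdg1[2] sdf1[1] sde1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
"""

status = arrays_with_bitmap(SAMPLE)
```

On a real system the text would come from reading `/proc/mdstat`; an array that maps to False has no write-intent bitmap configured.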
* Re: RAID5 write hole?
From: Shaochun Wang @ 2010-06-29 13:28 UTC
To: linux-raid

On Tue, Jun 29, 2010 at 08:23:35AM +0200, Mikael Abrahamsson wrote:
> On Tue, 29 Jun 2010, Shaochun Wang wrote:
>
> $ cat /proc/mdstat | grep bitmap
>       bitmap: 0/8 pages [0KB], 131072KB chunk
>
> If you don't get any bitmap information in there, it's not enabled.

It seems that I do not have a write-intent bitmap enabled. If the
write-intent bitmap is useful, why is it not enabled by default?

--
Shaochun Wang <scwang@ios.ac.cn>
Jabber: fungusw@jabber.org