* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
       [not found] ` <47A7188A.4070005@msgid.tls.msk.ru>
@ 2008-02-04 14:09   ` Justin Piszcz
  2008-02-04 14:25     ` Eric Sandeen
  0 siblings, 1 reply; 11+ messages in thread
From: Justin Piszcz @ 2008-02-04 14:09 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Moshe Yudkowsky, linux-raid, xfs, sandeen

On Mon, 4 Feb 2008, Michael Tokarev wrote:

> Moshe Yudkowsky wrote:
> []
>> If I'm reading the man pages, Wikis, READMEs and mailing lists correctly
>> -- not necessarily the case -- the ext3 file system uses the equivalent
>> of data=journal as a default.
>
> ext3 defaults to data=ordered, not data=journal.  ext2 doesn't have a
> journal at all.
>
>> The question then becomes what data scheme to use with reiserfs on the
>
> I'd say don't use reiserfs in the first place ;)
>
>> Another way to phrase this: unless you're running data-center grade
>> hardware and have absolute confidence in your UPS, you should use
>> data=journal for reiserfs and perhaps avoid XFS entirely.
>
> By the way, even if you do have a good UPS, there should be some
> control program for it, to properly shut down your system when the
> UPS loses AC power.  So far, I've seen no such programs...
>
> /mjt

Why avoid XFS entirely?

esandeen, any comments here?

Justin.

^ permalink raw reply  [flat|nested] 11+ messages in thread
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 14:09 ` RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash) Justin Piszcz
@ 2008-02-04 14:25   ` Eric Sandeen
  2008-02-04 14:42     ` Eric Sandeen
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Eric Sandeen @ 2008-02-04 14:25 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid, xfs

Justin Piszcz wrote:

> Why avoid XFS entirely?
>
> esandeen, any comments here?

Heh; well, it's the meme.  See:

http://oss.sgi.com/projects/xfs/faq.html#nulls

and note that recent fixes have been made in this area (also noted in
the FAQ).

Also - the above all assumes that when a drive says it has written/flushed
data, it truly has.  Modern write-caching drives can wreak havoc with any
journaling filesystem, so that's one good reason for a UPS.  If the drive
claims to have metadata safe on disk but actually does not, and you lose
power, the data claimed safe will evaporate; there's not much the fs can
do.  IO write barriers address this by forcing the drive to flush
order-critical data before continuing.  xfs has them on by default,
although they are tested at mount time, and if you have something in
between xfs and the disks which does not support barriers (e.g. lvm...)
then they are disabled again, with a notice in the logs.

Note also that ext3 has the barrier option as well, but it is not enabled
by default due to performance concerns.  Barriers also affect xfs
performance, but enabling them in the non-battery-backed-write-cache
scenario is the right thing to do for filesystem integrity.

-Eric

> Justin.

^ permalink raw reply  [flat|nested] 11+ messages in thread
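[Editor's note: the failure mode Eric describes - a write cache acknowledging data that then evaporates on power loss, breaking the ordering a journal depends on - can be illustrated with a toy simulation. This is a deliberately simplified sketch, not real filesystem or block-layer code; the class and function names are invented for illustration.]

```python
# Toy model of a drive write cache that loses power, showing why a
# journaling fs needs barriers/flushes: without them, metadata can
# reach the platter while the journal commit it depends on does not.
import random

class Disk:
    def __init__(self):
        self.platter = {}   # what actually survives a power cut
        self.cache = []     # writes acknowledged but not yet durable

    def write(self, block, data):
        self.cache.append((block, data))  # drive ACKs immediately

    def flush(self):
        # A barrier/cache flush forces everything acknowledged so far
        # onto stable media before any later write is accepted.
        for block, data in self.cache:
            self.platter[block] = data
        self.cache = []

    def power_cut(self):
        # An arbitrary subset of cached writes survives, in any order.
        random.shuffle(self.cache)
        for block, data in self.cache[:random.randrange(len(self.cache) + 1)]:
            self.platter[block] = data
        self.cache = []

def journaled_update(disk, use_barriers):
    disk.write("journal", "commit record")
    if use_barriers:
        disk.flush()          # commit is durable before metadata is touched
    disk.write("metadata", "new contents")
    disk.power_cut()
    # Invariant a journaling fs relies on: metadata on disk implies
    # the journal commit is on disk too.
    if disk.platter.get("metadata") == "new contents":
        return disk.platter.get("journal") == "commit record"
    return True

random.seed(0)
with_barriers = all(journaled_update(Disk(), True) for _ in range(1000))
without_barriers = all(journaled_update(Disk(), False) for _ in range(1000))
print(with_barriers, without_barriers)  # barriers preserve the invariant
```

With the flush in place the invariant holds in every trial; without it, some trials persist the metadata but lose the commit record, which is exactly the "nulls after power loss" symptom the FAQ entry discusses.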
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 14:25 ` Eric Sandeen
@ 2008-02-04 14:42   ` Eric Sandeen
  2008-02-04 15:31   ` Moshe Yudkowsky
  2008-02-04 16:38   ` Michael Tokarev
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2008-02-04 14:42 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid, xfs

Eric Sandeen wrote:
> Justin Piszcz wrote:
>
>> Why avoid XFS entirely?
>>
>> esandeen, any comments here?
>
> Heh; well, it's the meme.
>
> see:
>
> http://oss.sgi.com/projects/xfs/faq.html#nulls
>
> and note that recent fixes have been made in this area (also noted in
> the faq)

Actually, continue reading past that specific entry to the next several;
it covers all this quite well.

-Eric

^ permalink raw reply  [flat|nested] 11+ messages in thread
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 14:25 ` Eric Sandeen
  2008-02-04 14:42   ` Eric Sandeen
@ 2008-02-04 15:31   ` Moshe Yudkowsky
  2008-02-04 16:45     ` Eric Sandeen
  2008-02-04 16:38   ` Michael Tokarev
  2 siblings, 1 reply; 11+ messages in thread
From: Moshe Yudkowsky @ 2008-02-04 15:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Justin Piszcz, Michael Tokarev, linux-raid, xfs

Eric,

Thanks very much for your note.  I'm becoming very leery of reiserfs at
the moment...  I'm about to run another series of crash tests.

Eric Sandeen wrote:
> Justin Piszcz wrote:
>
>> Why avoid XFS entirely?
>>
>> esandeen, any comments here?
>
> Heh; well, it's the meme.

Well, yeah...

> Note also that ext3 has the barrier option as well, but it is not
> enabled by default due to performance concerns.  Barriers also affect
> xfs performance, but enabling them in the non-battery-backed-write-cache
> scenario is the right thing to do for filesystem integrity.

So if I understand you correctly, you're stating that currently the most
reliable fs in its default configuration, in terms of protection against
power-loss scenarios, is XFS?

-- 
Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe
 "There is something fundamentally wrong with a country [USSR]
  where the citizens want to buy your underwear."
        -- Paul Theroux

^ permalink raw reply  [flat|nested] 11+ messages in thread
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 15:31 ` Moshe Yudkowsky
@ 2008-02-04 16:45   ` Eric Sandeen
  2008-02-04 17:22     ` Michael Tokarev
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2008-02-04 16:45 UTC (permalink / raw)
  To: Moshe Yudkowsky; +Cc: Justin Piszcz, Michael Tokarev, linux-raid, xfs

Moshe Yudkowsky wrote:

> So if I understand you correctly, you're stating that currently the most
> reliable fs in its default configuration, in terms of protection against
> power-loss scenarios, is XFS?

I wouldn't go that far without some real-world poweroff testing, because
various fs's are probably more or less tolerant of a write-cache
evaporation.  I suppose it'd depend on the size of the write cache as well.

-Eric

^ permalink raw reply  [flat|nested] 11+ messages in thread
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 16:45 ` Eric Sandeen
@ 2008-02-04 17:22   ` Michael Tokarev
  2008-02-05 12:31     ` Linda Walsh
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2008-02-04 17:22 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Moshe Yudkowsky, Justin Piszcz, linux-raid, xfs

Eric Sandeen wrote:
> Moshe Yudkowsky wrote:
>> So if I understand you correctly, you're stating that currently the most
>> reliable fs in its default configuration, in terms of protection against
>> power-loss scenarios, is XFS?
>
> I wouldn't go that far without some real-world poweroff testing, because
> various fs's are probably more or less tolerant of a write-cache
> evaporation.  I suppose it'd depend on the size of the write cache as well.

I know of no filesystem which is, as you say, tolerant of write-cache
evaporation.  If a drive says the data is written but in fact it's not,
it's a Bad Drive (tm) and it should be thrown away immediately.
Fortunately, almost all modern disk drives don't lie this way.

The only thing needed from the filesystem is to tell the drive to flush
its cache at the appropriate time, and actually wait for the flush to
complete.  Barriers (mentioned in this thread) are just another, somewhat
more efficient, way to do so, but a normal cache flush will do as well.
IFF the write caching is enabled in the first place - note that with some
workloads, write caching in the drive actually makes write speed worse,
not better - namely, in the case of massive writes.

Speaking of XFS (and of ext3fs with write barriers enabled) - I'm
confused here as well, and the answers to my questions didn't help either.
As far as I understand, XFS only uses barriers, not regular cache
flushes; hence without write barrier support (which is not available for
Linux software raid, as explained elsewhere) it's unsafe -- and probably
the same applies to ext3 with barrier support enabled.  But I'm not sure
I got it all correctly.

/mjt

^ permalink raw reply  [flat|nested] 11+ messages in thread
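[Editor's note: for readers wanting to check the state of barriers on their own systems, the following command fragment sketches the usual steps. It is illustrative only - option names reflect 2008-era kernels and should be verified against your local mount(8), hdparm(8), and filesystem documentation before use; the quoted log message is the form XFS printed at the time, and your kernel's wording may differ.]

```
# Did xfs disable barriers at mount time?  It logs the fact, e.g.:
#   "Filesystem "md0": Disabling barriers, not supported by the underlying device"
dmesg | grep -i barrier

# ext3 needs barriers requested explicitly (they are off by default):
mount -o remount,barrier=1 /var

# If barriers cannot be used, the conservative fallback is to turn off
# the drive's write cache so acknowledged writes are actually durable
# (at a real performance cost, as discussed later in this thread):
hdparm -W0 /dev/sda
```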
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 17:22 ` Michael Tokarev
@ 2008-02-05 12:31   ` Linda Walsh
  0 siblings, 0 replies; 11+ messages in thread
From: Linda Walsh @ 2008-02-05 12:31 UTC (permalink / raw)
  To: xfs; +Cc: linux-raid

Michael Tokarev wrote:
> note that with some workloads, write caching in
> the drive actually makes write speed worse, not better - namely,
> in case of massive writes.
----
With write barriers enabled, I did a quick test of a large copy from one
backup filesystem to another.  I'm not sure what you refer to when you
say large, but this disk has 387G used in 975 files, averaging about
406MB/file.  I was copying from /hde (ATA100-750G) to /sdb (SATA-300-750G)
(both basically the same underlying model).

Of course your 'mileage may vary', and these were averages over 12 runs
each (with and without write caching):

(write cache on)
               write   read
dev       TPS   MB/s   MB/s
hde ave  64.67  30.94    0.0
sdb ave 249.51   0.24  30.93

(write cache off)
               write   read
dev       TPS   MB/s   MB/s
hde ave  45.63  21.81    0.0
xx: ave 177.76   0.24  21.96

with write cache    = (30.94-21.81)/21.81   => ~42% faster
without write cache = 100-(100*21.81/30.94) => ~30% slower

These disks have barrier support, so I'd guess the differences would have
been greater if you didn't worry about losing write-cache contents.  If
barrier support doesn't work and one has to disable write caching, that
is a noticeable performance penalty.  All writes with noatime,
nodiratime, logbufs=8.

FWIW... slightly OT, the rates under Windows for its write-through
(FAT32) vs. write-back caching (NTFS) were: FAT about 60% faster than
NTFS, or NTFS ~40% slower than FAT32 (with options for no-last-access
and no 3.1-filename creation).

^ permalink raw reply  [flat|nested] 11+ messages in thread
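[Editor's note: the percentage figures in the message above can be recomputed directly from the quoted MB/s numbers - a quick sanity check, since the two directions ("X% faster" vs. "Y% slower") use different baselines and are easy to mix up.]

```python
# Recompute the cache-on vs cache-off comparison from the hde numbers
# quoted above (30.94 MB/s with write cache, 21.81 MB/s without).
cache_on = 30.94   # MB/s, write cache enabled
cache_off = 21.81  # MB/s, write cache disabled

# Speedup relative to the cache-off baseline:
faster = (cache_on - cache_off) / cache_off * 100
# Slowdown relative to the cache-on baseline:
slower = (1 - cache_off / cache_on) * 100

print(f"~{faster:.1f}% faster with cache, ~{slower:.1f}% slower without")
```

This yields roughly 42% faster with the cache and roughly 30% slower without it, matching the corrected figures in the message; the asymmetry is just the change of baseline, not an inconsistency in the data.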
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 14:25 ` Eric Sandeen
  2008-02-04 14:42   ` Eric Sandeen
  2008-02-04 15:31   ` Moshe Yudkowsky
@ 2008-02-04 16:38   ` Michael Tokarev
  2008-02-04 22:27     ` Justin Piszcz
  2008-02-06  1:12     ` Linda Walsh
  2 siblings, 2 replies; 11+ messages in thread
From: Michael Tokarev @ 2008-02-04 16:38 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Justin Piszcz, Moshe Yudkowsky, linux-raid, xfs

Eric Sandeen wrote:
[]
> http://oss.sgi.com/projects/xfs/faq.html#nulls
>
> and note that recent fixes have been made in this area (also noted in
> the faq)
>
> Also - the above all assumes that when a drive says it's written/flushed
> data, that it truly has.  Modern write-caching drives can wreak havoc
> with any journaling filesystem, so that's one good reason for a UPS.  If

Unfortunately a UPS does not *really* help here.  Because unless it has
a control program which properly shuts the system down on loss of input
power, and the battery really has the capacity to power the system while
it's shutting down (anyone tested this?  With a new UPS?  And after a
year of use, when the battery is not new?), -- unless the UPS actually
has the capacity to shut the system down, it will cut the power at an
unexpected time, while the disk(s) still have dirty caches...

> the drive claims to have metadata safe on disk but actually does not,
> and you lose power, the data claimed safe will evaporate, there's not
> much the fs can do.  IO write barriers address this by forcing the drive
> to flush order-critical data before continuing; xfs has them on by
> default, although they are tested at mount time and if you have
> something in between xfs and the disks which does not support barriers
> (i.e. lvm...) then they are disabled again, with a notice in the logs.

Note also that with Linux software raid, barriers are NOT supported.
/mjt

^ permalink raw reply  [flat|nested] 11+ messages in thread
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 16:38 ` Michael Tokarev
@ 2008-02-04 22:27   ` Justin Piszcz
  2008-02-06  1:12   ` Linda Walsh
  1 sibling, 0 replies; 11+ messages in thread
From: Justin Piszcz @ 2008-02-04 22:27 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Eric Sandeen, Moshe Yudkowsky, linux-raid, xfs

On Mon, 4 Feb 2008, Michael Tokarev wrote:

> Eric Sandeen wrote:
> []
>> http://oss.sgi.com/projects/xfs/faq.html#nulls
>>
>> and note that recent fixes have been made in this area (also noted in
>> the faq)
>>
>> Also - the above all assumes that when a drive says it's written/flushed
>> data, that it truly has.  Modern write-caching drives can wreak havoc
>> with any journaling filesystem, so that's one good reason for a UPS.  If
>
> Unfortunately a UPS does not *really* help here.  Because unless
> it has a control program which properly shuts the system down on loss
> of input power, and the battery really has the capacity to power the
> system while it's shutting down (anyone tested this?  With a new UPS?
> And after a year of use, when the battery is not new?), -- unless
> the UPS actually has the capacity to shut the system down, it will cut
> the power at an unexpected time, while the disk(s) still have dirty
> caches...

You use nut and a large enough UPS to handle the load of the system; it
shuts the machine down just fine.

>
>> the drive claims to have metadata safe on disk but actually does not,
>> and you lose power, the data claimed safe will evaporate, there's not
>> much the fs can do.  IO write barriers address this by forcing the drive
>> to flush order-critical data before continuing; xfs has them on by
>> default, although they are tested at mount time and if you have
>> something in between xfs and the disks which does not support barriers
>> (i.e. lvm...) then they are disabled again, with a notice in the logs.
>
> Note also that with linux software raid barriers are NOT supported.
>
> /mjt
>

^ permalink raw reply  [flat|nested] 11+ messages in thread
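[Editor's note: the "nut" Justin mentions is Network UPS Tools. A minimal sketch of the relevant `upsmon.conf` lines follows, illustrating the shutdown-on-power-loss behavior he describes. The UPS name, user, and password are placeholders; consult the nut documentation for the directives your version supports.]

```
# /etc/nut/upsmon.conf -- minimal sketch (placeholder credentials)
# Watch one locally attached UPS; "1" = it powers this machine.
MONITOR myups@localhost 1 monuser secretpass master

# At least one monitored UPS must be supplying power.
MINSUPPLIES 1

# Command upsmon runs when the UPS goes critical (on battery + low charge),
# giving the OS time to flush dirty caches and halt cleanly.
SHUTDOWNCMD "/sbin/shutdown -h +0"
```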
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-04 16:38 ` Michael Tokarev
  2008-02-04 22:27   ` Justin Piszcz
@ 2008-02-06  1:12   ` Linda Walsh
  2008-02-06  2:12     ` Michael Tokarev
  1 sibling, 1 reply; 11+ messages in thread
From: Linda Walsh @ 2008-02-06 1:12 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Eric Sandeen, Justin Piszcz, Moshe Yudkowsky, linux-raid, xfs

Michael Tokarev wrote:
> Unfortunately a UPS does not *really* help here.  Because unless
> it has a control program which properly shuts the system down on loss
> of input power, and the battery really has the capacity to power the
> system while it's shutting down (anyone tested this?
----
Yes.  I must say, I am not connected to or paid by APC.

> With a new UPS?
> And after a year of use, when the battery is not new?), -- unless
> the UPS actually has the capacity to shut the system down, it will cut
> the power at an unexpected time, while the disk(s) still have dirty
> caches...
--------
If you have a "SmartUPS" by "APC", there is a freeware daemon that
monitors its status.  The UPS has USB and serial connections.  It's
included in some distributions (SuSE).  The config file is pretty
straightforward.

I recommend the "1000XL" (1000 peak volt-amp load -- usually at startup;
note, this is not the same as watts, as some of us were taught in basic
electronics class, since the unit isn't a simple resistor like a light
bulb) over the 1500XL, because with the 1000XL you can buy several
"add-on batteries" that plug into the back.

One minor (but not fatal) design flaw: the add-on batteries give no
indication that they are "live" (I knocked a cord on one, and only got 7
minutes of uptime before things shut down instead of my expected 20).  I
have 3 cells total (controller & 1 extra pack).  So why is my run time so
short?  I am being lazy in buying more extension packs.
The UPS is running 3 computers, the house phone (answering machine and
wireless handsets), a digital clock, and 1 LCD (usually off).  The real
killer is a new workstation with 2x2-Core-II chips and other comparable
equipment.

The "1500XL" doesn't allow for adding more power packs.  The "2200XL"
does allow extra packs but comes in a rack-mount format.

It's not just a battery backup -- it conditions the power to filter out
spikes and emit a pure sine wave.  It will kick in during over- or
under-voltage conditions (you can set the sensitivity).  Adjustable alarm
when on battery, setting of output volts (115, 230, 120, 240).  It
self-tests at least every 2 weeks or sooner (to your fancy).  It also has
a network feature (that I haven't gotten to work yet -- they just changed
the format) that allows other computers on the same net to also be
notified and take action.  You specify what scripts to run at what times
(power off, power on, getting critically low, etc.).

Hasn't failed me 'yet' -- 'cept when a charger died and was replaced free
of cost (within warranty).  I have a separate setup in another room for
another computer.  The upspowerd runs on linux or windows (under cygwin,
I think).  You can specify when to shut down -- like "5 minutes of
battery life left".

The controller unit has 1 battery, but the add-ons have 2 batteries each,
so the first add-on adds 3x to the run-time.  When my system did shut
down "prematurely", it went through the full "halt" sequence, which I'd
presume flushes disk caches.

>
>> the drive claims to have metadata safe on disk but actually does not,
>> and you lose power, the data claimed safe will evaporate, there's not
>> much the fs can do.  IO write barriers address this by forcing the drive
>> to flush order-critical data before continuing; xfs has them on by
>> default, although they are tested at mount time and if you have
>> something in between xfs and the disks which does not support barriers
>> (i.e. lvm...)
>> then they are disabled again, with a notice in the logs.

> Note also that with linux software raid barriers are NOT supported.
------
Are you sure about this?  When my system boots, I used to have 3 new
IDEs, and one older one.  XFS checked each drive for barriers and turned
off barriers for a disk that didn't support it.  ... or are you referring
specifically to linux-raid setups?

Would it be possible on boot to have xfs probe the RAID array,
physically, to see if barriers are really supported (or not), and disable
them if they are not (and optionally disable write caching, but that's a
major performance hit in my experience)?

Linda

^ permalink raw reply  [flat|nested] 11+ messages in thread
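[Editor's note: the "freeware daemon" for APC SmartUPS units that Linda describes is most likely apcupsd - an assumption, since she never names it. The fragment below sketches the shutdown-threshold settings she alludes to ("you can specify when to shut down"), using directive names from apcupsd's configuration format; verify them against your installed version's documentation.]

```
# /etc/apcupsd/apcupsd.conf -- illustrative fragment (assumed daemon: apcupsd)
UPSCABLE usb
UPSTYPE usb
DEVICE

# Shut down when EITHER threshold is crossed while on battery:
BATTERYLEVEL 5     # remaining charge falls to 5%
MINUTES 5          # estimated runtime falls to 5 minutes

TIMEOUT 0          # 0 = rely on the two thresholds above, not a fixed timer
```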
* Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
  2008-02-06  1:12 ` Linda Walsh
@ 2008-02-06  2:12   ` Michael Tokarev
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Tokarev @ 2008-02-06 2:12 UTC (permalink / raw)
  To: Linda Walsh; +Cc: Eric Sandeen, Justin Piszcz, Moshe Yudkowsky, linux-raid, xfs

Linda Walsh wrote:
>
> Michael Tokarev wrote:
>> Unfortunately a UPS does not *really* help here.  Because unless
>> it has a control program which properly shuts the system down on loss
>> of input power, and the battery really has the capacity to power the
>> system while it's shutting down (anyone tested this?
> ----
> Yes.  I must say, I am not connected to or paid by APC.
>
>> With a new UPS?
>> And after a year of use, when the battery is not new?), -- unless
>> the UPS actually has the capacity to shut the system down, it will cut
>> the power at an unexpected time, while the disk(s) still have dirty
>> caches...
> --------
> If you have a "SmartUPS" by "APC", there is a freeware daemon that
> monitors [...]

Good stuff.  I knew at least SOME UPSes are good... ;)  Too bad I rarely
see such stuff in use by regular home users...

[]
>> Note also that with linux software raid barriers are NOT supported.
> ------
> Are you sure about this?  When my system boots, I used to have
> 3 new IDEs, and one older one.  XFS checked each drive for barriers
> and turned off barriers for a disk that didn't support it.  ... or
> are you referring specifically to linux-raid setups?

I'm referring specifically to linux-raid setups (software raid).  md
devices don't support barriers, for a very simple reason: once more than
one disk drive is involved, the md layer can't guarantee ordering ACROSS
the drives.
The problem is that in case of power loss during writes, when an array
needs recovery/resync (at least of the parts which were being written, if
bitmaps are in use), the md layer will choose an arbitrary drive as the
"master" and will copy data to the other drive (speaking of the simplest
case of a 2-drive raid1 array).  But one drive may have the last two
barriers written (I mean the data that was "associated" with the
barriers), and the other neither of the two - in two different places.
Hence we may see quite some inconsistency here.  This is regardless of
whether the underlying component devices support barriers or not.

> Would it be possible on boot to have xfs probe the RAID array,
> physically, to see if barriers are really supported (or not), and disable
> them if they are not (and optionally disable write caching, but that's
> a major performance hit in my experience)?

Xfs already probes the devices as you describe, exactly the same way as
you've seen with your ide disks, and disables barriers.  The question and
confusion was about what happens when the barriers are disabled
(provided, again, that we don't rely on a UPS and other external things).
As far as I understand, when barriers are working properly, xfs should be
safe wrt power losses (still a bit unsure about this).  Now, when
barriers are turned off (for whatever reason), is it still as safe?  I
don't know.  Does it use regular cache flushes in place of barriers in
that case (which ARE supported by the md layer)?

Generally, it has been said numerous times that XFS is not
"powercut-friendly", and should only be used where everything is stable,
including power.  Hence I'm afraid to deploy it where I know the power is
not stable (we have about 70 such places here, with servers in each,
where they don't always replace UPS batteries in time - ext3fs never
crashed so far, while ext2 did).

Thanks.

/mjt

^ permalink raw reply  [flat|nested] 11+ messages in thread
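[Editor's note: Michael's raid1 scenario - each mirror leg persisting a different subset of acknowledged writes, then an arbitrary leg being chosen as resync master - can be made concrete with a small sketch. This is a toy model for illustration only, not md code; the block names and the two-leg state are invented.]

```python
# Toy model of the raid1 resync hazard described above: after a power
# cut the two mirror legs hold different subsets of the acknowledged
# writes, and copying an arbitrary "master" leg over the other may
# break the ordering the filesystem relied on.

def resync(master, slave):
    # md copies the chosen master wholesale over the other leg.
    return dict(master)

# The fs wrote a journal commit, then (intending a barrier between them)
# wrote metadata.  Without barrier support through md, each leg's drive
# was free to persist the cached writes in any order, so the legs ended
# up with different subsets when the power failed:
leg_a = {"journal": "commit", "metadata": None}   # commit only
leg_b = {"journal": None, "metadata": "new"}      # metadata only

def ordering_ok(state):
    # Journaling invariant: metadata on disk implies the commit is too.
    return state["metadata"] is None or state["journal"] == "commit"

ok_if_a_master = ordering_ok(resync(leg_a, leg_b))  # rolls back cleanly
ok_if_b_master = ordering_ok(resync(leg_b, leg_a))  # metadata w/o commit
print(ok_if_a_master, ok_if_b_master)
```

Since md picks the master arbitrarily, the array can end up in the `leg_b` state - metadata present without its journal commit - which is exactly the cross-drive ordering violation Michael describes, and it occurs regardless of whether each component drive individually honors barriers.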
end of thread, other threads: [~2008-02-06 2:12 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <47A612BE.5050707@pobox.com>
[not found] ` <47A623EE.4050305@msgid.tls.msk.ru>
[not found] ` <47A62A17.70101@pobox.com>
[not found] ` <47A6DA81.3030008@msgid.tls.msk.ru>
[not found] ` <47A6EFCF.9080906@pobox.com>
[not found] ` <47A7188A.4070005@msgid.tls.msk.ru>
2008-02-04 14:09 ` RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash) Justin Piszcz
2008-02-04 14:25 ` Eric Sandeen
2008-02-04 14:42 ` Eric Sandeen
2008-02-04 15:31 ` Moshe Yudkowsky
2008-02-04 16:45 ` Eric Sandeen
2008-02-04 17:22 ` Michael Tokarev
2008-02-05 12:31 ` Linda Walsh
2008-02-04 16:38 ` Michael Tokarev
2008-02-04 22:27 ` Justin Piszcz
2008-02-06 1:12 ` Linda Walsh
2008-02-06 2:12 ` Michael Tokarev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox